Visual Speech Animation

Lei Xie, Lijuan Wang, and Shan Yang

Contents

Introduction
State of the Art
A Typical VSA System
  Data Collection
  Face/Mouth Model
  Input and Feature Extraction
  Mapping Methods
A Deep BLSTM-RNN-Based Approach
  RNN
  LSTM-RNN
  The Talking Head System
  Performances
Selected Applications
  Motivation
  Karaoke Function
  Technology Outlook
Summary
References

L. Xie (*)
School of Computer Science, Northwestern Polytechnical University (NWPU), Xi'an, P. R. China
e-mail: [email protected]; [email protected]

L. Wang
Microsoft Research, Redmond, WA, USA
e-mail: [email protected]

S. Yang
School of Computer Science, Northwestern Polytechnical University, Xi'an, China
e-mail: [email protected]

© Springer International Publishing AG 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_1-1


Abstract

Visual speech animation (VSA) has many potential applications in human-computer interaction, assisted language learning, entertainment, and other areas. But it is one of the most challenging tasks in human motion animation because of the complex mechanisms of speech production and facial motion. This chapter surveys the basic principles, state-of-the-art technologies, and featured applications in this area. Specifically, after introducing the basic concepts and the building blocks of a typical VSA system, we showcase a state-of-the-art approach based on deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs) for audio-to-visual mapping, which aims to create a video-realistic talking head. Finally, the Engkoo project from Microsoft is highlighted as a practical application of visual speech animation in language learning.

Keywords

Visual speech animation • Visual speech synthesis • Talking head • Talking face • Talking avatar • Facial animation • Audio visual speech • Audio-to-visual mapping • Deep learning • Deep neural network

Introduction

Speech production and perception are both bimodal in nature. Visual speech, i.e., speech-evoked facial motion, plays an indispensable role in speech communication. Plenty of evidence shows that voice and face reinforce and complement each other in human-human communication (McGurk and MacDonald 1976). Viewing the speaker's face (and mouth) provides valuable information for speech perception. Visible speech is particularly effective when auditory speech is degraded or contaminated by acoustic noise, bandwidth limitation, or hearing impairment. In an early study, Breeuwer and Plomp (1985) showed that the recognition of band-pass-filtered short sentences improves significantly when subjects are allowed to watch the speaker. The same level of improvement can be observed for hearing-impaired listeners and cochlear implant patients (Massaro and Simpson 2014); in these experiments, lipreading provides essential speech perceptual information. The influence of visual speech is not limited to situations with degraded auditory input: Sumby and Pollack found that seeing the speaker's face is equivalent to about a 15 dB signal-to-noise ratio (SNR) improvement of the acoustic signal (Sumby and Pollack 1954).

Given the influence of visual speech in human-human speech communication, researchers have become interested in its impact on human-machine interaction. Ostermann and Weissenfeld (2004) have shown that the trust and attention of humans toward machines increase by 30% when they communicate with a talking face instead of text only. That is to say, visual speech can attract the attention of a user, making the human-machine interface more engaging.


Hence, visual speech animation (VSA), also called visual speech synthesis, talking face, talking head, talking avatar, speech animation, or mouth animation, aims to animate the lips/mouth/articulators/face in synchrony with speech for different purposes. In a broad sense, VSA may include facial expressions (Jia et al. 2011; Cao et al. 2005) and visual prosody (Cosatto et al. 2003), such as head (Ben Youssef et al. 2013; Le et al. 2012; Busso et al. 2007; Ding et al. 2015; Jia et al. 2014) and eye (Le et al. 2012; Dziemianko et al. 2009; Raidt et al. 2007; Deng et al. 2005) motions, which naturally accompany human speech. Readers can consult the chapters on Eye Motion and Head Motion Generation for more details. Applications of VSA can be found across many domains, such as technical support and customer service, communication aids, speech therapy, virtual reality, gaming, film special effects, education, and training (Hura et al. 2010). Specific applications include a virtual storyteller for children, a virtual guide or presenter for a personal or commercial Web site, a representation of the user in computer games, and puppetry for computer-mediated human communication. VSA clearly promises to become an essential multimodal interface in many applications.

Speech-evoked face animation is one of the most challenging tasks in human motion animation. The human face has an extremely complex geometric form (Pighin et al. 2006), and speech-originated facial movements are the result of a complicated interaction among a number of anatomical layers, including bone, muscle, fat, and skin. As a result, humans are extremely sensitive to the slightest artifacts in an animated face, and even small, subtle changes can lead to an unrealistic appearance. To achieve realistic visual speech animation, tremendous efforts have been made by the speech, image, computer graphics, pattern recognition, and machine learning communities over several decades (Parke 1972). Those efforts have been summarized in the proceedings of the visual speech synthesis challenge (LIPS) (Theobald et al. 2008), surveys (Cosatto et al. 2003; Ostermann and Weissenfeld 2004), featured books (Pandzic and Forchheimer 2002; Deng and Neumann 2008), and several journal special issues (Xie et al. 2015; Fagel et al. 2010). This chapter introduces the basic principles, surveys the state-of-the-art technologies, and discusses featured applications.

State of the Art

After decades of research, a state-of-the-art visual speech animation system can achieve lifelike or video-realistic performance through 2D, 2.5D, or 3D face modeling and a statistical/parametric text/speech-to-visual mapping strategy. For instance, Fan et al. (2016) introduce an image-based 2D video-realistic talking head. The lower face region of a speaker is modeled by a compact model learned from a set of facial images, called an active appearance model (AAM). Given pairs of audio and visual parameter sequences, a deep neural network model is trained to learn the sequence mapping from the audio to the visual space. To further improve the realism of the talking head, a trajectory tiling method uses the predicted AAM trajectory as a guide to select a smooth sequence of real sample images from the recorded database. Based on similar techniques, Microsoft has released an online visual speech animation system that helps users learn English (Wang et al. 2012c).

Fig. 1 The building blocks of a typical VSA system

A Typical VSA System

As shown in Fig. 1, a typical visual speech animation system is composed of several modules: data collection, a face/mouth model, feature extraction, and a learned mapping model.
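To make the division of labor concrete, the sketch below reads Fig. 1 as code. It is a hypothetical skeleton, not an interface from the cited literature: the face model, feature extractor, and mapping objects and their methods are illustrative assumptions standing in for whichever concrete techniques the following subsections describe.

```python
# Hypothetical skeleton of the Fig. 1 pipeline; every interface here is
# an illustrative assumption, not an API from the chapter's references.
class VSASystem:
    def __init__(self, face_model, feature_extractor, mapping):
        self.face_model = face_model       # e.g., an AAM or a 3D mesh model
        self.features = feature_extractor  # e.g., phoneme labels or MFCCs
        self.mapping = mapping             # rule-based, statistical, or neural

    def train(self, av_corpus):
        # av_corpus: pairs of (audio/text, recorded face data)
        X = [self.features(a) for a, _ in av_corpus]
        V = [self.face_model.parameters(f) for _, f in av_corpus]
        self.mapping.fit(X, V)             # learn the audio/text-to-visual mapping

    def animate(self, new_input):
        # new audio or text drives the learned mapping, then the face model
        visual = self.mapping.predict(self.features(new_input))
        return self.face_model.render(visual)
```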

Data Collection

Depending on the source of data used for face/mouth/articulator modeling, a VSA system can be built from images, video recordings, and various motion capture equipment, such as optical motion capture (mocap), electromagnetic articulography (EMA), magnetic resonance imaging (MRI), and X-ray. What type of data is collected essentially depends on the costs, the desired appearance of the face/head, and the application needs.

Many approaches choose the most straightforward way of collecting data: videos of a speaker are recorded by a camera, and the image sequences are used as the source for 2D or 3D face/head modeling (Theobald et al. 2008; Bregler et al. 1997; Cosatto et al. 2003; Cosatto and Graf 1998; Xie and Liu 2007a; Wang et al. 2010a; Cosatto and Graf 2000; Anderson et al. 2013; Ezzat et al. 2002; Ezzat and Poggio 2000; Xie and Liu 2007b), as shown in Fig. 2a. A recent trend in producing quality facial animation is to use 3D motion-captured data (Deng and Neumann 2008), which have been used successfully in movie special effects to drive virtual characters. As shown in Fig. 2b, an array of high-performance cameras records the facial movements and reconstructs the 3D marker locations on a subject's face. Although a mocap system is expensive and difficult to set up, the reconstructed data provide accurate timing and motion information. Once the data are collected, facial animation can be created by controlling the underlying muscle structure or blend shapes (see the chapter on Blendshape Facial Animation for details).

Another data collection system, EMA, shown in Fig. 2c, is often used to record the complex movements of the lips, jaw, tongue, and even intraoral articulators (Richmond et al. 2011). The sensors, called coils, are attached to different positions on a speaker's face or in the mouth, and the 3D movements of the sensors are recorded at a high frame rate (e.g., 200 frames per second) while the speaker talks. Visual speech animation generated from EMA data is usually used for articulation visualization (Huang et al. 2013; Fagel and Clemens 2004; Wik and Hjalmarsson 2009). In Wang et al. (2012a), an animated talking head is created from EMA articulatory data for pronunciation training.

Fig. 2 Various data collection methods for building a visual speech animation system. (a) Camera video (Theobald et al. 2008), (b) motion capture (Busso et al. 2007), and (c) EMA (from http://www.gipsa-lab.grenoble-inp.fr/)

Face/Mouth Model

The appearance of a visual speech animation system is determined by the underlying face/mouth model, and generating animated talking heads that look like real people is challenging. Existing approaches to talking heads use either image-based 2D models (Seidlhofer 2009; Zhang et al. 2009) or geometry-based 3D ones (Musti et al. 2014). Cartoon avatars are relatively easy to build; the more humanlike, realistic avatars seen in some games and movies are much harder. Traditionally, expensive motion capture systems are required to track the real person's motion or, even more expensively, artists manually touch up every frame. Some desirable features of the next-generation avatar are as follows: it should be a 3D avatar that integrates easily into a versatile 3D virtual world; it should be photo-realistic; it should be customizable to any user; and, last but not least, it should be created automatically from a small amount of recorded data. That is to say, the next-generation avatar should be 3D, photo-realistic, personalized or customized, and easy to create with little bootstrapping data.


In the facial animation world, a great variety of animation techniques based on 3D models exists (Seidlhofer 2009). In general, these techniques first generate a 3D face model consisting of a 3D mesh, which defines the geometric shape of a face; a variety of hardware systems, ranging from 3D laser scanners to multi-camera systems, are available for this. In a second step, either a human-like or a cartoon-like texture is mapped onto the 3D mesh. Besides generating a 3D model, animation parameters have to be determined for the later animation. A traditional 3D avatar requires a highly accurate geometric model to render soft tissues like the lips, tongue, and wrinkles. It is both computationally intensive and mathematically challenging to build or run such a model. Moreover, any unnatural deformation will push the resulting output into the uncanny valley of human rejection; that is, it will be rejected as unnatural.

Image-based facial animation techniques achieve great realism in synthesized videos by combining different facial parts of recorded 2D images (Massaro 1998; Zhang et al. 2009; Eskenazi 2009; Scott et al. 2011; Badin et al. 2010). In general, image-based facial animation consists of two main steps: audiovisual analysis of a recorded human subject and synthesis of the facial animation. In the analysis step, a database with images of deformable facial parts of the human subject is collected, while the time-aligned audio file is segmented into phonemes. In the synthesis step, a face is synthesized by first generating audio from the text using a text-to-speech (TTS) synthesizer. The TTS synthesizer sends phonemes and their timing to the face animation engine, which overlays facial parts corresponding to the generated speech on a background video sequence. Massaro (1998), Zhang et al. (2009), Eskenazi (2009), and Scott et al. (2011) show image-based speech animation that cannot be distinguished from recorded video. However, it is challenging to change the head pose freely or to render different facial expressions, and it is hard to blend such animations seamlessly into 3D scenes.

Image-based approaches have the advantage that photo-realistic appearance is guaranteed. However, a talking head needs to be not just photo-realistic in static appearance; it must also exhibit convincing plastic deformations of the lips synchronized with the corresponding speech, realistic head movements, and natural facial expressions. An ideal 3D talking head can mimic the realistic motion of a real human face in 3D space. One challenge for rendering realistic 3D facial animation is the mouth area. The lips, teeth, and tongue are nonrigid tissues, sometimes with occlusions, so accurate geometric modeling is difficult, and it is hard to deform them properly. Moreover, they need to move in sync with the spoken audio; otherwise, viewers notice the asynchrony and judge it unnatural.

In the real world, when people talk, both the 3D geometry and the texture appearance of the face change constantly, driven by the vocal organs and facial muscles. Ideally, both the geometry change and the texture change are captured simultaneously, and there is a lot of ongoing research on this problem. For example, with the help of motion-sensing devices such as Microsoft Kinect, the captured 3D depth information can be used to better acquire the 3D geometry model. Alternatively, the 3D face shape can be recovered from single or multiple camera views (Wang et al. 2011; Sako et al. 2000; Yan et al. 2010). In the 2.5D talking head mentioned above, since no captured 3D geometry information is available, the work of Sako et al. (2000), which reconstructs a 3D face model from a single frontal face image, is adopted. The only required input to the 2D-to-3D system is a frontal face image of a subject with normal illumination and neutral expression. A semi-supervised ranking prior likelihood model for accurate local search and a robust parameter estimation approach are used for face alignment. Based on this 2D alignment algorithm, 87 key feature points are automatically located; the feature points are accurate enough for face reconstruction in most cases. A general 3D face model is applied for personalized 3D face reconstruction, with the 3D shapes compressed by PCA. After the 2D face alignment, the key feature points are used to compute the 3D shape coefficients of the eigenvectors, and the coefficients are then used to reconstruct the 3D face shape. Finally, the face texture is extracted from the input image. By mapping the texture onto the 3D face geometry, the 3D face model for the input 2D face image is reconstructed. A 3D face model is reconstructed for each 2D image sample in the recordings, with examples shown in Fig. 3. Thus a 3D sample library is formed, where each 3D sample has a 3D geometry mesh, a texture, and the corresponding UV mapping, which defines how a texture is projected onto a 3D model. After the 2D-to-3D transformation, the original 2D sample recordings turn into 3D sample sequences consisting of three synchronous streams: geometry mesh sequences depicting the dynamic shape, texture image sequences for the changing appearance, and the corresponding speech audio. This method combines the best of both 2D image-sample-based and 3D model-based facial animation technologies. It renders realistic articulator animation by wrapping 2D video images around a simple and smooth 3D face model. The 2D video sequence captures the natural movement of soft tissues and helps the new talking head bypass the difficulties in rendering occluded articulators (e.g., tongue and teeth). Moreover, with the versatile 3D geometry model, different head poses and facial expressions can be freely controlled. The 2.5D talking head can be customized to any user by using a 2D video of that user.

Techniques based on 3D models impress with their great automatism and flexibility while lacking realism. Image-based facial animation achieves photo-realism while offering little flexibility and lower automatism. The image-based techniques seem to be the best candidates for leading facial animation to new applications, since they achieve photo-realism; combined with a 3D model, they generate photo-realistic facial animations while providing some flexibility to the user.

Fig. 3 Auto-reconstructed 3D face models in different mouth shapes and from different view angles (without and with texture)

Input and Feature Extraction

According to the input signal, a visual speech animation system can be driven by text, speech, or performance. The simplest VSA visualizes speech pronunciations through an avatar from tracked markers of human performance. Currently, performance-based facial animation can be quite realistic (Thies et al. 2016; Wang and Soong 2012; Weise et al. 2011), and the aim of such a system is usually not only speech visualization; for example, Thies et al. (2016) introduce an interesting application for real-time facial reenactment. Readers can go through the chapter on Video-based Performance Driven Facial Animation for more details.

During the facial data collection process, speech and text are always collected as well. Hence, visual speech can be driven by new voice or text input through a learned text/audio-to-visual mapping model, which will be introduced in the next section. To learn such a mapping, a feature extraction module is first used to obtain representative text or audio features. The textual features are often similar to those used in a TTS system (Taylor 2009) and may include information about phonemes, syllables, stress, prosodic boundaries, and part-of-speech (POS) labels. Audio features can be typical spectral features (e.g., MFCCs (Fan et al. 2016)), pitch, and other acoustic features.
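As a concrete illustration of the audio branch, the sketch below extracts frame-level MFCC features with librosa, hopping at the video frame rate so the audio and visual sequences align one-to-one. The sampling rate, frame rate, and feature sizes are illustrative assumptions rather than the settings used in the cited systems.

```python
# A minimal sketch of audio feature extraction for a VSA system, using
# librosa (an assumption; any MFCC implementation works). Frames are
# spaced to match a 25 fps video track so audio and visual sequences
# can be paired frame by frame.
import librosa
import numpy as np

def extract_audio_features(wav_path, video_fps=25):
    y, sr = librosa.load(wav_path, sr=16000)   # mono audio at 16 kHz
    hop = sr // video_fps                       # one hop per video frame
    # 13 MFCCs per frame; the chapter also mentions pitch and other
    # acoustic features, omitted here for brevity.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    # Append delta features, mirroring the dynamic ("delta") constraints
    # used by trajectory models later in the chapter.
    delta = librosa.feature.delta(mfcc)
    return np.vstack([mfcc, delta]).T           # shape (T, 26)
```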


Mapping Methods

Both text- and speech-driven visual speech animation systems require an input-to-visual feature conversion or mapping algorithm; that is, the lip/mouth/facial movements must be naturally synchronized with the audio speech (a task sometimes called lip synchronization, or lip sync for short). The conversion is not trivial because of the coarticulation phenomenon of the human speech production mechanism, which causes a given phoneme to be pronounced differently depending on the surrounding phonemes. Due to this phenomenon, learning an audio/text-to-visual mapping becomes the essential task in visual speech animation. Researchers have devoted much effort to this task, and the developed approaches can be roughly categorized into rule-based, concatenation, parametric, and hybrid.

Rule Based

Due to the limitations of data collection and learning methods, early approaches were mainly based on hand-crafted mapping rules. In these approaches, the visual counterpart of the audio phoneme, the viseme, is defined as the basic visual unit. Typically, visemes are manually designed as key images of mouth shapes, as shown in Fig. 4, and empirical smoothing functions or coarticulation rules are used to synthesize novel speech animations. Ezzat and Poggio (2000) propose a simple approach that morphs between key viseme images; because of the coarticulation phenomenon, morphing between a set of mouth images is clearly not natural. Cohen and Massaro (1993) propose a coarticulation model in which a viseme shape is specified via dominance functions defined in terms of each facial measurement (such as the lips and tongue tip), and the weighted sum of dominance values determines the final mouth shapes. In a more recent approach, Taylor et al. (2012) argue that static mouth shapes are not enough and redefine visemes as clustered temporal units that describe distinctive movements of the visual speech articulators, called dynamic visemes.

Fig. 4 Several defined visemes from Ezzat and Poggio (2000)

Concatenation/Unit Selection

To achieve a photo- or video-realistic animation effect, concatenation of real video clips from a recorded database has been considered (Bregler et al. 1997; Cosatto et al. 2003; Cosatto and Graf 1998, 2000). The idea is quite similar to that of concatenative TTS (Hunt and Black 1996). In the off-line stage, a database of recorded videos is cut into suitably sized clips, e.g., triphone units. In the online stage, given a novel text or speech target, a unit selection process selects appropriate units and assembles them in an optimal way to produce the desired target, as shown in Fig. 5. To achieve speech synchronization and a smooth video, the concatenation algorithm must be carefully designed. In Cosatto et al. (2003), a phonetically labeled target is first produced by a TTS system, or by a labeler or aligner from the recorded audio. From the phonetic target, a graph is created with states corresponding to the frames of the final animation. Each state of the final animation (a video frame) is populated with a list of candidate nodes (recorded video samples from the database). Each state is fully connected to the next; concatenation costs are assigned to each arc, while target costs are assigned to each node. A Viterbi search on the graph finds the optimal path, i.e., the path with the lowest total cost; a sketch of this search is given below. The balance between the two costs is critical to the final performance, and its weighting is empirically tuned in real applications. The video clips for unit selection are usually limited to the lower part of the face, which carries most of the speech-evoked facial motion. After selection, the concatenated lower-face clips are stitched onto a background whole-face video, producing the synthesized whole-face video, as shown in Fig. 6. Much effort has gone into image processing to achieve seamless stitches. With a relatively large video database, the concatenation approach can achieve video-realistic performance, but it can be difficult to add different expressions, and the flexibility of the generated visual speech animation is also limited.
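The Viterbi search over the unit-selection graph can be written as a short dynamic program. In this hedged sketch, candidates[t] holds the database samples for target frame t, and target_cost and concat_cost are hypothetical application-specific functions (e.g., phonetic mismatch for nodes and visual discontinuity for arcs); their relative weighting would be tuned empirically, as noted above.

```python
# Dynamic-programming (Viterbi) search for the lowest-total-cost path
# through the unit-selection graph: states are target frames, nodes are
# candidate database samples.
import numpy as np

def viterbi_select(candidates, target_cost, concat_cost):
    """candidates[t]: list of candidate samples for target frame t."""
    T = len(candidates)
    # cumulative cost per node at frame 0 is just the target cost
    cum = [np.array([target_cost(0, c) for c in candidates[0]])]
    back = []  # backpointers for each transition
    for t in range(1, T):
        prev = cum[-1]
        cur, ptr = [], []
        for c in candidates[t]:
            # cost of arriving at candidate c from every previous node
            arrive = prev + np.array([concat_cost(p, c)
                                      for p in candidates[t - 1]])
            j = int(np.argmin(arrive))
            ptr.append(j)
            cur.append(arrive[j] + target_cost(t, c))
        cum.append(np.array(cur))
        back.append(ptr)
    # trace back the optimal path from the cheapest final node
    path = [int(np.argmin(cum[-1]))]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```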

Fig. 5 Unit selection approach for visual speech animation (Fan et al. 2016)

Fig. 6 Illustration of the image stitching process in a video-realistic talking head (Fan et al. 2016)

Parametric/Statistical

Recently, parametric methods have gained much attention because their mappings are learned automatically and elegantly from data. Numerous attempts have been made to model the relationship between audio and visual signals, and many are based on generative probabilistic models, where the underlying probability distributions of the audiovisual data are estimated. Typical models include the Gaussian mixture model (GMM), hidden Markov model (HMM) (Xie and Liu 2007a; Fu et al. 2005), dynamic Bayesian network (DBN) (Xie and Liu 2007b), and switching linear dynamical system (SLDS) (Englebienne et al. 2007).


Hidden Markov model-based statistical parametric speech synthesis (SPSS) has made significant progress (Tokuda et al. 2007), so the HMM approach has also been investigated intensively for visual speech synthesis (Sako et al. 2000; Masuko et al. 1998). In HMM-based visual speech synthesis, auditory and visual speech are modeled jointly in HMMs, and the visual parameters are generated from the HMMs using the dynamic ("delta") constraints of the features (Breeuwer and Plomp 1985). Convincing mouth video can be rendered from the predicted visual parameter trajectories; this approach is called the trajectory HMM. Usually, maximum likelihood (ML) is the criterion for HMM training. However, ML training does not optimize directly for visual generation error. To compensate for this deficiency, a minimum generation error (MGE) method is proposed in Wang et al. (2011) to further refine the audiovisual joint model by minimizing the error between the generated trajectories and the real target trajectories in the training set.

Although HMMs can model sequential data efficiently, they still have limitations, such as model assumptions made out of necessity (e.g., GMMs with diagonal covariance) and the greedy, hence suboptimal, search-derived decision-tree-based contextual state clustering. Motivated by the superior performance of deep neural networks (DNNs) in automatic speech recognition (Hinton et al. 2012) and speech synthesis (Zen et al. 2013), a neural network-based photo-realistic talking head is proposed in Fan et al. (2015). Specifically, a deep bidirectional long short-term memory recurrent neural network (BLSTM-RNN) is adopted to learn a direct regression model by minimizing the sum of square error (SSE) in predicting the visual sequence from the label sequence. Experiments have confirmed that the BLSTM approach significantly outperforms the HMM approach (Fan et al. 2015); it is introduced in detail later in this chapter.

Hybrid

Although parametric approaches have many merits, such as a small footprint, flexibility, and controllability, one obvious drawback is blurry animation caused by feature dimension reduction and imperfect learning. Hybrid visual speech animation approaches therefore use the predicted trajectory to guide the sample selection process (Wang et al. 2010b), combining the advantages of video-based concatenation and parametric statistical modeling. In a recent approach (Fan et al. 2016), the visual parameter trajectory predicted by a BLSTM-RNN is used as a guide to select a smooth sequence of real sample images from the recorded database.
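In code, the hybrid idea amounts to swapping the target cost of the unit-selection search (compare the Viterbi sketch earlier) for a distance to the parametric model's predicted trajectory. The sample attribute aam and the predicted_trajectory array below are hypothetical names used purely for illustration.

```python
import numpy as np

def hybrid_target_cost(t, sample, predicted_trajectory):
    # Distance between a candidate sample's AAM feature vector and the
    # AAM frame predicted by the parametric model (e.g., a BLSTM-RNN);
    # the concatenation cost on the arcs is kept unchanged.
    return float(np.linalg.norm(sample.aam - predicted_trajectory[t]))
```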

A Deep BLSTM-RNN-Based Approach

In the past several years, deep neural networks (DNNs) and deep learning methods (Deng and Yu 2014) have been used successfully in many tasks, such as speech recognition (Hinton et al. 2012), natural language processing, and computer vision. For example, the DNN-HMM approach has boosted speech recognition accuracy significantly (Deng and Yu 2014). Deep neural networks have also been investigated for regression/mapping tasks, e.g., text-to-speech (Zen et al. 2013), learning clean speech from noisy speech for speech enhancement (Du et al. 2014), and articulatory movement prediction from text and speech (Zhu et al. 2015). The DNN approaches have several advantages: they can model long-span, high-dimensional input features and their correlations; they can learn nonlinear mappings between input and output with deep-layered, hierarchical, feed-forward, and recurrent structures; and they have discriminative and predictive capability in the generation sense, given appropriate cost functions, e.g., generation error.

Recently, recurrent neural networks (RNNs) (Williams and Zipser 1989) and their bidirectional variant, bidirectional RNNs (BRNNs) (Schuster and Paliwal 1997), have become popular because they can incorporate contextual information, which is essential for sequential data modeling. Conventional RNNs cannot model long-span relations in sequential data well because of the vanishing gradient problem (Hochreiter 1998). Hochreiter and Schmidhuber (1997) found that the LSTM architecture, which uses purpose-built memory cells to store information, is better at exploiting long-range context. Combining BRNNs with LSTM gives BLSTM, which can access long-range context in both directions. Speech, in both its auditory and visual forms, is typical sequential data, and in a recent study BLSTM has shown state-of-the-art performance in audio-to-visual sequential mapping (Fan et al. 2015).

RNN

Allowing cyclical connections in a feed-forward neural network yields a recurrent neural network (RNN) (Williams and Zipser 1989). RNNs can incorporate contextual information from previous input vectors, which allows them to remember past inputs and let these persist in the network's internal state. This property makes them an attractive model for sequence-to-sequence learning. For a given input vector sequence $x = (x_1, x_2, \dots, x_T)$, the forward pass of an RNN is

$$h_t = \mathcal{H}(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad (1)$$

$$y_t = W_{hy} h_t + b_y, \qquad (2)$$

where $t = 1, \dots, T$ and $T$ is the length of the sequence; $h = (h_1, \dots, h_T)$ is the hidden state vector sequence computed from $x$; $y = (y_1, \dots, y_T)$ is the output vector sequence; $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the input-hidden, hidden-hidden, and hidden-output weight matrices, respectively; $b_h$ and $b_y$ are the hidden and output bias vectors, respectively; and $\mathcal{H}$ denotes the nonlinear activation function of the hidden layer.

For the visual speech animation task, because of the speech coarticulation phenomenon, a model needs access to both past and future contexts. Bidirectional recurrent neural networks (BRNNs), as shown in Fig. 7, fit this task well. A BRNN computes both a forward state sequence $\overrightarrow{h}$ and a backward state sequence $\overleftarrow{h}$, as formulated below:

$$\overrightarrow{h}_t = \mathcal{H}\big(W_{x\overrightarrow{h}} x_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\big), \qquad (3)$$

$$\overleftarrow{h}_t = \mathcal{H}\big(W_{x\overleftarrow{h}} x_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\big), \qquad (4)$$

$$y_t = W_{\overrightarrow{h}y} \overrightarrow{h}_t + W_{\overleftarrow{h}y} \overleftarrow{h}_t + b_y. \qquad (5)$$

Fig. 7 Bidirectional recurrent neural networks (BRNNs)
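The recursions in Eqs. (1)-(5) translate almost line by line into numpy. The sketch below assumes tanh for the activation $\mathcal{H}$ and illustrative weight shapes; it is a didactic transcription, not the networks used in Fan et al. (2015).

```python
import numpy as np

def rnn_states(x, W_xh, W_hh, b_h, act=np.tanh):
    # Hidden-state recursion of Eq. (1) (and Eqs. (3)/(4) per direction)
    h = np.zeros(W_hh.shape[0])
    H = []
    for x_t in x:                            # x: sequence of input vectors
        h = act(W_xh @ x_t + W_hh @ h + b_h)
        H.append(h)
    return np.array(H)                       # shape (T, n_hidden)

def brnn_forward(x, fwd, bwd, W_fy, W_by, b_y):
    # fwd/bwd are (W_xh, W_hh, b_h) tuples for the two directions
    Hf = rnn_states(x, *fwd)                 # forward pass over t = 1..T
    Hb = rnn_states(x[::-1], *bwd)[::-1]     # backward pass over t = T..1
    return Hf @ W_fy.T + Hb @ W_by.T + b_y   # Eq. (5): outputs y_t
```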

LSTM-RNN

Conventional RNNs can access only a limited range of context because of the vanishing gradient problem. Long short-term memory (LSTM) uses purpose-built memory cells, as shown in Fig. 8, to store information, and is designed to overcome this limitation. In sequence-to-sequence mapping tasks, LSTM has been shown capable of bridging very long time lags between input and output sequences by enforcing constant error flow. For LSTM, the recurrent hidden layer function $\mathcal{H}$ is implemented as follows:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i), \qquad (6)$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f), \qquad (7)$$

$$a_t = \tau(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \qquad (8)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot a_t, \qquad (9)$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o), \qquad (10)$$

$$h_t = o_t \odot \theta(c_t), \qquad (11)$$

where $\sigma$ is the sigmoid function; $i$, $f$, $o$, $a$, and $c$ are the input gate, forget gate, output gate, cell input activation, and cell memory, respectively; and $\tau$ and $\theta$ are the cell input and output nonlinear activation functions, for which tanh is generally chosen. The multiplicative gates allow LSTM memory cells to store and access information over long periods of time, thereby avoiding the vanishing gradient problem.

Fig. 8 Long short-term memory (LSTM)

Combining BRNNs with LSTM gives rise to BLSTM, which can access long-range context in both directions. Motivated by the success of deep neural network architectures, deep BLSTM-RNNs (DBLSTM-RNNs), created by stacking multiple BLSTM hidden layers, are used to establish the audio-to-visual mapping for visual speech animation.
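Equations (6)-(11) can be transcribed directly into code. In the sketch below, the peephole weights (W_ci, W_cf, W_co) are taken to be diagonal and stored as vectors, a common choice in Graves-style LSTMs and an assumption here; σ is the logistic sigmoid and τ = θ = tanh, as stated above.

```python
# A direct numpy transcription of the LSTM memory-cell update in
# Eqs. (6)-(11), including the peephole connections.
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """p: dict of weights/biases named after the equations; peephole
    weights W_ci/W_cf/W_co are vectors (diagonal), hence elementwise *."""
    i = sigmoid(p['W_xi'] @ x_t + p['W_hi'] @ h_prev + p['W_ci'] * c_prev + p['b_i'])  # Eq. (6)
    f = sigmoid(p['W_xf'] @ x_t + p['W_hf'] @ h_prev + p['W_cf'] * c_prev + p['b_f'])  # Eq. (7)
    a = np.tanh(p['W_xc'] @ x_t + p['W_hc'] @ h_prev + p['b_c'])                       # Eq. (8)
    c = f * c_prev + i * a                                                             # Eq. (9)
    o = sigmoid(p['W_xo'] @ x_t + p['W_ho'] @ h_prev + p['W_co'] * c + p['b_o'])       # Eq. (10)
    h = o * np.tanh(c)                                                                 # Eq. (11)
    return h, c
```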

The Talking Head System

Figure 9 shows the diagram of an image-based talking head system using a DBLSTM as the mapping function (Fan et al. 2015). The diagram follows the basic structure of a typical visual speech animation system in Fig. 1. The aim of the system is speech animation with video-realistic effects. First, an audio/visual database of a subject talking to a camera with a frontal view of his/her face is recorded as training data. In the training stage, the audio is converted into a sequence of contextual phoneme labels L using forced alignment, and the corresponding lower-face image sequence is transformed into active appearance model (AAM) feature vectors V. A deep BLSTM neural network is then used to learn a regression model between the two parallel audio and visual sequences by minimizing the SSE of the prediction; the input layer receives the label sequence L, and the output layer predicts the visual feature sequence V. In the synthesis stage, for any input text with natural speech or speech synthesized by TTS, the label sequence L is extracted, and the visual AAM parameters V̂ are predicted using the well-trained deep BLSTM network. Finally, the predicted AAM visual parameter sequence V̂ is reconstructed into high-quality photo-realistic face images, rendering the full-face talking head with lip-synced animation.

Fig. 9 Diagram of an image-based talking head system using DBLSTM-RNN as the mapping (Fan et al. 2015)

Label Extraction

The input sequence L and output feature sequence V are two time-varying parallel sequences. The input to the desired talking head system can be arbitrary text along with natural audio recordings or speech synthesized by TTS. For natural recordings, the phoneme/state time alignment can be obtained by forced alignment with a trained speech recognition model; for TTS-synthesized speech, the phoneme/state sequence and its time offsets are a by-product of the synthesis process. For each speech utterance, the phoneme/state sequence and its time offsets are therefore converted into a label sequence, denoted $L = (i_1, \dots, i_t, \dots, i_T)$, where $T$ is the number of frames in the sequence. The frame-level label $i_t$ uses a one-hot representation, i.e., one vector per frame:

$$\Big[\underbrace{0, \dots, 0, \dots, 1}_{K};\ \underbrace{1, \dots, 0, \dots, 0}_{K};\ \underbrace{0, 0, 1, \dots, 0}_{K};\ \underbrace{0, 1, 0}_{3}\Big],$$

where $K$ denotes the number of phonemes. In Fan et al. (2015), triphones plus three elements identifying the state are used: the first three K-element sub-vectors denote the identities of the left, current, and right phonemes of the triphone, respectively, and the last three elements represent the phoneme state. Note that the contextual label can easily be extended to contain richer information, such as position in syllable, position in word, stress, and part of speech; but if the training data are limited, only phoneme- and state-level labels may be considered.
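The layout of this label vector is easy to pin down in code. Below is a hedged sketch with a hypothetical phone_to_idx mapping; the phoneme inventory and indices are illustrative assumptions.

```python
# Frame-level one-hot label: three K-dimensional phoneme sub-vectors
# (left, current, right phoneme of the triphone) plus a 3-dimensional
# state sub-vector, matching the layout described above.
import numpy as np

def frame_label(left, current, right, state, phone_to_idx):
    K = len(phone_to_idx)
    vec = np.zeros(3 * K + 3, dtype=np.float32)
    vec[phone_to_idx[left]] = 1.0             # left phoneme identity
    vec[K + phone_to_idx[current]] = 1.0      # current phoneme identity
    vec[2 * K + phone_to_idx[right]] = 1.0    # right phoneme identity
    vec[3 * K + state] = 1.0                  # phoneme state (0, 1, or 2)
    return vec

# Example: a hypothetical 40-phoneme inventory gives a 123-dimensional
# label vector per frame.
```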

Face Model and Visual Feature Extraction

In the system of Fan et al. (2015), the visual stream is a sequence of lower-face images, which are strongly correlated with the underlying speech. As raw face images are hard to model directly because of their high dimensionality, an active appearance model (AAM) (Cootes et al. 2001) is used as the face model for visual feature extraction. An AAM is a joint statistical model that compactly represents both the shape and texture variations and the correlation between them.


Since the speaker moves his/her head naturally during recording, head pose normalization is performed over all face images before AAM modeling. With the help of an effective 3D model-based head pose tracking algorithm, the head pose in each image frame is normalized to a fully frontal view and further aligned. The facial feature points and the lower-face texture used in Fan et al. (2015) are shown in Fig. 10. The shape of the jth lower face, $s_j$, is represented by the concatenation of the x and y coordinates of $N$ facial feature points:

$$s_j = \big(x_{j1}, x_{j2}, \dots, x_{jN}, y_{j1}, y_{j2}, \dots, y_{jN}\big), \qquad (12)$$

where $j = 1, 2, \dots, J$ and $J$ is the total number of face images. In this work, a set of 51 facial feature points is used, as shown in Fig. 10a. The mean shape is simply

$$s_0 = \sum_{j=1}^{J} s_j / J. \qquad (13)$$

Applying principal component analysis (PCA) to all $J$ shapes, $s_j$ is given approximately by

$$s_j = s_0 + \sum_{i=1}^{N_{shape}} a_{ji}\,\tilde{s}_i = s_0 + a_j P_s, \qquad (14)$$

Fig. 10 Facial feature points and the texture of a lower face used in Fan et al. (2015). (a) 51 facial feature points. (b) The texture of a lower face

where $P_s = \big[\tilde{s}_1, \tilde{s}_2, \dots, \tilde{s}_{N_{shape}}\big]^T$ denotes the eigenvectors corresponding to the $N_{shape}$ largest eigenvalues, and $a_j = \big(a_{j1}, a_{j2}, \dots, a_{jN_{shape}}\big)$ is the jth shape parameter vector.

Accordingly, the texture of the jth face image, $t_j$, is defined as a vector concatenating the R/G/B values of every pixel lying inside the mean shape:

$$t_j = \big(r_{j1}, \dots, r_{jU},\ g_{j1}, \dots, g_{jU},\ b_{j1}, \dots, b_{jU}\big), \qquad (15)$$

where $j = 1, 2, \dots, J$ and $U$ is the total number of pixels. As the dimensionality of the texture vector is too high for PCA to be used directly, EM-PCA (Roweis 1998) is applied instead to all $J$ textures. As a result, the jth texture $t_j$ is given approximately by

$$t_j = t_0 + \sum_{i=1}^{N_{texture}} b_{ji}\,\tilde{t}_i = t_0 + b_j P_t, \qquad (16)$$

where $t_0$ is the mean texture, $P_t$ contains the eigenvectors corresponding to the $N_{texture}$ largest eigenvalues, and $b_j$ is the jth texture parameter vector.

The above shape and texture models control the shape and texture only separately. To recover the correlation between shape and texture, $a_j$ and $b_j$ are combined in another round of PCA:

$$\big(a_j, b_j\big) = \sum_{i=1}^{N_{appearance}} v_{ji}\,\tilde{v}_i = v_j P_v, \qquad (17)$$

with $P_{vs}$ and $P_{vt}$ formed by extracting the first $N_{shape}$ and the last $N_{texture}$ values from each component of $P_v$. Simply combining the above equations gives

$$s_j = s_0 + v_j P_{vs} P_s = s_0 + v_j Q_s, \qquad (18)$$

$$t_j = t_0 + v_j P_{vt} P_t = t_0 + v_j Q_t. \qquad (19)$$

Now the shape and texture of the jth lower-face image can be constructed from a single vector $v_j$, the jth appearance parameter vector, which is used as the AAM visual feature. The lower-face sequence with $T$ frames can therefore be represented by the visual feature sequence $V = (v_1, \dots, v_t, \dots, v_T)$.
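The three PCA stages of Eqs. (12)-(19) can be condensed into a few lines. The sketch below uses scikit-learn's PCA throughout for brevity (the chapter uses EM-PCA for the high-dimensional texture stream; a randomized SVD solver is a rough stand-in), and the component counts are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_aam(shapes, textures, n_shape=20, n_texture=50, n_app=30):
    # shapes: (J, 2N) stacked x/y coordinates; textures: (J, 3U) RGB values
    shape_pca = PCA(n_components=n_shape).fit(shapes)          # Eqs. (13)-(14)
    tex_pca = PCA(n_components=n_texture,
                  svd_solver='randomized').fit(textures)       # Eqs. (15)-(16)
    a = shape_pca.transform(shapes)    # per-image shape parameters a_j
    b = tex_pca.transform(textures)    # per-image texture parameters b_j
    # A third PCA over the concatenated parameters recovers the
    # shape-texture correlation of Eq. (17); each row of v is an
    # appearance vector v_j, used as the AAM visual feature.
    app_pca = PCA(n_components=n_app).fit(np.hstack([a, b]))
    v = app_pca.transform(np.hstack([a, b]))
    return shape_pca, tex_pca, app_pca, v
```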

DBLSTM-RNN Model Training

In the training stage, multiple sequence pairs of L and V are available. As both sequences are represented as continuous numerical vectors, the network is treated as a regression model minimizing the SSE of predicting V from L. In the synthesis stage, given arbitrary text along with natural or synthesized speech, the input is first converted into a sequence of input features and then fed into the trained network. The output of the network is the predicted visual AAM feature sequence. After reconstructing the AAM feature vectors into RGB images, a photo-realistic image sequence of the lower face is generated. Finally, the lower face is stitched onto a background face, and the facial animation of the talking head is rendered.

Learning the deep BLSTM network can be regarded as optimizing a differentiable error function

$$E(w) = \sum_{k=1}^{M_{train}} E_k(w), \qquad (20)$$

where $M_{train}$ is the number of sequences in the training data and $w$ denotes the network internode weights. The training criterion is to minimize the SSE between the predicted visual features $\hat{V} = (\hat{v}_1, \hat{v}_2, \dots, \hat{v}_T)$ and the ground truth $V = (v_1, v_2, \dots, v_T)$. For a particular input sequence $k$, the error function takes the form

$$E_k(w) = \sum_{t=1}^{T_k} E_{kt} = \frac{1}{2} \sum_{t=1}^{T_k} \big\| \hat{v}_t^k - v_t^k \big\|^2, \qquad (21)$$

where $T_k$ is the total number of frames in the kth sequence. In every iteration, the weights are updated by

$$\Delta w(r) = m\,\Delta w(r-1) - \alpha \frac{\partial E(w(r))}{\partial w(r)}, \qquad (22)$$

where $0 \le \alpha \le 1$ is the learning rate, $0 \le m \le 1$ is the momentum parameter, and $w(r)$ is the vector of weights after the rth update. The convergence condition is that the validation error shows no obvious change after $R$ iterations.

The backpropagation through time (BPTT) algorithm is usually adopted to train the network. In the BLSTM hidden layer, BPTT is applied to both the forward and backward hidden nodes and back-propagates layer by layer. Take the error function derivatives with respect to the output of the network as an example. For $\hat{v}_t^k = \big(\hat{v}_{t1}^k, \dots, \hat{v}_{tj}^k, \dots, \hat{v}_{tN_{appearance}}^k\big)$ in the kth $\hat{V}$, because the activation function used in the output layer is an identity function, we have

$$\hat{v}_{tj}^k = \sum_h w_{oh}\, z_{ht}^k, \qquad (23)$$

where $o$ is the index of an output node, $z_{ht}^k$ is the activation of a node in the hidden layer connected to node $o$, and $w_{oh}$ is the weight associated with this connection. Applying the chain rule for partial derivatives, we obtain

$$\frac{\partial E_{kt}}{\partial w_{oh}} = \sum_{j=1}^{N_{appearance}} \frac{\partial E_{kt}}{\partial \hat{v}_{tj}^k} \frac{\partial \hat{v}_{tj}^k}{\partial w_{oh}}, \qquad (24)$$

and, according to Eqs. (21) and (23),

$$\frac{\partial E_{kt}}{\partial w_{oh}} = \sum_{j=1}^{N_{appearance}} \big(\hat{v}_{tj}^k - v_{tj}^k\big)\, z_{ht}^k, \qquad (25)$$

$$\frac{\partial E_k}{\partial w_{oh}} = \sum_{t=1}^{T_k} \frac{\partial E_{kt}}{\partial w_{oh}}. \qquad (26)$$
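Putting the model and the training criterion of Eqs. (20)-(22) together, the following PyTorch sketch builds a stacked bidirectional LSTM regressor and trains it with an SSE loss and momentum SGD. It loosely follows the FBB128 topology reported in the evaluation below (one feed-forward layer under two BLSTM layers, 128 nodes each), but the dimensions and hyperparameters are illustrative assumptions, not the settings of Fan et al. (2015).

```python
import torch
import torch.nn as nn

class DBLSTMRegressor(nn.Module):
    def __init__(self, label_dim, visual_dim, hidden=128):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(label_dim, hidden), nn.Tanh())
        self.blstm = nn.LSTM(hidden, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, visual_dim)  # identity output layer

    def forward(self, labels):            # labels: (B, T, label_dim)
        h, _ = self.blstm(self.ff(labels))
        return self.out(h)                # predicted AAM features V-hat

model = DBLSTMRegressor(label_dim=123, visual_dim=30)
# SGD with momentum implements the weight update of Eq. (22).
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
sse = lambda pred, target: 0.5 * ((pred - target) ** 2).sum()  # Eq. (21)

def train_step(labels, visual):
    opt.zero_grad()
    loss = sse(model(labels), visual)     # autograd performs BPTT
    loss.backward()
    opt.step()
    return loss.item()
```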

Performances

The performance of the DBLSTM-based talking head is evaluated on an A/V database of 593 English utterances spoken by a female in a neutral style (Fan et al. 2015). The DBLSTM approach is compared with the previous HMM-based approach (Wang and Soong 2015). The results for the FBB128 DBLSTM (two BLSTM layers sitting on top of one feed-forward layer, with 128 nodes per layer) and the HMM are shown in Table 1. The deep BLSTM approach clearly outperforms the HMM approach by a large margin on all four objective metrics.

Table 1 Performance comparison between deep BLSTM and HMM

          RMSE (shape)   RMSE (texture)   RMSE (appearance)   CORR
HMM       1.223          6.602            167.540             0.582
DBLSTM    1.122          6.286            156.502             0.647

A subjective evaluation is also carried out in Fan et al. (2015). Ten label sequences are randomly selected from the test set as input, and the deep BLSTM-based and HMM-based talking head videos are rendered for each. For each test sequence, the two talking head videos are played side by side in random order with the original speech, and a group of 20 subjects is asked to perform an A/B preference test according to naturalness. The DBLSTM-based talking head is significantly preferred to the HMM-based one (45.7% vs. 25.7%, with 28.6% neutral), as shown in Fig. 11. Most subjects prefer the BLSTM-based talking head because its lip movement is smoother than that of the HMM-based one. Video clips of the synthesized talking head can be found at Microsoft Research (2015).

Fig. 11 The percentage preference of the DBLSTM-based and HMM-based photo-real talking heads

Selected Applications

Avatars with lively visual speech animation are increasingly being used to communicate with users on a variety of electronic devices, such as computers, mobile phones, PDAs, kiosks, and game consoles. Avatars can be found across many domains, such as customer service and technical support, as well as entertainment. Some of the many uses of avatars include the following:

• Reading news and other information to users
• Guiding users through Web sites by providing instructions and advice
• Presenting personalized messages on social Web sites
• Catching users' attention in advertisements and announcements
• Acting as digital assistants and automated agents for self-service, contact centers, and help desks
• Representing character roles in games
• Training users to perform complex tasks
• Providing new branding opportunities for organizations

Here, we focus on one application that uses a talking head avatar for audio/visual computer-assisted language learning (CALL). Imagine a child learning from his favorite TV star, who appears to be personally teaching him English on his handheld device. Another youngster might show off her own avatar, which tells mystery stories in a foreign language to her classmates.


The speech processing technology behind the "talking head" is notable for its potential to enable such scenarios. These features have been tested successfully in a large-scale DDR project called Engkoo (Wang et al. 2012c) from Microsoft Research Asia. It is used by ten million English learners in China per month and won the Wall Street Journal 2010 Asian Innovation Readers' Choice Award (Scott et al. 2011). The talking head generates short, karaoke-style synthetic videos demonstrating oral English. The videos consist of a photo-realistic person speaking English sentences crawled from the Internet. The technology uses a computer-generated voice with native-speaker-like quality and synchronized subtitles at the bottom of the video; it emulates popular karaoke-style videos specifically designed for a Chinese audience in order to increase user engagement. Compared with using prerecorded human voice and video in English education tools, these videos not only create a realistic look and feel but also greatly reduce the cost of content creation by generating arbitrary content synthetically and automatically. The potential for personalization exists as well: for example, a voice can be chosen based on preferred gender, age, speaking rate, or pitch range and dynamics, and the selected type of voice can be used to adapt a pre-trained TTS so that the synthesized voice is customized.

Motivation

Language teachers have been avid users of technology for some time. The arrival of the multimedia computer in the early 1990s was a major breakthrough because it combined text, images, sound, and video in one device and permitted the integration of the four basic skills of listening, speaking, reading, and writing. Nowadays, as personal computers become more pervasive, smaller, and more portable, and with devices such as smartphones and tablet computers dominating the market, multimedia and multimodal language learning can be ubiquitous and more self-paced.

For foreign language users, learning correct pronunciation is considered by many to be one of the most arduous tasks if one does not have access to a personal tutor. The reason is that the most common method for learning pronunciation, using audio tapes, severely lacks completeness and engagement. Audio data alone may not offer users complete instruction on how to move their mouth and lips to sound out phonemes that may be nonexistent in their mother tongue, and audio as a tool of instruction is less motivating and personalized for learners. As supported by studies in cognitive informatics, humans process information more efficiently when audio and visual techniques are used in unison, and computer-assisted audiovisual language learning increases user engagement compared to audio alone.

Many existing bodies of work use visualized information and talking heads to help language learning. For example, Massaro (1998) used visual articulation to show the internal structure of the mouth, enabling learners to visualize the position and movement of the tongue. Badin et al. (2010) inferred learners' tongue position and shape to provide visual articulatory corrective feedback in second language learning. Additionally, a number of studies described in Eskenazi (2009) focus on overall pronunciation assessment and segmental/prosodic error detection to help learners improve their pronunciation through computer feedback. The project in Wang et al. (2012c) focuses on generating a photo-realistic, lip-synced talking head as a language assistant for multimodal, web-based, low-cost language learning. The authors feel that a lifelike assistant offers a more authoritative metaphor for engaging language learners, particularly younger demographics. The long-term goal is to create a technology that can ubiquitously help users anywhere, anytime, from detailed pronunciation training to conversational practice. Such a service is especially important as a tool for augmenting human teachers in areas of the world where native, high-quality instructors are scarce.

Karaoke Function

Karaoke, also known as KTV, is a major pastime among Chinese people, with numerous KTV clubs found in major cities in China. A karaoke-like feature has been added to Engkoo that enables English learners to practice their pronunciation online by mimicking a photo-realistic, lip-synchronized talking head within a search and discovery ecosystem. This "KTV function" is exposed as videos generated from a vast set of sample sentences mined from the web. Users can launch the videos with a single click on a sentence of their choosing. Similar to the karaoke format, the videos display the sentence on the screen while a model speaker says it aloud, teaching users how to enunciate the words, as shown in Fig. 12. Figure 13 shows the building blocks of the KTV system.

While the subtitles of karaoke are useful, the pacing they offer is especially valuable when learning a language. Concretely, the rhythm and prosody embedded in the KTV function give users the timing cues to utter a given sentence properly. Although pacing can be learned from listening to a native speaker, what this system offers uniquely is the ability to get such content at scale and on demand.

The KTV function offers a low-cost method for creating highly engaging, personalizable learning material using state-of-the-art talking head rendering technology. One of the key benefits is the generation of lifelike video as opposed to cartoon-based animations. This is important from a pedagogical perspective because the content appears closer in nature to a human teacher, which reduces the perceptive gap that students, particularly younger pupils, must bridge between the physical classroom and the virtual learning experience. The technology can drastically reduce the production costs of language learning videos in scenarios where the material requires a human native speaker: rather than repeatedly taping an actor, the technique synthesizes the audio and video content automatically. This has the potential to further bridge classroom and e-learning scenarios, where a teacher can generate his or her own talking head for students to take home and learn from.

Fig. 12 Screenshots of karaoke-like talking heads on Engkoo. The service is accessible at http://dict.bing.com.cn

Fig. 13 Using talking head synthesis technology for the KTV function on Engkoo

Technology Outlook

The current karaoke function, despite its popularity with web users, can be further enhanced toward the long-term goal: an indiscernibly lifelike, low-cost, web-based computer assistant that is helpful in many language learning scenarios, such as interactive pronunciation drills and conversational training.

To make the talking head more lifelike and natural, a new 3D photo-realistic, real-time talking head with a personalized appearance has been proposed (Wang and Soong 2012). It extends the prior 2D photo-realistic talking head to 3D. First, approximately 20 minutes of audiovisual 2D video of prompted sentences spoken by a human speaker is recorded. A 2D-to-3D reconstruction algorithm automatically wraps the 3D geometric mesh with 2D video frames to construct a training database, as shown in Fig. 14. In training, super feature vectors consisting of 3D geometry, texture, and speech are formed to train a statistical, multi-streamed HMM. The model is then used to synthesize both the geometry animation trajectories and the dynamic texture.

Fig. 14 A 3D photo-realistic talking head combining 2D image samples with a 3D face model

As far as the synthesized audio (speech) output is concerned, the research direction is to make it more personalized, adaptive, and flexible. For example, a new algorithm that teaches the talking head to speak authentic English sentences that sound like a Chinese ESL learner has been proposed and tested successfully. Synthesizing more natural and dynamic prosody patterns for ESL learners to mimic is also highly desirable as an enhanced feature of the talking head.

The 3D talking head animation can be controlled by the rendered geometric trajectory, while the facial expressions and articulator movements are rendered with the dynamic 2D image sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters. A talking head for a movie star or celebrity can be created from their video recordings. With the new 3D, photo-realistic talking head, the era of lifelike, web-based, interactive learning assistants is on the horizon.

The phonetic search can be further improved by collecting more data, both text and speech, to generate phonetic candidates that cover the generic and localized spelling/pronunciation errors committed by language learners at different levels.


Fig. 14 A 3D photo-realistic talking head by combining 2D image samples with a 3D face model

When such a database is available, a more powerful letter-to-sound (LTS) model can be trained discriminatively, such that the errors observed in the database can be predicted and recovered from gracefully. In future work, with regard to the interactivity of the computer assistant, it could hear (via speech recognition) and speak (via TTS synthesis), read and compose, correct and suggest, or even guess or read the learner's intention.

Summary

This chapter surveys the basic principles, state-of-the-art technologies, and featured applications in the visual speech animation (VSA) area. Data collection, the face/mouth model, feature extraction, and learning a mapping model are the central building blocks of a VSA system. The technologies used in the different blocks depend on the application needs and affect the desired appearance of the system. During the past decades, much effort in this area has been devoted to the audio/text-to-visual mapping problem, and approaches can be roughly categorized into rule based, concatenation, parametric, and hybrid. We showcase a state-of-the-art approach, based on deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs), for audio-to-visual mapping in a video-realistic talking head. We also present the Engkoo project from Microsoft as a practical application of visual speech animation in language learning. We believe that with the fast development of computer graphics, speech technology, machine learning, and human behavior studies, future visual speech animation systems will become more flexible, expressive, and conversational. Subsequently, applications can be found across many domains.

References

Anderson R, Stenger B, Wan V, Cipolla R (2013) Expressive visual text-to-speech using active appearance models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, p 3382
Badin P, Ben Youssef A, Bailly G et al (2010) Visual articulatory feedback for phonetic correction in second language learning. In: Proceedings of Second Language Learning Studies: Acquisition, Learning, Education and Technology, 2010
Ben Youssef A, Shimodaira H, Braude DA (2013) Articulatory features for speech-driven head motion synthesis. In: Proceedings of the International Speech Communication Association, IEEE, 2013
Breeuwer M, Plomp R (1985) Speechreading supplemented with formant frequency information from voiced speech. J Acoust Soc Am 77(1):314–317
Bregler C, Covell M, Slaney M (1997) Video rewrite: driving visual speech with audio. In: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ACM Press, p 353
Busso C, Deng Z, Grimm M, Neumann U et al (2007) Rigid head motion in expressive speech animation: analysis and synthesis. IEEE Trans Audio, Speech, Language Process 15(3):1075–1086
Cao Y, Tien WC, Faloutsos P et al (2005) Expressive speech-driven facial animation. In: ACM Transactions on Graphics, ACM, p 1283
Cohen MM, Massaro DW (1993) Modeling coarticulation in synthetic visual speech. In: Models and techniques in computer animation. Springer, Japan, p 139
Cootes TF, Edwards GJ, Taylor CJ (2001) Active appearance models. IEEE Trans Pattern Anal Mach Intell 23(6):681–685
Cosatto E, Graf HP (1998) Sample-based synthesis of photo-realistic talking heads. In: Proceedings of Computer Animation, IEEE, p 103
Cosatto E, Graf HP (2000) Photo-realistic talking-heads from image samples. IEEE Trans Multimed 2(3):152–163
Cosatto E, Ostermann J, Graf HP et al (2003) Lifelike talking faces for interactive services. Proc IEEE 91(9):1406–1429
Deng Z, Neumann U (2008) Data-driven 3D facial animation. Springer
Deng L, Yu D (2014) Deep learning: methods and applications. Foundations and Trends in Signal Processing, 2014
Deng Z, Lewis JP, Neumann U (2005) Automated eye motion using texture synthesis. IEEE Comput Graph Appl 25(2):24–30
Ding C, Xie L, Zhu P (2015) Head motion synthesis from speech using deep neural networks. Multimed Tools Appl 74(22):9871–9888
Du J, Wang Q, Gao T et al (2014) Robust speech recognition with speech enhanced deep neural networks. In: Proceedings of the International Speech Communication Association, IEEE, p 616
Dziemianko M, Hofer G, Shimodaira H (2009) HMM-based automatic eye-blink synthesis from speech. In: Proceedings of the International Speech Communication Association, IEEE, p 1799
Englebienne G, Cootes T, Rattray M (2007) A probabilistic model for generating realistic lip movements from speech. In: Advances in neural information processing systems, p 401
Eskenazi M (2009) An overview of spoken language technology for education. Speech Commun 51(10):832–844
Ezzat T, Poggio T (2000) Visual speech synthesis by morphing visemes. Int J Comput Vision 38(1):45–57


Ezzat T, Geiger G, Poggio T (2002) Trainable videorealistic speech animation. In: ACM SIGGRAPH 2006 Courses, ACM, p 388
Fagel S, Clemens C (2004) An articulation model for audiovisual speech synthesis: determination, adjustment, evaluation. Speech Commun 44(1):141–154
Fagel S, Bailly G, Theobald BJ (2010) Animating virtual speakers or singers from audio: lip-synching facial animation. EURASIP J Audio, Speech, Music Process 2009(1):1–2
Fan B, Wang L, Soong FK et al (2015) Photo-real talking head with deep bidirectional LSTM. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, p 4884
Fan B, Xie L, Yang S, Wang L et al (2016) A deep bidirectional LSTM approach for video-realistic talking head. Multimed Tools Appl 75:5287–5309
Fu S, Gutierrez-Osuna R, Esposito A et al (2005) Audio/visual mapping with cross-modal hidden Markov models. IEEE Trans Multimed 7(2):243–252
Hinton G, Deng L, Yu D, Dahl GE et al (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97
Hochreiter S (1998) The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzz 6(02):107–116
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huang D, Wu X, Wei J et al (2013) Visualization of Mandarin articulation by using a physiological articulatory model. In: Signal and Information Processing Association Annual Summit and Conference, IEEE, p 1
Hunt AJ, Black AW (1996) Unit selection in a concatenative speech synthesis system using a large speech database. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, p 373
Hura S, Leathem C, Shaked N (2010) Avatars meet the challenge. Speech Technol, 303217
Jia J, Zhang S, Meng F et al (2011) Emotional audio-visual speech synthesis based on PAD. IEEE Trans Audio, Speech, Language Process 19(3):570–582
Jia J, Wu Z, Zhang S et al (2014) Head and facial gestures synthesis using PAD model for an expressive talking avatar. Multimed Tools Appl 73(1):439–461
Kukich K (1992) Techniques for automatically correcting words in text. ACM Comput Surv 24(4):377–439
Le BH, Ma X, Deng Z (2012) Live speech driven head-and-eye motion generators. IEEE Trans Vis Comput Graph 18(11):1902–1914
Liu P, Soong FK (2005) Kullback-Leibler divergence between two hidden Markov models. Microsoft Research Asia, Technical Report
Massaro DW (1998) Perceiving talking faces: from speech perception to a behavioral principle. MIT Press, Cambridge
Massaro DW, Simpson JA (2014) Speech perception by ear and eye: a paradigm for psychological inquiry. Psychology Press
Masuko T, Kobayashi T, Tamura M et al (1998) Text-to-visual speech synthesis based on parameter generation from HMM. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, p 3745
McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748
Microsoft Research (2015) http://research.microsoft.com/en-us/projects/voice_driven_talking_head/
Mikolov T, Chen K, Corrado G et al (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Musti U, Zhou Z, Pietikäinen M (2014) Facial 3D shape estimation from images for visual speech animation. In: Proceedings of the Pattern Recognition, IEEE, p 40
Ostermann J, Weissenfeld A (2004) Talking faces – technologies and applications. In: Proceedings of the 17th International Conference on Pattern Recognition, IEEE, p 826
Pandzic IS, Forchheimer R (2002) MPEG-4 facial animation: the standard, implementation and applications. John Wiley and Sons, Chichester


Parke FI (1972) Computer generated animation of faces. In: Proceedings of the ACM annual conference – Volume 1, ACM, p 451
Peng B, Qian Y, Soong FK et al (2011) A new phonetic candidate generator for improving search query efficiency. In: Twelfth Annual Conference of the International Speech Communication Association
Pighin F, Hecker J, Lischinski D et al (2006) Synthesizing realistic facial expressions from photographs. In: ACM SIGGRAPH 2006 Courses, ACM, p 19
Qian Y, Yan ZJ, Wu YJ et al (2010) An HMM trajectory tiling (HTT) approach to high quality TTS. In: Proceedings of the International Speech Communication Association, IEEE, p 422
Raidt S, Bailly G, Elisei F (2007) Analyzing gaze during face-to-face interaction. In: International Workshop on Intelligent Virtual Agents. Springer, Berlin/Heidelberg, p 403
Microsoft Research (2015) http://research.microsoft.com/en-us/projects/blstmtalkinghead/
Richmond K, Hoole P, King S (2011) Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus. In: Proceedings of the International Speech Communication Association, IEEE, p 1505
Roweis S (1998) EM algorithms for PCA and SPCA. Adv Neural Inf Process Syst:626–632
Sako S, Tokuda K, Masuko T et al (2000) HMM-based text-to-audio-visual speech synthesis. In: Proceedings of the International Speech Communication Association, IEEE, p 25
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
Scott MR, Liu X, Zhou M (2011) Towards a specialized search engine for language learners [Point of View]. Proc IEEE 99(9):1462–1465
Seidlhofer B (2009) Common ground and different realities: World Englishes and English as a lingua franca. World Englishes 28(2):236–245
Sumby WH, Pollack I (1954) Erratum: visual contribution to speech intelligibility in noise [J Acoust Soc Am 26, 212 (1954)]. J Acoust Soc Am 26(4):583–583
Taylor P (2009) Text-to-speech synthesis. Cambridge University Press, Cambridge
Taylor SL, Mahler M, Theobald BJ et al (2012) Dynamic units of visual speech. In: Proceedings of the 11th ACM SIGGRAPH/Eurographics conference on Computer Animation, ACM, p 275
Theobald BJ, Fagel S, Bailly G et al (2008) LIPS2008: visual speech synthesis challenge. In: Proceedings of the International Speech Communication Association, IEEE, p 2310
Thies J, Zollhöfer M, Stamminger M et al (2016) Face2Face: real-time face capture and reenactment of RGB videos. In: Proceedings of Computer Vision and Pattern Recognition, IEEE, p 1
Tokuda K, Yoshimura T, Masuko T et al (2000) Speech parameter generation algorithms for HMM-based speech synthesis. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, p 1615
Tokuda K, Oura K, Hashimoto K et al (2007) The HMM-based speech synthesis system. Online: http://hts.ics.nitech.ac.jp
Wang D, King S (2011) Letter-to-sound pronunciation prediction using conditional random fields. IEEE Signal Process Lett 18(2):122–125
Wang L, Soong FK (2012) High quality lips animation with speech and captured facial action unit as A/V input. In: Signal and Information Processing Association Annual Summit and Conference, IEEE, p 1
Wang L, Soong FK (2015) HMM trajectory-guided sample selection for photo-realistic talking head. Multimed Tools Appl 74(22):9849–9869
Wang L, Han W, Qian X, Soong FK (2010a) Rendering a personalized photo-real talking head from short video footage. In: 7th International Symposium on Chinese Spoken Language Processing, IEEE, p 129
Wang L, Qian X, Han W, Soong FK (2010b) Synthesizing photo-real talking head via trajectory-guided sample selection. In: Proceedings of the International Speech Communication Association, IEEE, p 446


Wang L, Wu YJ, Zhuang X et al (2011) Synthesizing visual speech trajectory with minimum generation error. In: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, p 4580
Wang L, Chen H, Li S et al (2012a) Phoneme-level articulatory animation in pronunciation training. Speech Commun 54(7):845–856
Wang L, Han W, Soong FK (2012b) High quality lip-sync animation for 3D photo-realistic talking head. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, p 4529
Wang LJ, Qian Y, Scott M, Chen G, Soong FK (2012c) Computer-assisted audiovisual language learning. IEEE Computer, p 38
Weise T, Bouaziz S, Li H et al (2011) Realtime performance-based facial animation. In: ACM Transactions on Graphics, ACM, p 77
Wik P, Hjalmarsson A (2009) Embodied conversational agents in computer assisted language learning. Speech Commun 51(10):1024–1037
Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280
Xie L, Liu ZQ (2007a) A coupled HMM approach to video-realistic speech animation. Pattern Recogn 40(8):2325–2340
Xie L, Liu ZQ (2007b) Realistic mouth-synching for speech-driven talking face using articulatory modelling. IEEE Trans Multimed 9(3):500–510
Xie L, Jia J, Meng H et al (2015) Expressive talking avatar synthesis and animation. Multimed Tools Appl 74(22):9845–9848
Yan ZJ, Qian Y, Soong FK (2010) Rich-context unit selection (RUS) approach to high quality TTS. In: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, p 4798
Zen H, Senior A, Schuster M (2013) Statistical parametric speech synthesis using deep neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, p 7962
Zhang LJ, Rubdy R, Alsagoff L (2009) Englishes and literatures-in-English in a globalised world. In: Proceedings of the 13th International Conference on English in Southeast Asia, p 42
Zhu P, Xie L, Chen Y (2015) Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings. In: Sixteenth Annual Conference of the International Speech Communication Association

Blendshape Facial Animation Ken Anjyo

Contents State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Blendshape Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Blendshape Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Examples and Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Techniques for Efficient Animation Production . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Direct Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Use of PCA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Blendshape Creation, Retargeting, and Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Abstract

Blendshapes are a standard approach for making expressive facial animations in the digital production industry. The blendshape model is represented as a linear weighted sum of the target faces, which exemplify user-defined facial expressions or approximate facial muscle actions. Blendshapes are therefore quite popular because of their simplicity, expressiveness, and interpretability. For example, unlike generic mesh editing tools, blendshapes approximate a space of valid facial expressions. This article provides the basic concepts and technical development of the blendshape model. First, we briefly describe a general face rig framework and thereafter introduce the concept of blendshapes as an established face rigging approach. Next, we illustrate how to use this model in animation practice, while clarifying the mathematical framework for blendshapes. We also demonstrate a few technical applications developed in the blendshape framework.

K. Anjyo (*) OLM Digital, Setagaya, Tokyo, Japan e-mail: [email protected]

# Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_2-1

Keywords

Computer facial animation • Face rig • Blendshapes • Retarget • Deformer • Facial motion capture • Performance capture

State of the Art

Digital characters now appear not only in films and video games but also in many other forms of digital content. The facial animation of a digital character must convey its emotions, which plays a crucial role in visual storytelling. This makes both the character animation process and the face rigging (i.e., setup) process very intensive and laborious.

In this article, we define a face rig as the pair of a deformer and its user interface (manipulator). The deformer is a mathematical model that deforms a face model's geometry to produce animation. The user interface provides animators with a toolset for manipulating the face model based on the deformer. In a production workplace, however, several deformers are usually used at a time, so the user interface in practice is more complicated, yet more sophisticated, than the blendshape interface we describe in later sections.

A variety of face rig approaches have been developed. Physics-based models provide rigorous and natural approaches and have found applications not only in the digital production industry but also in medical science, including surgery simulation. Physics-based approaches for computer graphics applications approximate the mechanical properties of the face, such as skin layers, muscles, fatty tissues, and bones. Although physics-based methods can be powerful for making realistic facial animations, they require artists to have considerable knowledge of and experience with the underlying physics, which is not an easy task. Several commercial 3D CG packages, on the other hand, provide proprietary face rig approaches, such as "cluster deformers" (see Tickoo (2009)), which allow the artist to specify the motion space using a painting operation when making 3D faces at key frames.

Blendshapes offer a completely different face rig approach. A blendshape model generates face geometry as a linear combination of a number of face poses, each of which is called a blendshape target. These targets typically represent individual facial expressions or shapes that approximate facial muscle actions or FACS (Facial Action Coding System (Ekman and Friesen 1978)) motions, and they are predefined (designed) by the artist. The blendshape model is therefore parameterized by the weights of the targets, which gives the artist an intuitive and simple way to make animation. The interface elements used to control the weights are called sliders.


Fig. 1 Blendshape user interface example. Left: the slider box and a 3D face model under editing, where the slider box gives only a partial view of the blendshape sliders because, in general, the number of sliders is too large to see all of them at a time; instead, a desired slider can be reached by scrolling the slider box. The 3D face model shows an edited result of the slider operation for a right eye blink. Right: the face model before the slider operation

Figure 1 presents such a slider interface example and a simple editing result for a blendshape model.

The use of motion capture data has become a common approach to animating a digital character. As is well known, the original development of motion capture techniques was driven by the needs of the life science community, where the techniques are mainly used for the analysis of a subject's movement. In the digital production industry, facial motion capture data may be used as input for the synthesis of realistic animations. The original data are then converted to a digital face model and edited to obtain the desired facial animations. Face rig techniques are therefore indispensable in the converting (retargeting) and editing processes.

Blendshape Applications

As mentioned earlier, several face rig techniques are used together in practical situations. Even when more sophisticated approaches to facial modeling are used, blendshapes are often employed as a base layer over which physically based or functional (parameterized) deformations are layered.


Digital production studios and video game companies need to develop sophisticated systems that fully support artists in the efficient, high-quality production of visual effects and character animation. The role of blendshape techniques may therefore be a small portion of such a system, but it is still crucial. Here we briefly describe a few state-of-the-art applications that use blendshape techniques:

• Voodoo. This system has been developed at Rhythm & Hues Studios over many years and deals mainly with animation, rigging, matchmove, crowds, fur grooming, and computer vision (see Fxguide (2014)). The system provides several prodigious face rigging tools using blendshapes. For example, many great shots in the 2012 film Life of Pi were created with this system.

• Fez. This is the facial animation system developed at ILM (Bhat et al. 2013; Cantwell et al. 2016; CGW 2014), which involves a FACS implementation using blendshape techniques. It has contributed to recent films such as Warcraft and Teenage Mutant Ninja Turtles, both in 2016.

• Face Plus. This is a plug-in for Unity, a cross-platform game engine. The plug-in enables the construction of a facial capture and animation system using a web camera (see Mixamo (2013) for details). Based on a blendshape character model created by an artist, the system provides real-time facial animation of the character.

In the following sections, we describe the basic practice and mathematical background of the blendshape model.

Blendshape Practice

The term "blendshapes" was introduced in the computer graphics industry, and we follow its definition: blendshapes are linear facial models in which the individual basis vectors are not orthogonal but instead represent individual facial expressions. The individual basis vectors have been referred to as blendshape targets and morph targets or (more roughly) as shapes or blendshapes. The corresponding weights are often called sliders, since this is how they appear in the user interface (as shown in Fig. 1). Creating a blendshape facial animation thus requires specifying weights for each frame of the animation, which has traditionally been achieved with key frame animation or by motion capture.

In the above discussion, we use the basic mathematical term "vectors." This section starts by explaining what these vectors mean in making 3D facial models and animations. We then illustrate how to use blendshapes in practice.


Formulation

We represent the face model as a column vector f containing all the model vertex coordinates in some order that is arbitrary (such as xyzxyzxyz, or alternately xxxyyyzzz) but consistent across the individual blendshapes. For example, consider a face model composed of n = 100 blendshapes, each having p = 10,000 vertices, with each vertex having three components x, y, z. Similarly, we denote the blendshape targets as vectors b_k, so the blendshape model is represented as

f = \sum_{k=0}^{n} w_k b_k,    (1)

where f is the resulting face, in the form of an m × 1 = 30,000 × 1 vector (m = 3p); the individual blendshapes b_0, b_1, ..., b_n are 30,000 × 1 vectors; and w_k denotes the weight for b_k (1 ≤ k ≤ n). We take b_0 as the neutral face. Blendshapes can therefore be considered as simply adding vectors.

Equation (1) may be referred to as the global or "whole-face" blendshape approach. The carefully sculpted blendshape targets appearing in Eq. (1) serve as interpretable controls; the span of these targets strictly defines the valid range of expressions for the modeled face. These characteristics differentiate the blendshape approach from those that involve linear combinations of uninterpretable shapes (see a later section) or that algorithmically recombine the target shapes using a method other than that in Eq. (1). In particular, from an artist's point of view, the interpretability of the blendshape basis is a definitive feature of the approach.

In the whole-face approach, scaling all the weights by a multiplier causes the whole head to scale, whereas scaling of the head is more conveniently handled with a separate transformation. To eliminate undesired scaling, the weights in Eq. (1) may be constrained to sum to one. Additionally, the weights can be constrained to the interval [0,1] in practice.

In the local or "delta" blendshape formulation, one face model b_0 (typically the resting face expression) is designated as the neutral face shape, while the remaining targets b_k (1 ≤ k ≤ n) in Eq. (1) are replaced with the difference b_k − b_0 between the k-th face target and the neutral face:

f = b_0 + \sum_{k=1}^{n} w_k (b_k − b_0).    (2)

Or, in matrix notation, Eq. (2) can be expressed as

f = Bw + b_0,    (3)

where B is an m × n matrix having b_k − b_0 as its k-th column vector, and w = (w_1, w_2, ..., w_n)^T is the weight vector.
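To make Eqs. (1)–(3) concrete, here is a minimal NumPy sketch of evaluating the delta formulation of Eq. (3); the random placeholder geometry is our own, standing in for sculpted targets:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30_000, 100                       # m = 3p stacked coordinates, n targets
b0 = rng.standard_normal(m)              # neutral face (placeholder geometry)
targets = b0[:, None] + 0.01 * rng.standard_normal((m, n))  # targets b_k
B = targets - b0[:, None]                # k-th column of B is b_k - b_0
w = rng.random(n)                        # slider weights in [0, 1]

f = B @ w + b0                           # Eq. (3): the blended face
assert f.shape == (m,)
```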


Fig. 2 Target face examples. From left: neutral, smile, disaffected, and sad

In this formulation, the weights are conventionally limited to the range [0,1], although there are exceptions to this convention; for example, the Maya blendshape interface allows the [0,1] limits to be overridden by the artist if needed. If the difference between a particular blendshape b_k and the neutral shape is confined to a small region, such as the left eyebrow, then the resulting parameterization offers intuitive localized control. The delta blendshape formulation is used in popular packages such as Maya (see Tickoo (2009)), and our discussion will assume this variant if not otherwise specified. Many comments apply equally (or with straightforward conversion) to the whole-face variant.

Examples and Practice

Next, we show a simple example of the blendshape model, which has 50 target faces. The facial expressions in Fig. 1 were also made with this simple model. A few target shapes of the model are demonstrated in Fig. 2, where the leftmost image shows its neutral face. Using the 50 target shapes, the blendshape model provides a mixture of such targets.

As mentioned above, the blendshape model is conceptually simple and intuitive. Nevertheless, professional use of this model requires large and labor-intensive efforts from artists, some of which are listed below:

• Target shape construction
– To express a complete range of realistic expressions, digital modelers often have to create large libraries of blendshape targets. For example, the character of Gollum in The Lord of the Rings had 946 targets (Raitt 2004). Generating a reasonably detailed model can be as much as a year of work for a skilled modeler, involving many iterations of refinement.
– A skilled digital artist can deform a base mesh into the different shapes needed to cover the desired range of expressions. Alternatively, the blendshapes can be directly scanned from a real actor or a sculpted model. A common template model can be registered to each scan in order to obtain vertex-wise correspondences across the blendshape targets.

• Slider control (see Fig. 1)
– To skillfully and efficiently use the targets, animators need to memorize the function of 50 to 100 commonly used sliders, so locating a desired slider is not immediate.
– A substantial number of sliders are needed for high-quality facial animation; therefore, the complete set of sliders does not fit on the computer display.

• Animation editing
– Traditionally, blendshapes have been animated by key frame animation of the weights. Commercial packages provide spline curve interpolation of the weights and allow the tangents to be specified at key frames (a minimal interpolation sketch follows this list).
– Performance-driven facial animation is an alternative way to make animation. Since blendshapes are the common approach for realistic facial models, blendshapes and performance-driven animation are frequently used together (see section "Use of PCA Models," for instance). We may then need an additional process in which the motion captured from a real face is "retargeted" to a 3D face model.
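To illustrate the key-frame editing style mentioned above, the sketch below (our own simplification, not from any particular package) linearly interpolates slider values between key frames; production packages instead use spline interpolation with artist-specified tangents, which this example omits:

```python
import numpy as np

def weights_at(t, key_times, key_weights):
    """Interpolate an n-vector of slider weights at time t.

    key_times   : sorted (k,) array of key-frame times
    key_weights : (k, n) array; row i holds slider values at key_times[i]
    """
    key_weights = np.asarray(key_weights, dtype=float)
    return np.array([np.interp(t, key_times, key_weights[:, j])
                     for j in range(key_weights.shape[1])])

# Example: two sliders keyed at t = 0, 1, 2
w = weights_at(0.5, [0.0, 1.0, 2.0], [[0.0, 1.0], [1.0, 0.5], [0.0, 0.0]])
```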

Techniques for Efficient Animation Production

In previous sections, we have shown that blendshapes are a conceptually simple, common, yet laborious facial animation approach. A number of developments have therefore been made to greatly improve the efficiency of making blendshape facial animation. In this section, however, we restrict ourselves to describing only a few works of our own, while also mentioning some related techniques for blendshapes and facial animation. For more about the mathematical aspects of blendshape algorithms, we recommend the survey by Lewis et al. (2014).

Direct Manipulation

In general, interfaces should provide both direct manipulation and editing of underlying parameters. While direct manipulation usually provides more natural and efficient results, parameter editing can be more exact and reproducible, so artists may prefer it in some cases. While inverse kinematic approaches to posing human figures have been used for many years, analogous inverse or direct manipulation approaches for posing faces and setting key frames have emerged only recently. In these approaches, the artist directly moves points on the face surface model, and the software must solve for the underlying weights or parameters that best reproduce that motion, rather than the artist tuning the underlying parameters.

Here we consider the case where the number of sliders is considerably large (i.e., well over 100), as in professional use of the blendshape model.


Fig. 3 Example of direct manipulation interface for blendshapes

Introducing a direct manipulation approach would then be a legitimate requirement. To achieve this, we solve the inverse problem of finding the weights for given point movements and constraints. In Lewis and Anjyo (2010), this problem is regularized using the observation that facial pose changes should be proportional to slider position changes. The resulting approach is easy to implement and can cope with existing blendshape models. Figure 3 shows such a direct manipulation interface, where selecting a point on the face model surface creates a manipulator object termed a pin, and the pins can be dragged into desired positions. Following the pin and drag operations, the system solves for the slider values (the right panel in Fig. 3) that make the face best match the pinned positions. It should be noted that the direct manipulation developed in Lewis and Anjyo (2010) can interoperate with traditional parameter-based key frame editing. As demonstrated in Lewis and Anjyo (2010), both direct manipulation and parameter editing are indispensable in blendshape animation practice.

There are several extensions of the direct manipulation approach. For instance, a direct manipulation system suitable for use in animation production has been demonstrated in Seo et al. (2011), including the treatment of combination blendshapes and non-blendshape deformers. Another extension, in Anjyo et al. (2012), describes a direct manipulation system that allows more efficient edits using a simple prior learned from facial motion capture.
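The following is a minimal least-squares sketch of the inverse problem described above (our own simplification, not the exact formulation of Lewis and Anjyo (2010)): given pinned coordinates and their dragged target positions, it solves for weights while regularizing toward the current slider state, reflecting the idea that pose changes should be proportional to slider changes. The regularization strength lam is an assumed parameter:

```python
import numpy as np

def solve_pinned_weights(B, b0, pin_rows, pin_pos, w_prev, lam=0.1):
    """Solve min_w ||(Bw + b0)[pin_rows] - pin_pos||^2 + lam ||w - w_prev||^2.

    B        : (m, n) delta blendshape matrix
    b0       : (m,) neutral face
    pin_rows : indices of the coordinates constrained by pins
    pin_pos  : desired values of those coordinates after dragging
    w_prev   : (n,) current slider values (regularization anchor)
    """
    Bp = B[pin_rows, :]                      # rows of B relevant to the pins
    A = Bp.T @ Bp + lam * np.eye(B.shape[1])
    rhs = Bp.T @ (pin_pos - b0[pin_rows]) + lam * w_prev
    return np.linalg.solve(A, rhs)
```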

Use of PCA Models

In performance-driven facial animation, the motion of a human actor is used to drive the face model. Whereas face tracking is a key technology for performance-driven approaches, this article focuses on performance capture methods that drive a face rig. These methods mostly use a PCA basis or a blendshape basis.

We use principal component analysis (PCA) to obtain a PCA model for a given database of facial expression examples. As usual, each element of the database is represented as an m × 1 vector x. Let U be an m × r matrix consisting of the r eigenvectors corresponding to the largest eigenvalues of the data covariance matrix. The PCA model is then given as

f = Uc + e_0,    (4)

where the vector c holds the coefficients of those eigenvectors and e_0 denotes the mean vector of all elements x in the database. Since we usually have r ≪ m, the PCA model gives a good low-dimensional representation of the facial models x. This also leads to solutions of statistical estimation problems in a maximum a posteriori (MAP) framework. For example, in Lau et al. (2009), direct dragging and stroke-based expression editing are developed in this framework to find an appropriate c in Eq. (4).

The PCA approaches are useful if the face model is manipulated only with direct manipulation. Professional animation may also require slider operations, in which case the underlying basis should be blendshapes rather than a PCA representation. This is due to the lack of interpretability of the PCA basis (Lewis and Anjyo 2010). A blendshape representation (3) can be equated to a PCA model (4) that spans the same space:

Bw + b_0 = Uc + e_0.    (5)

We know from Eq. (5) that the weight vector w and the coefficient vector c can be interconverted:

w = (B^T B)^{-1} B^T (Uc + e_0 − b_0),    (6)

c = U^T (Bw + b_0 − e_0),    (7)

where we use the fact that U^T U is the r × r unit matrix in deriving the second equation, Eq. (7). We note that the matrices and vectors in Eqs. (6) and (7), such as (B^T B)^{-1} B^T U and (B^T B)^{-1} B^T (e_0 − b_0), can be precomputed. Converting from weights to coefficients or vice versa is thus a simple affine transform that can easily be done at interactive rates. This provides a useful direct manipulation method for a PCA model, if the model can also be represented with a blendshape model.
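Because Eqs. (6) and (7) are affine, the conversion matrices can be precomputed once. A small sketch follows (our own code; it assumes B has full column rank and that both bases span the same space, as in Eq. (5)):

```python
import numpy as np

def make_converters(B, b0, U, e0):
    """Return functions implementing Eqs. (6) and (7)."""
    BtB_inv = np.linalg.inv(B.T @ B)     # (B^T B)^{-1}, precomputed
    M = BtB_inv @ (B.T @ U)              # linear part of the c -> w map
    t = BtB_inv @ (B.T @ (e0 - b0))      # constant offset of the c -> w map

    def c_to_w(c):                       # Eq. (6)
        return M @ c + t

    def w_to_c(w):                       # Eq. (7)
        return U.T @ (B @ w + b0 - e0)

    return c_to_w, w_to_c
```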

Blendshape Creation, Retargeting, and Transfer

Creating a blendshape model for professional animation requires sculpting on the order of 100 blendshape targets and adding hundreds more shapes in several ways (see Lewis et al. (2014), for instance). Ideally, the use of dense motion capture of a sufficiently varied performance should help create such a large number of blendshape targets efficiently. To achieve this, several approaches have been proposed, including a PCA-based approach (Lewis et al. 2004) and a sparse matrix decomposition method (Neumann et al. 2013).

Expression cloning approaches (Noh and Neumann (2001) and Sumner and Popović (2004), for instance) have been developed for retargeting the motion from one facial model (the "source") to drive a face (the "target") with significantly different proportions. The expression cloning problem was posed in Noh and Neumann (2001), where the solution was given as a mapping found from corresponding pairs of points on the source and target faces using face-specific heuristics. The early expression cloning algorithms do not consider adapting the temporal dynamics of the motion to the target, which means that they work well only if the source and target are of similar proportions. The movement matching principle in Seol et al. (2012) provides an expression cloning algorithm that can cope with the temporal dynamics of face movement by solving a space-time Poisson equation for the target blendshape motion.

Related to expression cloning, we also mention model transfer briefly. This is the case where the source is a fully constructed blendshape model and the target consists of only a neutral face (or a few targets). Deformation transfer (Sumner and Popović 2004) then provides a method of constructing the target blendshape model, which is mathematically equivalent to solving a certain Poisson equation (Botsch et al. 2006). There has also been more recent progress on blendshape model transfer, including a method that treats the self-collision issue (Saito 2013) and a technique that allows the user to iteratively add more training poses for blendshape expression refinement (Li et al. 2010).

Conclusion

While the origin of blendshapes may lie outside academic forums, blendshape models have evolved over the years along with a variety of advanced techniques, including those described in this article. We expect that further scientific insights from visual perception, psychology, and biology will strengthen the theory and practice of blendshape facial models. In a digital production workplace, we should also promote seamless integration of blendshape models with other software tools to establish a more creative and efficient production environment.

Acknowledgments I would like to thank J. P. Lewis for mentoring me over the years in the field of computer facial animation research and practice. Many thanks go to Ayumi Kimura for her fruitful discussions and warm encouragement in preparing and writing this article. I also thank Gengdai Liu and Hideki Todo for their helpful comments and creation of the images in Figs. 1, 2, and 3.


References

Anjyo K, Todo H, Lewis JP (2012) A practical approach to direct manipulation blendshapes. J Graph Tools 16(3):160–176
Bhat K, Goldenthal R, Ye Y, Mallet R, Koperwas M (2013) High fidelity facial animation capture and retargeting with contours. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 7–14
Botsch M, Sumner R, Pauly M, Gross M (2006) Deformation transfer for detail-preserving surface editing. In: Proceedings of Vision, Modeling, and Visualization (VMV), 357–364
Cantwell B, Warner P, Koperwas M, Bhat K (2016) ILM facial performance capture. In: ACM SIGGRAPH 2016 Talks, 26:1–26:2
CGW web page (2014) http://www.cgw.com/Publications/CGW/2014/Volume-37-Issue-4-JulAug-2014-/Turtle-Talk.aspx
Ekman P, Friesen W (1978) Facial action coding system: manual. Consulting Psychologists Press, Palo Alto
Fxguide web page (2014) https://www.fxguide.com/featured/voodoo-magic/
Lau M, Chai J, Xu Y-Q, Shum H-Y (2009) Face poser: interactive modeling of 3D facial expressions using facial priors. ACM Trans Graph 29(1), 3:1–3:17
Lewis JP, Anjyo K (2010) Direct manipulation blendshapes. IEEE Comput Graph Appl 30(4):42–50
Lewis JP, Mo Z, Neumann U (2004) Ripple-free local bases by design. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 684–688
Lewis JP, Anjyo K, Rhee T, Zhang M, Pighin F, Deng Z (2014) Practice and theory of blendshape facial models. Eurographics 2014 (State of the Art Reports), 199–218
Li H, Weise T, Pauly M (2010) Example-based facial rigging. ACM Trans Graph 29(3), 32:1–32:6
Mixamo web page (2013) https://www.mixamo.com/faceplus
Neumann T, Varanasi K, Wenger S, Wacker M, Magnor M, Theobalt C (2013) Sparse localized deformation components. ACM Trans Graph 32(6), 179:1–179:10
Noh J, Neumann U (2001) Expression cloning. In: SIGGRAPH 2001, Computer Graphics Proceedings, ACM Press/ACM SIGGRAPH, 277–288
Raitt B (2004) The making of Gollum. Presentation at U. Southern California Institute for Creative Technologies' Frontiers of Facial Animation Workshop, August 2004
Saito J (2013) Smooth contact-aware facial blendshape transfer. In: Proceedings of the Digital Production Symposium 2013 (DigiPro 2013), ACM, 7–12
Seo J, Irving J, Lewis JP, Noh J (2011) Compression and direct manipulation of complex blendshape models. ACM Trans Graph 30(6), 164:1–164:10
Seol Y, Lewis JP, Seo J, Choi B, Anjyo K, Noh J (2012) Spacetime expression cloning for blendshapes. ACM Trans Graph 31(2), 14:1–14:12
Sumner RW, Popović J (2004) Deformation transfer for triangle meshes. ACM Trans Graph 23(3):399–405
Tickoo S (2009) Autodesk Maya 2010: a comprehensive guide. CADCIM Technologies, Schererville

Eye Animation Andrew T. Duchowski and Sophie Jörg

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eye Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Listing’s and Bonders’ Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling Physiologically Plausible Eye Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling Induced Torsion During Vergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fixations and Saccades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parametric Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling Microsaccadic Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling Saccadic Velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pupil Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Procedural Eye Movement Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Periocular Motions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Running the Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary: Listing the Sources of Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 3 4 4 5 6 8 9 12 13 13 14 14 15 16 16

Abstract

The synthesis of eye movements involves modeling saccades (the rapid shifts of gaze), smooth pursuits (object tracking motions), binocular rotations implicated in vergence, and the coupling of eye and head rotations. More detailed movements include dilation and constriction of the pupil (pupil unrest) as well as small fluctuations (microsaccades, tremor, and drift, which we collectively call microsaccadic jitter) made during fixations, when gaze is held nearly steady. In this chapter, we focus on synthesizing physiologically plausible eye rotations, microsaccadic jitter, and pupil unrest. We review concepts relevant to the animation of eye motions and provide a procedural model of gaze that incorporates rotations adhering to Donders' and Listing's laws and the saccadic main sequence, along with gaze jitter and pupil unrest. We model microsaccadic jitter and pupil unrest by 1/f^α or pink noise.

A.T. Duchowski (*) Clemson University, Clemson, SC, USA e-mail: [email protected]

S. Jörg School of Computing, Clemson University, Clemson, SC, USA e-mail: [email protected]

# Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_3-1

Keywords

Eye movements • Saccades • Fixations • Microsaccadic jitter

Introduction

Engaging virtual characters are highly relevant in many applications, from entertainment to virtual reality (e.g., training). Realistic eye motions are important for increasing the perceived believability of virtual actors (avatars) and physical humanoid robots. Looser and Wheatley [36] show that people are influenced by the eyes more than by other facial features when rating the animacy of virtual characters. In film, eye movements are important for conveying a character's emotions and thoughts, e.g., as in Neil Burger's feature film Limitless. In virtual environments, eye gaze is of vital importance for the correct identification of deictic reference – what is being looked at [42]. As characters become more realistic and humanlike, correct gaze animation becomes even more important. Garau et al. [20] found a strong subjective interaction effect between the realism of a character and its gaze; for a more realistic character, more elaborate gaze behavior is preferred, whereas for a less realistic character, random gaze patterns received better ratings.

The dynamics of eye motions, however, have received little attention since Lee et al.'s [35] seminal Eyes Alive model, which focused largely on saccadic eye movements, implementing what is known as the saccadic main sequence [5]. The rapid advancement of eye tracking technology has revitalized interest in recording eye movements for inclusion in computer graphics and interactive systems [17, 57, 65]. Ruhland et al. [53] survey the state of the art in eye and gaze animation, reviewing efforts aimed at modeling the appearance and movement of the human eye. Beyond the synthesis of saccades (the rapid shifts of gaze), their report also considers tracking motions known as smooth pursuits, binocular rotations implicated in vergence (used for depth perception), and the coupling of eye and head rotations (e.g., the vestibulo-ocular reflex (VOR)). Ruhland et al. [53] furthermore review high-level aspects of gaze behavior, including past efforts to model visual attention, expression of emotion, nonverbal interaction, conversation and listening behavior, verbal interaction, and speech-driven gaze.

In this chapter, we extend their review by focusing on several important aspects missing from their survey, namely, oculomotor rotations of the eyeball adhering to Donders' and Listing's laws [59], the detailed motions of the eye during fixations [38] that can be modeled with microsaccadic jitter, and rapid fluctuations of the pupil (pupil unrest) [54]. We present previous findings on these topics and derive a physiologically plausible procedural eye movement model in which microsaccadic jitter and pupil unrest are modeled by 1/f^α pink noise. Our chapter contribution is based on prior publications presented at Computer Graphics International (CGI) [14], Motion in Games (MIG) [15], and the Symposium on Eye Tracking Research & Applications (ETRA) [18].

Eye Rotation

Almost all normal primate eye movements used to reposition the fovea result from combinations of five basic types: saccadic, smooth pursuit, vergence, vestibular, and the small movements associated with fixations [51]. These smaller motions consist of drift, tremor, and microsaccades [52]. Other movements such as adaptation and accommodation refer to nonpositional aspects of eye movements (i.e., pupil dilation, lens focusing). In general, the eyes move within six degrees of freedom: three translations within the socket and three rotations, although physical displacement is required for translations to occur (e.g., a push of a finger). There are six muscles responsible for movement of the eyeball: the medial and lateral recti (sideways movements), the superior and inferior recti (up/down movements), and the superior and inferior obliques (twist) [12].

The neural system involved in generating eye movements is known as the oculomotor plant [51]. Eye movement control signals emanate from several functionally distinct brain regions. Areas in the occipital cortex are thought to be responsible for high-level visual functions such as recognition. The superior colliculus bears afferents emanating directly from the retina, particularly from peripheral regions conveyed through the magnocellular pathway. The semicircular canals react to head movements in three-dimensional space. All three areas (i.e., the occipital cortex, the superior colliculus, and the semicircular canals) convey efferents to the eye muscles through the mesencephalic and pontine reticular formations. Classification of observed eye movement signals relies in part on the known functional characteristics of these cortical regions [16].

Eye movement models typically do not consider the oculomotor plant for the purposes of animation; rather, signal characteristics are of greater importance. For example, Komogortsev et al. have developed a sophisticated model of the oculomotor plant, but for biometric identification purposes rather than for animation [30, 31]. Prior models of eye rotation have been developed from the perspective of capturing and modeling observed gaze behavior but do not necessarily take into account its synthetic reproduction, i.e., animation [24, 50]. Also often overlooked are the constraints on orbital rotations following Listing's and Donders' laws. In this chapter we discuss previous models based on quaternion rotation and show how they can be implemented in a straightforward manner to ensure physiologically plausible eye rotation.


Listing's and Donders' Laws

Listing's and Donders' laws state that eyeball rotations can effectively be modeled as compositions of rotations exclusively about the vertical and horizontal axes, with negligible torsion when head direction is fixed. Further implications arise during ocular vergence movement, as discussed below. Using recorded gaze movements from monkeys and humans, Tweed et al. [59] define the eyeball's primary position as one in which the gaze vector is orthogonal to Listing's plane, the plane of displacement, which essentially models the tilt of the head. Listing's law states that the eye assumes only those orientations that can be reached from the primary position by a single rotation about an axis lying in Listing's plane. For modeling purposes, Listing's law, a refinement of Donders' law, states that in the absence of head tilt and with static visual surroundings, we can effectively ignore the eyeball's torsional component when modeling saccades [48]. In practice, torsion fluctuations of up to about 5° have been observed [19].

Interestingly, Tweed and Vilis [60] show that the primary gaze direction varies between primates (e.g., preferred head tilt varies within humans as well as within monkeys, and between the two groups, with monkeys generally carrying their heads tilted slightly more back than humans, on average). We believe that this variability in preferred primary gaze direction is a factor in the believability of virtual actors that may not have been previously exploited.

Ma and Deng [37], for example, describe a model of gaze driven by head direction/rotation, but their gaze-head coupling model is designed in a fashion that seems contrary to physiology: head motion triggers eye motion. Instead, because the eyes are mechanically "cheaper" and faster to rotate, the brain usually triggers head motion when the eyes exceed a certain saccadic amplitude threshold (about 30°; see Murphy and Duchowski [41] for an introductory note on this topic). Nevertheless, Ma and Deng and later Peters and Qureshi [47] both provide useful models of gaze/head coupling with a good "linkage" between gaze and head vectors. Our model currently focuses only on gaze direction and assumes a stationary head, but it could eventually offer extended functionality, expressed in terms of quaternions, which are generally better suited than vectors and Euler angles for expressing rotations. In our model, the eyes are the primary indicators of attention, with head rotation following gaze rotation when a rotational threshold (e.g., 30°) is exceeded.

Modeling Physiologically Plausible Eye Rotation

A coordinated model of the movement of the head and eyes relies on a plausible description of the eyeball's rotation within its orbit. Eyeball rotation, at any given instant in time t, is described by the deviation of the eyeball from its primary position. This rotation can be described by the familiar Euler angles used to denote roll, pitch, and yaw. Mathematically, a concise and convenient representation of all three angles in one construct is afforded by a quaternion describing the eyeball's orientation, i.e., the direction of the vector g_e emanating from the eyeball center and terminating at the 3D position of gaze in space, p_t = (x_t, y_t, z_t).

Tweed et al. [59] specify the quaternion in question in relation to the bisector V_e of the current reference gaze direction g_e and primary gaze vector g_p. To precisely model Listing's law, assuming V_e is a normalized forward-pointing vector orthogonal to the displacement plane (which may be tilted back), the quaternion q expressing the rotation between g_p and g_e is q = [V_e · g_e, (V_e × g_e)] in a right-handed coordinate system. The quaternion q is a vector with four components, q = (q_0, q_τ, q_V, q_H), with q_0 the scalar part of q and q_τ, q_V, and q_H the torsional, vertical, and horizontal rotational components, respectively.

Listing's and Donders' laws are important for setting up traditional computer graphics rotations of the eyeball because together they not only specify a convenient simplification but also specify a physiologically observable constraint, namely q_τ = 0, negligible visible torsion. Moreover, head/eye movement coordination is made implicit, since Listing's plane can be used to model the orientation of the head with quaternion q fixed to lie in Listing's plane. This is accomplished by first setting the gaze direction vector g_r to point at reference point p_t, then specifying the rotation quaternion's plane by parameters f, f_V, and f_H, which are used to express q_τ as a function of q_H and q_V: q_τ = f + f_V q_V + f_H q_H. If f is not 0, then the reference position p_t does not satisfy Listing's law (see Fig. 1). Quaternion e = (\sqrt{1 - f^2}, f, 0, 0), however, does, and has the same direction as g_r. To force q to adhere to Listing's law, we set up quaternion e^{-1} = (\sqrt{1 - f^2}, -f, 0, 0) and right-multiply q. This fixes the reference position, adjusting the quaternion's torsional component. To find Listing's plane, the normal vector is computed by specifying the quaternion of the primary position relative to e as p = (V_1, V_0, -V_3, V_2), where V = (1, -f_V, -f_H)/|(1, -f_V, -f_H)|. Quaternion q is then left-multiplied by p^{-1}, giving p^{-1} q e^{-1} as the corrected rotation quaternion satisfying Listing's law such that q_τ = 0.

It is important to note that g_r describes the orientation of the head. That is, g_r can be used to model an avatar's preferred head tilt, and thus primary gaze direction, or the rotation undergone during vergence eye movements (see below).
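A minimal sketch of the half-angle construction above follows (our own NumPy code, not from [59]; the full Listing-plane correction p^{-1} q e^{-1} is illustrated only for the e^{-1} factor):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def gaze_quaternion(g_p, g_e):
    """q = [V_e . g_e, V_e x g_e]: rotation from primary gaze g_p to gaze g_e
    about their normalized bisector V_e (the half-angle construction)."""
    g_p = g_p / np.linalg.norm(g_p)
    g_e = g_e / np.linalg.norm(g_e)
    V_e = (g_p + g_e) / np.linalg.norm(g_p + g_e)   # normalized bisector
    return np.concatenate(([np.dot(V_e, g_e)], np.cross(V_e, g_e)))

def zero_reference_torsion(q, f):
    """Right-multiply by e^{-1} = (sqrt(1 - f^2), -f, 0, 0) so that the
    reference position satisfies Listing's law (q_tau driven to zero)."""
    e_inv = np.array([np.sqrt(1.0 - f * f), -f, 0.0, 0.0])
    return quat_mul(q, e_inv)
```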

Modeling Induced Torsion During Vergence

Torsional eye movements are associated with vergence or can occur in response to specific visual stimuli [62]. Tweed et al.'s [59] quaternion framework models Listing's law by negating cyclotorsion (q_τ = 0). The shape of the surface to which the eye position quaternions are constrained resembles a plane when only the eyes move and the head is stationary.

6

A.T. Duchowski and S. Jörg

a

looking ahead

b

c

no torsion

d

slight torsion

180° torsion

Fig. 1 Eye rotation from primary position (a): (b) no torsion modeling Listing’s and Donders’ laws; (c) slight torsion due to Listing’s plane tilt; (d) implausible torsion though mathematically possible if quaternions are not constrained

When gaze shifts involve both the eye and head (e.g., during VOR), the rotation surface twists and becomes non-planar [21]. This twist is similar to that produced by a Fick gimbal model of rotations in which the horizontal axis is nested within a fixed vertical axis.¹ Tweed et al.'s [59] quaternion framework does not explicitly consider vergence movements. Mok et al. [39] suggest that eye positions during vergence remain restricted to a planar surface (Listing's plane), but that surface is rotated relative to that observed for far targets. The rotation is such that during convergence both eyes undergo extorsion during downward gaze shifts and intorsion during upward gaze shifts, i.e., Listing's plane of the left eye is rotated to the left about the vertical axis (q_V > 0) while that of the right is rotated to the right (q_V < 0). For example, vergence of degree θ can be modeled by constructing the quaternion q̂ = (cos(θ/2), sin(θ/2) v), where v denotes the vertical axis (0, 1, 0), and then rotating g_r via q̂ g_r q̂⁻¹, as illustrated by Fig. 2. During convergence, the primary position is rotated temporally by approximately two-thirds the angle of vergence. Our model currently does not take into account refinements concerning VOR within the basic quaternion framework, but it produces correct vergence eye movements in the context of head-free (instead of head-fixed) rotations and allows targeting of the eye at a look point p_t. This point may in turn be animated to rotate the eye, e.g., via a procedural model (see below).
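
A rough sketch of this vergence rotation follows, under the same assumptions as the previous snippet; the two-thirds scaling in the example reflects the observation above, and the helper names are again ours.

```python
import numpy as np

def vergence_quaternion(theta_deg, left_eye=True):
    # q_hat = (cos(theta/2), sin(theta/2) v), v = (0, 1, 0); opposite signs
    # per eye model the temporal rotation of Listing's plane in convergence
    sign = 1.0 if left_eye else -1.0
    half = np.radians(sign * theta_deg) / 2.0
    return np.array([np.cos(half), 0.0, np.sin(half), 0.0])

def rotate_vector(q, v):
    # rotate 3-vector v by unit quaternion q, i.e., q v q^-1
    w, qv = q[0], q[1:]
    return v + 2.0 * np.cross(qv, np.cross(qv, v) + w * v)

# rotate the left eye's reference gaze by ~2/3 of a 30 deg vergence angle
g_r = np.array([0.0, 0.0, 1.0])
print(rotate_vector(vergence_quaternion(2.0 / 3.0 * 30.0), g_r))
```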

Fixations and Saccades

Fixations, the stationary eye movements required to maintain steady gaze when looking at a visual target, are never perfectly still but instead are composed of small involuntary eye movements: microsaccades, drift, and tremor [52].

¹ The non-commutativity of rotations leads to false torsions from equivalent rotations around eye-fixed and head-fixed axes; under normal circumstances, the eye assumes orientations given by Euler rotations satisfying Listing's law [50].


Fig. 2 Eye rotation during 30° vergence (q_V = 0.07): (a) no torsion (f = 0.00) when gaze is level; (b) slight intorsion (f = 0.06) looking up; (c) slight extorsion (f = −0.06) looking down. The plane drawn behind the (left) eye is a visualization of the rotation of gaze direction g_r during convergence

Microsaccades play a vital role in maintaining visibility during fixation but are perhaps the least understood of all eye movement types, despite their critical importance to normal vision [38]. If microsaccadic eye movements were perfectly counteracted, visual perception would rapidly fade due to adaptation [22, 27]. Microsaccades contribute to maintaining visibility during fixation by shifting the retinal image in a fashion that overcomes adaptation, generating neural responses to stationary stimuli in visual neurons. Martinez-Conde et al. [38] note that microsaccades go unnoticed, but this generally refers to oneself: it is not possible to detect one's own eye movements when looking in a mirror. Because the perceptual system is sensitive to, and amplifies, small fluctuations [61], noticing others' eye movements, even subtle ones, may be important, especially during conversation, turn-taking, etc. (see Vertegaal [63]). Even though microsaccades are the largest and fastest fixational eye movement, they are relatively small in amplitude, carrying the retinal image across a range of several dozen to several hundred photoreceptor widths [38]. Microsaccades and saccades share many physical and functional characteristics, suggesting that both eye movements have a common oculomotor origin, i.e., a common neural generator for both (current evidence points to a key role of the superior colliculus). While microsaccades play a crucial role in maintaining visibility during fixation, they may also betray our emotional state, as they reflect human brain activities during attention and cognitive tasks [28]. Laretzaki et al. [34] show that the fixational distribution is more widespread during periods of psychological threat than during periods of psychological safety. That is, the dispersion of microsaccades is larger under perception of threat than under perception of safety. Di Stasi et al. [13] also note that saccadic and microsaccadic velocity decrease with time-on-task, whereas
drift velocity increases, suggesting that ocular instability increases with mental fatigue. Thus, the dispersion of the microsaccadic position distribution can be made to increase with (simulated) increased fatigue.

Microsaccades and saccades follow the main sequence, which describes the relationship between their amplitude (θ) and duration (Δt) and can be modeled by the linear equation

Δt = 2.2θ + 21 (milliseconds)    (1)

for saccadic amplitudes up to about 20° [5, 29].² The main sequence gives us a plausible range of durations and corresponding eyeball rotations that are intuitively understood: the larger the eye rotation (θ), the more time is required to rotate the eye. All these insights can be used to develop parametric models for the synthesis of fixations and saccades.
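
Equation (1) reduces to a one-line helper; a minimal sketch with the slope and intercept quoted above (the function name is ours):

```python
def main_sequence_duration_ms(amplitude_deg):
    # Eq. (1): saccade duration (ms) from amplitude (deg), valid to ~20 deg
    return 2.2 * amplitude_deg + 21.0
```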

Parametric Models

Animating gaze shifts of virtual humans often involves the use of parametric models of human gaze behavior [4, 46]. While these types of models enable virtual humans to perform natural and accurate gaze shifts, signal characteristics, and in particular noise, are rarely addressed, if at all. Noise, however, although a nuisance from a signal processing perspective, is a key component of natural eye movements. To generate a stream of synthetic gaze points resembling captured data p_t = (x_t, y_t) (the z-coordinate can be dropped if the points are projected onto a 2D viewing plane), a reasonable strategy is to guide synthetic gaze to a sequence of known points, e.g., a grid of points used to calibrate the eye tracker to human viewers, a model of reading behavior where gaze is directed toward as yet unread or previously read words (regressions or revisits) and to lines above and below the current line of text [10], or a set of points selected by an artist. Given such a sequence (e.g., see Fig. 3), several characteristics need to be added, namely:

1. A model of the spatiotemporal fixation perturbation (microsaccadic jitter) [15]
2. A model of saccadic velocity (i.e., position and duration)
3. Control of the simulation time step and sampling rates (see section “Running the Simulation”)

We suggest modeling the spatiotemporal perturbation of gaze points at a fixation, which arises from microsaccades, drift, and tremor, with 1/f^α or pink noise, which we call microsaccadic jitter.

² In their development of Eyes Alive, Lee et al. [35] (see also Gu et al. [23]) expressed the main sequence as Δt = dθ + D₀ (milliseconds), with d ∈ [2, 2.7] ms/deg and D₀ ∈ [20, 30] ms.


Modeling Microsaccadic Jitter

A key aspect for the realistic synthesis of eye motion is the inclusion of microsaccadic gaze jitter. While the recorded eye movement signal is well understood from the point of view of analysis, surprisingly little work exists on its detailed synthesis. Most analytical approaches are concerned with gaze data filtering, e.g., signal smoothing and/or processing for the detection of specific events such as saccades, fixations, or, more recently, further distinction between ambient and focal fixations [32]. During analysis of recorded eye movements, gaze data is commonly filtered, especially when detecting fixations (e.g., see Duchowski et al. [18], who advocate the use of the Savitzky-Golay filter for signal analysis and detection of fixations). The signal processing approach (filtering) still dominates even in very recent approaches to synthesis, e.g., Yeo et al.'s [65] Eyecatch simulation, where the simulation used the Kalman filter to produce gaze but focused primarily on saccades and smooth pursuits (see below). Microsaccades, tremor, and drift were not modeled. As noted by Yeo et al., simulated gaze behavior looked qualitatively similar to gaze data captured by an eye tracker, but comparison of synthesized trajectory plots showed an absence of the gaze jitter that was evident in the raw data.

The distribution of microsaccade amplitudes tends to a 1° asymptote, making it a convenient upper amplitude threshold, although the microsaccade amplitude distribution tends to peak at about 12 arcmin [38]. The amplitude distribution can be modeled by the Poisson probability mass function P(x, λ) = λ^x e^(−λ)/x! with λ = 6, shifted by x − 8.5 and scaled by 5.5, and approximated by the normal distribution 5.5 N(x − 8.0; μ = λ, σ = √λ). The resultant normal distribution resembles the microsaccade distribution reported by Martinez-Conde et al. [38] and provides a starting point for modeling microsaccadic jitter, suggesting that perturbation about the point of fixation can be modeled by the normal distribution N(μ = 0, σ = 12/60) (degrees visual angle, i.e., 12 arcmin) for each of the x- and y-coordinate offsets to the fixation modeled during simulation (setting σ = 0 yields no jitter during fixation and can be used to simulate keyframed saccades).

Modeling microsaccadic jitter by the normal distribution yields white noise perturbation. White noise perturbation is a logical starting point for modeling microsaccadic jitter, but it is uncorrelated and therefore not necessarily a good choice. Recorded neural spikes are superimposed with noise that exhibits non-Gaussian characteristics and can be approximated as 1/f^α noise [64]. Pulse trains of nerve cells belonging to various brain structures have been observed and characterized as 1/f noise [61]. The 1/f regime accomplishes a tradeoff: the perceptual system is sensitive to and amplifies small fluctuations; simultaneously, the system preserves a memory of past stimuli in the long-time correlation tails. The memory inherent in the 1/f system can be used to achieve a priming effect: the succession of two stimuli separated by 50–100 ms at the same location results in a stronger response to the second stimulus.


To model microsaccades by 1/f^α (pink) noise, the white noise perturbations modeled by the normal distribution N(0, σ = 12/60) (degrees visual angle) are digitally filtered by a digital pink noise filter with system function

H_n(z) = (1/G_n(α)) ∏_{k=1}^{n} (z − q_k)/(z + p_k),   G_n(α) = ∏_{k=1}^{n} (a + a_k)/(a_k a + 1)    (2)

where G_n(α) is the nth-order approximation to an ideal analog pink noise filter with system function G(s) = 1/√s, a_k = tan²(kθ), and

q_k = (1 − a a_k)/(1 + a a_k),   p_k = (a − a_k)/(a + a_k)

with f₀ the unity gain frequency, a = tan(π f₀ T), and T the sampling period for filter order n ∈ ℤ. With α = 1.0, the filter produces pink noise given white noise as input [25]. For other values of α, a very good approximation for θ is θ = π/(2n + 2 − 2α). We chose a 4th-order filter for reshaping the microsaccadic jitter modeled by Gaussian noise, N(0, σ = 12/60) (degrees visual angle).

More formally, we define the pink noise filter as a function of two parameters, P(α, f₀), where 1/f^α describes the pink noise power spectral distribution and f₀ the filter's unity gain frequency (or more simply its gain). Setting α = 1 produces 1/f noise. Setting α = 0 produces white, uncorrelated noise, with a flat power spectral distribution, likely a poor choice for modeling biological motion such as microsaccades. We found that α = 0.6 and f₀ = 0.85 gave fairly natural microsaccadic jitter [15].

In practice, a look point drives the rotation of the eyeball. We can therefore model microsaccades as separate x- and y-directional offsets to the main view vector. This requires two pink noise filters, one for each of the two dimensions. Setting the simulation up this way allows independent control of horizontal and vertical microsaccades so that, for example, by controlling α, horizontal microsaccades can be made noisier (more noise devoted to the high-frequency portion of the spectrum) than vertical microsaccades.

The above model of microsaccadic jitter does not consider where fixations are made, i.e., it can be used to add perturbations to randomly distributed fixations. We can guide gaze to a sequence of fixation points, specified as a series of 2D look point coordinates. Microsaccadic jitter is then used to perturb the look point about each fixation point. More formally, we simulate a sequence of look points via the following fixation simulation, developed by Duchowski et al. [18]:

p_{t+h} = p_t + P(α, f₀)    (3)

where p_t is the look point at simulation time t and h is the simulation time step.
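
The jitter synthesis can be sketched as below. The coefficient formulas follow our reading of the reconstructed Eq. (2), so treat them as an interpretation rather than a reference implementation; the filter parameters match the values quoted in the text, and the helper names are ours.

```python
import numpy as np
from scipy.signal import lfilter

def pink_filter_coeffs(alpha=0.6, f0=0.85, fs=1000.0, n=4):
    # order-n 1/f^alpha filter after Hollos and Hollos [25, 26], per our
    # reading of Eq. (2); alpha = 1 gives pink noise, f0 is the unity
    # gain frequency (Hz), fs the sampling rate (Hz)
    T = 1.0 / fs
    a = np.tan(np.pi * f0 * T)               # frequency-warp parameter
    theta = np.pi / (2 * n + 2 - 2 * alpha)
    k = np.arange(1, n + 1)
    ak = np.tan(k * theta) ** 2
    qk = (1 - a * ak) / (1 + a * ak)         # zeros of H_n(z)
    pk = (a - ak) / (a + ak)                 # poles enter as (z + p_k)
    b = np.poly(qk)                          # numerator:   prod (z - q_k)
    a_den = np.poly(-pk)                     # denominator: prod (z + p_k)
    gain = np.prod((a + ak) / (ak * a + 1))  # G_n from Eq. (2)
    return b / gain, a_den

def microsaccadic_jitter(num_samples, sigma=12.0/60.0, alpha=0.6,
                         f0=0.85, fs=1000.0):
    # filter Gaussian white noise N(0, sigma) into 1/f^alpha jitter (deg)
    b, a = pink_filter_coeffs(alpha, f0, fs)
    return lfilter(b, a, np.random.normal(0.0, sigma, num_samples))

# independent horizontal and vertical jitter streams, as described above
jitter_x = microsaccadic_jitter(1000)
jitter_y = microsaccadic_jitter(1000)
```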

Fig. 3 Generation of a sequence of synthetic gaze points based on raw gaze data captured by an eye tracker in (a): (b) microsaccadic jitter at fixation points identified from raw gaze data; (c) addition of simulated eye tracker noise, which obscures microsaccadic jitter but produces gaze distributed about calibration points resembling raw gaze data


Fig. 4 Parametric saccade position model derived from an idealized model of the saccadic force-time function assumed by Abrams et al.'s [1] symmetric-impulse variability model: scaled position 60H(t), velocity 32Ḣ(t), and acceleration 10Ḧ(t)


Modeling Saccadic Velocity

To effect movement of the look point p_t between fixation points, a model of saccades is required, specifying both the movement and the duration of the gaze point. We start with an approximation to the force-time function assumed by a symmetric-impulse variability model [1]. This function, qualitatively similar to symmetric limb movement trajectories, describes an acceleration profile that rises to a maximum, returns to zero about halfway through the movement, and is then followed by an almost mirror-image deceleration phase.

To model a symmetric acceleration function, we can choose a combination of Hermite blending functions h₁₁(t) and h₁₀(t), so that Ḧ(t) = h₁₀(t) + h₁₁(t), where h₁₀(t) = t³ − 2t² + t, h₁₁(t) = t³ − t², t ∈ [0, 1], and Ḧ(t) is the acceleration of the gaze point over the normalized time interval t ∈ [0, 1]. Integrating the acceleration produces the velocity Ḣ(t) = (1/2)t⁴ − t³ + (1/2)t², which when integrated once more produces the position H(t) = (1/10)t⁵ − (1/4)t⁴ + (1/6)t³ on the normalized interval t ∈ [0, 1] (see Fig. 4). Given an equation for position over a normalized time window (t ∈ [0, 1]), we can now stretch this time window at will to any given length t ∈ [0, Δt]. Because the distance between gaze target points is known a priori, we can use these distances (pixel distances converted to amplitudes in degrees visual angle) as input to the main sequence (1) to obtain the saccade duration.
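
The three profiles translate directly into code; a minimal sketch (note that 60 H(1) = 1, matching the scaling in Fig. 4):

```python
def hermite_accel(t):
    # H''(t) = h10(t) + h11(t) = 2t^3 - 3t^2 + t on t in [0, 1]
    return 2.0*t**3 - 3.0*t**2 + t

def hermite_vel(t):
    # H'(t) = t^4/2 - t^3 + t^2/2 (integral of the acceleration)
    return 0.5*t**4 - t**3 + 0.5*t**2

def hermite_pos(t):
    # H(t) = t^5/10 - t^4/4 + t^3/6; 60*H(1) = 1, i.e., unit travel
    return t**5/10.0 - t**4/4.0 + t**3/6.0
```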


Assuming data collected from the eye tracker does not deviate greatly from the main sequence found in the literature [5, 29], we set the expected saccade duration to that given by (1), but augmented with a 10° targeting error. We also add a slight temporal perturbation to the predicted saccade duration, based on empirical observations. Saccade duration is thus modeled as

Δt = 2.2 N(θ, σ = 10°) + 21 + N(0, 0.01) (milliseconds)    (4)
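
A sketch of Eq. (4) as a sampler (the function name is ours):

```python
import numpy as np

def saccade_duration_ms(amplitude_deg, rng=None):
    # Eq. (4): main sequence applied to an amplitude sampled with a
    # 10 deg targeting error, plus a slight temporal perturbation
    rng = rng or np.random.default_rng()
    theta = rng.normal(amplitude_deg, 10.0)   # N(theta, sigma = 10 deg)
    return 2.2 * theta + 21.0 + rng.normal(0.0, 0.01)
```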

Pupil Dilation

Pupil dilation is controlled by top-down and bottom-up processes. There is evidence that it responds to cognitive load [2, 6], ambient light changes [7], and visual detection [49]. However, for the purposes of animation, pupil unrest, the slight oscillation between pupil dilation and constriction, is perhaps more interesting to model. Stark et al. [54] describe pupil diameter fluctuations as noise in a biological system, with the major component of the pupil unrest being random noise in the 0.05–0.3 Hz range, with transfer function G(s) = 0.16 exp(−0.18s)/(1 + 0.1s)³ and gain equal to 0.16. This transfer function can be modeled by a third-order Butterworth filter with system function G₃(s) = 1/(s³ + 2s² + 2s + 1), with the cutoff frequency set to 1.5915 (see Hollos and Hollos [26]). Such a filter can thus be used to smooth Gaussian noise (e.g., N(0, 0.5)) but will result in uncorrelated noise.

In recent work on eye capture, Bérard et al. [9] model pupil constriction/dilation, but only via linear interpolation of keyframes in response to light conditions. They did not, however, procedurally animate the pupil as a function of pupil unrest. Pamplona et al. [45] modeled pupil unrest (hippus), but via small random variations to light intensity, likely white noise, although they did not specify this directly. We model pupil unrest directly, via pink noise perturbation. We can model pupil diameter oscillation with pink noise by once again filtering white noise with the same digital pink noise filter as for the microsaccadic perturbations. For pupil oscillation, we chose a 4th-order filter for reshaping Gaussian noise, N(0, σ = 0.5). We found that the pink noise parameters α = 1.6 and f₀ = 0.35 produced a fairly natural simulation of pupil unrest [15].
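
Pupil unrest can then reuse the same pink-noise machinery; a minimal sketch assuming the pink_filter_coeffs() helper from the jitter sketch above:

```python
import numpy as np
from scipy.signal import lfilter

def pupil_unrest(num_samples, fs=60.0):
    # relative pupil-diameter oscillation: Gaussian noise N(0, 0.5)
    # reshaped by the 4th-order pink filter with alpha = 1.6, f0 = 0.35 [15]
    b, a = pink_filter_coeffs(alpha=1.6, f0=0.35, fs=fs, n=4)
    return lfilter(b, a, np.random.normal(0.0, 0.5, num_samples))
```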

Procedural Eye Movement Model

In the previous sections, we reviewed important concepts relevant to the animation of eye movements and presented the components of a procedural model for eye movement synthesis, consisting of rotations that adhere to Donders' and Listing's laws, and a model of gaze that incorporates the saccadic main sequence, along with gaze jitter and pupil unrest.

We model both microsaccades and pupil unrest by 1/f^α or pink noise, where 0 < α < 2, with exponent α usually close to 1. The use of 1/f^α pink noise to model
microsaccadic jitter and pupil oscillation is a key aspect of our simulation. Various colors of noise have appeared in the computer graphics literature, e.g., blue noise for anti-aliasing, green noise for halftoning, and white noise for common random number generation [66]. Pink noise is regarded as suitable for describing physical and biological distributions, e.g., plants [11, 43] and galaxies [33], as well as the behavior of biosystems in general [56]. Aks et al. [3] suggest that memory across eye movements may serve to facilitate the selection of information from the visual environment, leading to a complex and self-organizing (saccadic) search pattern produced by the oculomotor system reflecting 1/f pink noise. To complete our model, we add periocular motions and run the simulation.

Periocular Motions

The motions of the upper and lower eyelids comprise saccadic lid and smooth pursuit movements, which are closely related to the motion of the corresponding eyeball, as well as blinks. We create the saccadic lid and smooth pursuit movements by rigging the eyelids to the eyeball. To model blinks, we approximate the eyelid closure function proposed by Trutoiu et al. [58]. We use a piecewise function to model the eyelid blink in two temporal components, a faster closure followed by a slower opening:

C(t) = a((t − μ)/μ)²,                 t ≤ μ
     = 1 − b e^(−c log(t − μ + 1)),   otherwise    (5)

with C = 1 indicating the lid fully open and C = 0 the lid fully closed, where t ∈ [0, 100] represents normalized percent blink duration (scalable to an arbitrary duration), μ = 37 the point when the lid should reach full (or nearly full) closure, a = 0.98 the percent lid closure at the start of the blink, and b = 1.18 and c = μ/100 parameters used to shape the asymptotic lid opening function. Trutoiu et al. [58] recorded blink frequencies from their actors of 6.6, 8.2, and 27.0 blinks per minute, or 14 blinks per minute on average. These rates appear to be within the normal limits reported by Bentivoglio et al. [8], namely, 17 blinks per minute, ranging from 4.5 while reading to 26 during conversation. Simulating conversation, our procedural model uses 25 blinks per minute as the average, with a mean duration of 120 ms. Unlike Steptoe et al. [55], we do not use kinematics to model blinks; rather, we use a simplified stochastic model of blink duration.
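
Because the extraction left Eq. (5) partly ambiguous, the sketch below encodes the self-consistent reading given above (normalized parabolic closure, asymptotic reopening toward 1); the exact branch forms are our assumption, not the authors' published code.

```python
import numpy as np

def eyelid_closure(t, mu=37.0, a=0.98, b=1.18):
    # one reading of Eq. (5): C = 1 lid open, C = 0 closed, with t in
    # [0, 100] percent of blink duration; fast closure until t = mu,
    # then slow asymptotic reopening shaped by b and c = mu/100
    c = mu / 100.0
    if t <= mu:
        return a * ((t - mu) / mu) ** 2   # closure: C(0) ~ a, C(mu) = 0
    # reopening, clamped at 0 to bridge the small discontinuity at t = mu
    return max(0.0, 1.0 - b * np.exp(-c * np.log(t - mu + 1.0)))
```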

Running the Simulation

When running the simulation, it is important to keep the simulation time step (h) small, e.g., h = 0.0001. When about to execute a saccade, set the saccade clock t = 0, and then, while t < Δt, perform the following simulation steps:


1. t̂ = t/Δt (scale the interpolant to the time window)
2. p_t = C_{i−1} + H(t̂) C_i (advance position)
3. t = t + h (advance time by the time step h)

where C_i denotes the ith 2D look point sequence coordinates and p_t is the saccade position, both in vector form. Setting the time step h to an arbitrarily small value allows dissociation of the simulation clock from the sampling rate. We can thus sample the synthetic eye tracking data at arbitrary sampling periods, e.g., d = 1, d = 16, or d = 33 ms for sampling rates of 1000 Hz, 60 Hz, or 30 Hz, respectively. Unfortunately, eye trackers' sampling rates are not precise, or rather, eye trackers' sampling periods are generally non-uniform, most likely due to competing processes on the computer used to run the eye tracking software and/or due to network latencies. The simulation sampling period can be modeled by adding in a slight random temporal perturbation, e.g., N(0, σ = 0.5) milliseconds.
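
Putting the pieces together, a sketch of the saccade loop follows, reusing the hypothetical hermite_pos() and saccade_duration_ms() helpers from the earlier sketches; note that we interpolate the displacement C_i − C_{i−1} and scale by 60 so the position reaches C_i at t̂ = 1, a slight normalization of step 2.

```python
import numpy as np

def run_saccade(C_prev, C_next, amplitude_deg, h=0.0001):
    # step one saccade between 2D look points using the three steps above
    dt = saccade_duration_ms(amplitude_deg) / 1000.0   # seconds
    t, samples = 0.0, []
    while t < dt:
        t_hat = t / dt                                 # 1: scale interpolant
        p = C_prev + 60.0 * hermite_pos(t_hat) * (C_next - C_prev)  # 2: advance
        samples.append(p)
        t += h                                         # 3: advance time
    return np.array(samples)
```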

Summary: Listing the Sources of Variation

To recount, the stochastic model of eye movements is based on the infusion of probabilistic noise at various points in the simulation:

• Fixation durations, modeled in this instance by N(1.081, σ = 2.9016) (seconds), the average and standard deviation from Duchowski et al. [18],
• Microsaccadic fixation jitter, modeled by pink noise P(α = 0.6, f₀ = 0.85) (degrees visual angle),
• Saccade durations, modeled by (4), and
• Sampling period N(1000/F, σ = 0.5) (milliseconds), with F the sampling frequency (Hz).

For rendering purposes, the eye movement data stream is appended with:

• Blink duration, modeled as N(120, σ = 70) (ms), and
• Pupil unrest, modeled by pink noise P(α = 1.6, f₀ = 0.35) (relative diameter).

Collectively, the above sources of error can be considered as a stochastic perturbation of the gaze point about its current location, i.e.,

p_{t+h} = p_t + P(α, f₀) + η    (6)

where the primary source of microsaccadic jitter is represented by pink noise P(α, f₀) and η represents the various sources of variation listed above. See Duchowski et al. [18] for further details.

For affective eye movement synthesis, modulating α will result in modulation of the jitter. Modulation of f₀ controls the amount of dispersion of the fixational points.
Both model parameters can thus be used to control the expected appearance of emotional state. What remains is the tuning of these parameters to effect emotional expression. Results from our perceptual experiments thus far have shown that animations based on the procedural model with pink noise jitter were consistently evaluated and perceived as highly natural when presented alongside alternative animations [15]. We have also found that some jitter, but not too much, captures visual attention better than when jitter is excessive or absent.
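
The combined perturbation of Eq. (6) can be assembled from the earlier sketches; a minimal example, with microsaccadic_jitter() as defined above and η reduced to white eye-tracker noise:

```python
import numpy as np

def synthesize_fixation(p0, duration_s, fs=1000.0, tracker_sigma=0.0,
                        rng=None):
    # Eq. (6) over one fixation: perturb look point p0 (degrees) with
    # pink microsaccadic jitter P(0.6, 0.85) per axis, plus optional
    # white eta emulating eye-tracker noise (cf. Fig. 3c)
    rng = rng or np.random.default_rng()
    n = int(duration_s * fs)
    jitter = np.stack([microsaccadic_jitter(n, fs=fs),
                       microsaccadic_jitter(n, fs=fs)], axis=1)
    eta = rng.normal(0.0, tracker_sigma, size=(n, 2)) if tracker_sigma else 0.0
    return np.asarray(p0) + jitter + eta
```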

Conclusion

In this chapter, we have summarized some of the physiological characteristics of eye motions and presented a physiologically plausible procedural model of eye movements, complete with blinks, saccades, and fixations augmented by microsaccadic jitter and pupil unrest, both modeled by 1/f^α or pink noise. Our procedural model of gaze motion constitutes the basis of a “bottom-up” model of gaze rotations, modeled by quaternions which are used to effect eyeball rotation in response to a “look point” in space projected onto a 2D plane in front of the eye. The location of this look point is determined by the procedural model simulated over time, which is tasked with producing a characteristic fixation/saccade signal that models recorded gaze data. The procedural model differs from others, e.g., those driven by saliency, such as Oyekoya et al.'s [44], or those driven by head movement propensity [47]. These latter models can be considered “top-down” models, as they are more concerned with the prediction of locations of gaze that, say, an autonomous avatar is likely to make. Our model is concerned with the low-level signal characteristics of the fixation point regardless of how it was determined.

Subjective evaluations have shown that the absence of noise is clearly unnatural. Microsaccadic jitter therefore appears to be a crucial ingredient in the quest toward natural eye movement rendering. Gaze jitter is naturally always present since the eyes are never perfectly still. We believe that correctly modeling the jitter that characterizes gaze fixation is a key factor in promoting the believability and acceptance of synthetic actors and avatars, thereby bridging the Uncanny Valley [40].

Acknowledgments This material is based in part upon work supported by the US National Science Foundation under Grant No. IIS-1423189. Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

References

1. Abrams RA, Meyer DE, Kornblum S (1989) Speed and accuracy of saccadic eye movements: characteristics of impulse variability in the oculomotor system. J Exp Psychol Hum Percept Perform 15(3):529–543
2. Ahern S, Beatty J (1979) Pupillary responses during information processing vary with scholastic aptitude test scores. Science 205(4412):1289–1292


3. Aks DJ, Zelinsky GJ, Sprott JC (2002) Memory across eye-movements: 1/f dynamic in visual search. Nonlinear Dynamics Psychol Life Sci 6(1):1–25
4. Andrist S, Pejsa T, Mutlu B, Gleicher M (2012) Designing effective gaze mechanisms for virtual agents. In: Proceedings of the 2012 ACM annual conference on human factors in computing systems, CHI'12. ACM, New York, pp 705–714. doi:10.1145/2207676.2207777
5. Bahill AT, Clark M, Stark L (1975) The main sequence. A tool for studying human eye movements. Math Biosci 24(3/4):191–204
6. Beatty J (1982) Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychol Bull 91(2):276–292
7. Beatty J, Lucero-Wagoner B (2000) The pupillary system. In: Cacioppo JT, Tassinary LG, Bernston GG (eds) Handbook of psychophysiology, 2nd edn. Cambridge University Press, Cambridge, pp 142–162
8. Bentivoglio AR, Bressman SB, Cassetta E, Carretta D, Tonali P, Albanese A (1997) Analysis of blink rate patterns in normal subjects. Mov Disord 12(6):1028–1034
9. Bérard P, Bradley D, Nitti M, Beeler T, Gross M (2014) High-quality capture of eyes. ACM Trans Graph 33(6):223:1–223:12. doi:10.1145/2661229.2661285
10. Campbell CS, Maglio PP (2001) A robust algorithm for reading detection. In: ACM workshop on perceptive user interfaces. ACM Press, New York, pp 1–7
11. Condit R, Ashton PS, Baker P, Bunyavejchewin S, Gunatilleke S, Gunatilleke N, Hubbell SP, Foster RB, Itoh A, LaFrankie JV, Lee HS, Losos E, Manokaran N, Sukumar R, Yamakura T (2000) Spatial patterns in the distribution of tropical tree species. Science 288(5470):1414–1418
12. Davson H (1980) Physiology of the eye, 4th edn. Academic, New York
13. Di Stasi LL, McCamy MB, Catena A, Macknik SL, Cañas JJ, Martinez-Conde S (2013) Microsaccade and drift dynamics reflect mental fatigue. Eur J Neurosci 38(3):2389–2398
14. Duchowski A, Jörg S (2015) Modeling physiologically plausible eye rotations: adhering to Donders' and Listing's laws. In: Proceedings of computer graphics international (short papers)
15. Duchowski A, Jörg S, Lawson A, Bolte T, Świrski L, Krejtz K (2015) Eye movement synthesis with 1/f pink noise. In: Motion in Games (MIG) 2015, Paris, France
16. Duchowski AT (2007) Eye tracking methodology: theory & practice, 2nd edn. Springer, London
17. Duchowski AT, House DH, Gestring J, Wang RI, Krejtz K, Krejtz I, Mantiuk R, Bazyluk B (2014) Reducing visual discomfort of 3D stereoscopic displays with gaze-contingent depth-of-field. In: Proceedings of the ACM symposium on applied perception, SAP'14. ACM, New York, pp 39–46. doi:10.1145/2628257.2628259
18. Duchowski AT, Jörg S, Allen TN, Giannopoulos I, Krejtz K (2016) Eye movement synthesis. In: Proceedings of the ninth biennial ACM symposium on eye tracking research & applications, ETRA'16. ACM, New York, pp 147–154. doi:10.1145/2857491.2857528
19. Ferman L, Collewijn H, Van den Berg AV (1987) A direct test of Listing's law – I. Human ocular torsion measured in static tertiary positions. Vision Res 27(6):929–938
20. Garau M, Slater M, Vinayagamoorthy V, Brogni A, Steed A, Sasse MA (2003) The impact of avatar realism and eye gaze control on perceived quality of communication in a shared immersive virtual environment. In: Human factors in computing systems: CHI 03 conference proceedings. ACM Press, New York, pp 529–536
21. Glenn B, Vilis T (1992) Violations of Listing's law after large eye and head gaze shifts. J Neurophysiol 68(1):309–318
22. Grzywacz NM, Norcia AM (1995) Directional selectivity in the cortex. In: Arbib MA (ed) The handbook of brain theory and neural networks. The MIT Press, Cambridge, MA, pp 309–311
23. Gu E, Lee SP, Badler JB, Badler NI (2008) Eye movements, saccades, and multi-party conversations. In: Deng Z, Neumann U (eds) Data-driven 3D facial animation. Springer, London, pp 79–97. doi:10.1007/978-1-84628-907-1_4


24. Haslwanter T (1995) Mathematics of three-dimensional eye rotations. Vision Res 35(12):1727–1739
25. Hollos S, Hollos JR (2015) Creating noise. Exstrom Laboratories, LLC, Longmont, CO. http://www.abrazol.com/books/noise/ (last accessed Jan 2015). ISBN 9781887187268 (ebook)
26. Hollos S, Hollos JR (2015) Recursive digital filters: a concise guide. Exstrom Laboratories, LLC, Longmont, CO. http://www.abrazol.com/books/filter1/ (last accessed Jan 2015). ISBN 9781887187244 (ebook)
27. Hubel DH (1988) Eye, brain, and vision. Scientific American Library, New York
28. Kashihara K, Okanoya K, Kawai N (2014) Emotional attention modulates microsaccadic rate and direction. Psychol Res 78:166–179
29. Knox PC (2012) The parameters of eye movement (2001). Lecture notes. http://www.liv.ac.uk/~pcknox/teaching/Eymovs/params.htm (last accessed November 2012)
30. Komogortsev OV, Karpov A (2013) Liveness detection via oculomotor plant characteristics: attack of mechanical replicas. In: Proceedings of the IEEE/IARP international conference on biometrics (ICB), pp 1–8
31. Komogortsev OV, Karpov A, Holland CD (2015) Attack of mechanical replicas: liveness detection with eye movements. IEEE Trans Inform Forensics Secur 10(4):716–725
32. Krejtz K, Duchowski AT, Çöltekin A (2014) High-level gaze metrics from map viewing: charting ambient/focal visual attention. In: Kiefer P, Giannopoulos I, Raubal M, Krüger A (eds) 2nd international workshop in eye tracking for spatial research (ET4S)
33. Landy SD (1999) Mapping the universe. Sci Am 224:38–45
34. Laretzaki G, Plainis S, Vrettos I, Chrisoulakis A, Pallikaris I, Bitsios P (2011) Threat and trait anxiety affect stability of gaze fixation. Biol Psychol 86(3):330–336
35. Lee SP, Badler JB, Badler NI (2002) Eyes alive. ACM Trans Graph 21(3):637–644. doi:10.1145/566654.566629
36. Looser CE, Wheatley T (2010) The tipping point of animacy. How, when, and where we perceive life in a face. Psychol Sci 21(12):1854–1862
37. Ma X, Deng Z (2009) Natural eye motion synthesis by modeling gaze-head coupling. In: IEEE virtual reality, Lafayette, LA, pp 143–150
38. Martinez-Conde S, Macknik SL, Troncoso XG, Hubel DH (2009) Microsaccades: a neurophysiological analysis. Trends Neurosci 32(9):463–475
39. Mok D, Ro A, Cadera W, Crawford JD, Vilis T (1992) Rotation of Listing's plane during vergence. Vision Res 32(11):2055–2064
40. Mori M (1970) The uncanny valley. Energy 7(4):33–35
41. Murphy H, Duchowski AT (2002) Perceptual gaze extent & level of detail in VR: looking outside the box. In: Conference abstracts and applications (sketches & applications), computer graphics (SIGGRAPH) annual conference series. ACM, San Antonio, TX
42. Murray N, Roberts D, Steed A, Sharkey P, Dickerson P, Rae J, Wolff R (2009) Eye gaze in virtual environments: evaluating the need and initial work on implementation. Concurr Comput 21:1437–1449
43. Ostling A, Harte J, Green J (2000) Self-similarity and clustering in the spatial distribution of species. Science 290(5492):671
44. Oyekoya O, Steptoe W, Steed A (2009) A saliency-based method of simulating visual attention in virtual scenes. In: Proceedings of the ACM symposium on virtual reality software and technology. ACM, New York, pp 199–206
45. Pamplona VF, Oliveira MM, Baranoski GVG (2009) Photorealistic models for pupil light reflex and iridal pattern deformation. ACM Trans Graph 28(4):106:1–106:12. doi:10.1145/1559755.1559763
46. Pejsa T, Mutlu B, Gleicher M (2013) Stylized and performative gaze for character animation. In: Navazo I, Poulin P (eds) Proceedings of EuroGraphics. EuroGraphics
47. Peters C, Qureshi A (2010) A head movement propensity model for animating gaze shifts and blinks of virtual characters. Comput Graph 34:677–687


48. Porrill J, Ivins JP, Frisby JP (1999) The variation of torsion with vergence and elevation. Vision Res 39:3934–3950
49. Privitera CM, Renninger LW, Carney T, Klein S, Aguilar M (2008) The pupil dilation response to visual detection. In: Rogowitz BE, Pappas T (eds) Human vision and electronic imaging, vol 6806. SPIE, Bellingham, WA
50. Quaia C, Optican LM (2003) Three-dimensional rotations of the eye. In: Kaufman PL, Alm A (eds) Adler's physiology of the eye: clinical application, 10th edn. C. V. Mosby Co., St. Louis, pp 818–829
51. Robinson DA (1968) The oculomotor control system: a review. Proc IEEE 56(6):1032–1049
52. Rolfs M (2009) Microsaccades: small steps on a long way. Vision Res 49(20):2415–2441. doi:10.1016/j.visres.2009.08.010
53. Ruhland K, Andrist S, Badler JB, Peters CE, Badler NI, Gleicher M, Mutlu B, McDonnell R (2014) Look me in the eyes: a survey of eye and gaze animation for virtual agents and artificial systems. In: Lefebvre S, Spagnuolo M (eds) Computer graphics forum. EuroGraphics STAR – State of the Art Report. EuroGraphics
54. Stark L, Campbell FW, Atwood J (1958) Pupil unrest: an example of noise in a biological servomechanism. Nature 182(4639):857–858
55. Steptoe W, Oyekoya O, Steed A (2010) Eyelid kinematics for virtual characters. Comput Animat Virtual Worlds 21(3–4):161–171
56. Szendro P, Vincze G, Szasz A (2001) Pink-noise behaviour of biosystems. Eur Biophys J 30(3):227–231
57. Templin K, Didyk P, Myszkowski K, Hefeeda MM, Seidel HP, Matusik W (2014) Modeling and optimizing eye vergence response to stereoscopic cuts. ACM Trans Graph 33(4), Article 145. doi:10.1145/2601097.2601148
58. Trutoiu LC, Carter EJ, Matthews I, Hodgins JK (2011) Modeling and animating eye blinks. ACM Trans Appl Percept (TAP) 8(3):17:1–17:17
59. Tweed D, Cadera W, Vilis T (1990) Computing three-dimensional eye position quaternions and eye velocity from search coil signals. Vision Res 30(1):97–110
60. Tweed D, Vilis T (1990) Geometric relations of eye position and velocity vectors during saccades. Vision Res 30(1):111–127
61. Usher M, Stemmler M, Olami Z (1995) Dynamic pattern formation leads to 1/f noise in neural populations. Phys Rev Lett 74(2):326–330
62. van Rijn LJ (1994) Torsional eye movements in humans. Ph.D. thesis, Erasmus Universiteit Rotterdam, Rotterdam, The Netherlands
63. Vertegaal R (1999) The GAZE groupware system: mediating joint attention in multiparty communication and collaboration. In: Human factors in computing systems: CHI'99 conference proceedings. ACM Press, New York, pp 294–301
64. Yang Z, Zhao Q, Keefer E, Liu W (2009) Noise characterization, modeling, and reduction for in vivo neural recording. In: Bengio Y, Schuurmans D, Lafferty J, Williams CKI, Culotta A (eds) Advances in neural information processing systems, vol 22, pp 2160–2168
65. Yeo SH, Lesmana M, Neog DR, Pai DK (2012) Eyecatch: simulating visuomotor coordination for object interception. ACM Trans Graph 31(4):42:1–42:10
66. Zhou Y, Huang H, Wei LY, Wang R (2012) Point sampling with general noise spectrum. ACM Trans Graph 31(4):76:1–76:11. doi:10.1145/2185520.2185572

Head Motion Generation

Najmeh Sadoughi and Carlos Busso

Contents

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Role of Head Motion in Human Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Relation Between Head Motion and Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Head Movement Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Rule-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Data-Driven Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Hybrid Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Open Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Speech-Driven Models Using Synthetic Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Exploring Entrainment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Modeling Personality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Joint Models to Integrate Head Motion with Other Gestures . . . . . . . . . . . . . . . . . . . . . . . . 21
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Abstract

Head movement is an important part of body language. Head motion plays a role in communicating lexical and syntactic information. It conveys emotional and personality traits. It plays an important role in acknowledging active listening. Given these communicative functions, it is important to synthesize Conversational Agents (CAs) with meaningful, human-like head motion sequences that are tightly synchronized with speech. Several studies have focused on synthesizing head movements. Most studies can be categorized as rule-based or data-driven frameworks. On the one hand, rule-based methods define rules that map semantic labels or communicative goals to specific head motion sequences,
which are appropriate for the underlying message (e.g., nodding for affirmation). However, the range of head motion sequences that are generated by these systems is usually limited, resulting in repetitive behaviors. On the other hand, data-driven methods rely on recorded head motion sequences, which are used either to concatenate existing sequences, creating new realizations of head movements, or to build statistical frameworks that are able to synthesize novel realizations of head motion behaviors. Due to the strong correlation between head movements and speech prosody, these approaches usually rely on speech to drive the head movements. These methods can capture a broader range of movements displayed during human interaction. However, even when the generated head movements may be tightly synchronized with speech, they may not convey the underlying discourse function or intention in the message. The advantages of rule-based and data-driven methods have inspired several studies to create hybrid methods that overcome the aforementioned limitations. These studies propose generating the movements using parametric or nonparametric approaches, constraining the models not only on speech but also on the semantic content. This chapter reviews the most influential frameworks to generate head motion. It also discusses open challenges that can move this research area forward.

N. Sadoughi (*) • C. Busso (*)
Multimodal Signal Processing Lab, University of Texas at Dallas, Dallas, TX, USA
e-mail: [email protected]; [email protected]

© Springer International Publishing Switzerland 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_4-1

Keywords

Conversational agent • Rule-based animation • Data-driven animation • Speech-driven animation • Head movement generation • Semantic content • Backchannel • Nonverbal behaviors • Expressive head motion • Rapport • Embodied conversational agents • Visual prosody

Introduction

Head movement is an integral part of our body language used during human interactions. Head motion can play a communicative role, displaying emblems (i.e., gestures conveying specific meaning) and regulators (i.e., gestures to control the turn-taking sequence) (Heylen 2005). It is also important for establishing rapport by providing suitable backchannels while listening to conversational partners (Gratch et al. 2006; Huang et al. 2011). Rhythmic head movements coupled with speech prosodic features increase speech intelligibility by signaling syntactic boundaries (Munhall et al. 2004). Head movements are also used to convey the mood of the speaker (Busso et al. 2007a, b), and can express uncertainty (Marsi and van Rooden 2007). Given the key role of head motion during human interaction, it is not surprising that there is a need to model and capture all these aspects to generate believable conversational agents (CAs). CAs without head movements are perceived as less warm (Welbergen et al. 2015) and less natural (Busso et al. 2007a; Mariooryad and Busso 2012). This chapter describes influential frameworks proposed to synthesize head motion sequences for CAs, describing the main challenges.

For head movement generation, previous studies have proposed frameworks that can be categorized into two main approaches: rule-based methods (Cassell et al.
1994; Kopp et al. 2006; Liu et al. 2012; Pelachaud et al. 1996) and data-driven frameworks (Busso et al. 2005; Chiu and Marsella 2011; Chuang and Bregler 2005; Deng et al. 2004; Mariooryad and Busso 2012; Sadoughi et al. 2014; Taylor et al. 2006). The predominant approach to generate head motion is rule-based systems, where handcrafted head movements such as shaking and nodding are carefully selected and stored. The semantic content is then analyzed, generating head motion sequences following the selected rules. These methods usually define several heuristic rules derived from previous psychological and observational studies, mapping the syntactic and semantic structure of the utterance to prototypical head motion sequences. These methods also define rules to synchronize head movements with the underlying speech. The second category corresponds to data-driven methods, where head motion sequences are generated from existing recordings. These data-driven methods either concatenate existing head motion sequences according to given criteria or learn statistical models to capture the distribution of head motion. A prevalent modality used in previous data-driven studies is a set of speech prosodic features, leveraging the strong coupling between head motion and speech (Busso and Narayanan 2007).

The two main approaches for head movement generation have advantages and disadvantages. On the one hand, rule-based systems have the advantage of considering the meaning of the message to choose appropriate movements. However, the head movements may seem repetitive, since the range and variability of head motions are usually limited to the predefined sequences per type of movement stored in the system. Under similar conditions, the system will tend to generate similar behaviors, oversimplifying the complex relationship between verbal and nonverbal information. Furthermore, forcing synchronization between behaviors and speech is challenging (e.g., the coupling between speech and head motion). On the other hand, data-driven frameworks have the potential to capture the range of behaviors seen in real human interaction, creating novel realizations that resemble natural head motions. When speech features are used to generate head motion, the models can automatically learn the synchronization between the prosodic structure and head movements. However, using solely data-driven models may disregard the semantic content of the message, resulting in movements that are not aligned with the message. These systems may generate perfectly synchronized emblems contradicting the message (e.g., shaking the head during affirmation). To balance the tradeoff between naturalness and appropriateness, studies have attempted to bridge the gap between both methods, creating hybrid approaches that leverage the advantages of both, overcoming their limitations (Chiu et al. 2015; Sadoughi and Busso 2015; Sadoughi et al. 2014; Stone et al. 2004).

This chapter describes the role of head motion in human interaction, emphasizing the importance of synthesizing behaviors that properly convey the relation between head motion and other verbal and nonverbal channels. We review influential studies proposing rule-based, data-driven, and hybrid frameworks. The chapter also discusses open challenges that can lead to new advances in this research area.


Table 1 Some of the head motion roles identified by Heylen (2005)

Head motion functions:
• Show affirmation or negation
• Show inclusivity or intensification
• Organize the interaction
• Mark the listing
• Mark the contrast
• Show the level of understanding
• Mark uncertain statements
• Facilitate turn taking/giving
• Signal ground holding
• Signal the mood
• Signal shyness and hesitation
• Backchannel requests

State of the Art

Head motion plays an important role during human communication. This section summarizes relevant studies describing the function of head motion during human interaction (section “Role of Head Motion in Human Interaction”), emphasizing the strong relationship with other verbal and nonverbal channels (section “Relation Between Head Motion and Speech”).

Role of Head Motion in Human Interaction

Heylen (2005) surveyed studies analyzing the role of head motion during human conversation, listing 25 different roles, including enhancing communicative attention, marking contrast between sentences, and communicating the degree of understanding. Table 1 lists some of these functions. It is clear that head movement is an essential part of body language, which facilitates human-human interaction not only while speaking, but also while listening (McClave 2000). Speakers use head movements to reinforce the meaning of the message. We often use emblems such as head nods for affirmation, head shakes for negation, and head tilts with words like “um,” “uh,” and “well” (Lee and Marsella 2006; Liu et al. 2012). Lee and Marsella (2006) investigated the nonverbal behaviors of individuals during dyadic interactions. They annotated the videos in terms of a set of discourse functions including affirmation, negation, contrast, intensification, inclusivity, obligation, listing, assumption, possibility, response, request, and word search. They found that there are generally nonverbal behavior patterns related to head motion accompanying these labels (e.g., head shake during negation, head nod during affirmation, head shake during the use of words such as “really,” and lateral head sweep during the use of words such as “everything,” “all,” and “whole”).

Head motion is also used to parse syntactic information, creating visual markers to segment phrases within an utterance. Hadar et al. (1983) recorded head movements from four subjects during conversation, reporting that, after removing pauses of more than 1 s, 58.8% of the still head poses occurred during speech pauses. Graf et al. (2002) frequently observed an initial head movement after a pause, which is followed by speech. Another important communicative function of head motion is to stress words, functioning as a visual marker for intonation (Graf et al. 2002; Moubayed et al. 2010). These aspects are important during human interaction. For example, Munhall et al. (2004) studied the effect of head movements on speech intelligibility. They conducted an evaluation where an animated face replicated the original head and face motions of recorded sentences. The task was to recognize speech in the presence of noisy audio. They evaluated two conditions for the animated face: with head motion and without head motion. They counted the number of correctly identified syllables, showing improved performance when the animated face contained head motion. Head motion also plays a key role while listening, where people provide nonverbal feedback to the speaker primarily using their head movements. A common behavior is to nod to acknowledge active listening (McClave 2000). Ishi et al. (2014) analyzed the occurrences of head movements during the listeners' backchannels such as “yes” and “uhm,” observing one or multiple head nods closely synchronized with the verbal backchannel. Head movements also convey the affective state of the speaker (Busso et al. 2007b; Busso and Narayanan 2007). In our previous work, we studied the displacement of head motions of an individual expressing different emotions: happiness, anger, sadness, and neutrality (Busso and Narayanan 2007). Our results showed significant differences in head motion across all emotions, except between happiness and anger. In another study (Busso et al. 2007a), we demonstrated that head motion behaviors are discriminative for emotion recognition. Using only global statistics derived from head motion trajectories at the sentence level, we were able to recognize these four emotional states with 65.5% accuracy (performance at chance was 25%). This study also demonstrated the contribution of head motion to emotional perception. We generated animations of expressive sentences. The novelty of the approach was that we purposely created mismatches between the emotion in the sentence (e.g., happiness) and the emotion of the head motion sequence (e.g., sadness). The corpus used in this study has sentences read by an actor conveying the same emotions. Therefore, we were able to create these mismatches by temporally aligning the corresponding frames across emotions. The evaluators rated the emotional content in terms of activation, valence, and dominance, using a five-point Likert-like scale for each emotional dimension. Figure 1 shows the results for valence (1: very positive; 5: very negative). The first bar in each plot represents the matched condition, where the emotion of the sentence matches the emotion of the head motion sequence. The next three bars give the perception achieved in mismatched conditions by changing the emotion of the head motion sequences. The bar “FIX” represents the perception achieved without any head motion. Finally, the bar “WAV” gives the perception achieved when the stimuli only included speech.

Fig. 1 These figures show the perceived valence (1: positive, 5: negative) for four emotional categories ((a) happiness, (b) anger, (c) sadness, (d) neutral), when the head movements are generated with the same emotional class (i.e., matched condition, first bar), with three other emotions (i.e., mismatched condition, second to fourth bars), when the head is fixed (FIX), and when the evaluators only listened to the audio (WAV)

These figures show that expressive head motion sequences change the emotional perception of the animation. For neutral sentences, adding an angry head motion sequence makes the animation more negative, and adding a happy head motion sequence makes it more positive. Similarly, Lance and Marsella (2007) proposed to include emotional head movements during gaze shifts when synthesizing the animations. The results of their study showed that people distinguished differences between high and low levels of arousal and between high and low levels of dominance. These results indicate that modeling expressive behaviors is important when synthesizing head motion sequences. Head motion also affects the perception of personality traits. Arellano et al. (2011) performed a perceptual evaluation on static images of a character with various head orientations and gazes.
The results showed that people's perception of personality traits such as agreeableness and emotional stability was affected by head orientation, while for gaze no significant difference was revealed. Arya et al. (2006) analyzed the effect of head movements and facial actions on the personality perceived by others. They performed perceptual evaluations on a set of animated videos to see the effect of visual cues on the perception of personality along two dimensions: affiliation and dominance. They created several videos, each with a specific facial action, and asked the evaluators to rank them with a set of attributes. The results of this study showed that dynamic head movements such as head tilts and eye gaze aversion communicate a sense of dominance for the character. Moreover, the higher the frequency and intensity of the head movements, the higher the perceived level of dominance.

Relation Between Head Motion and Speech Data-driven models have the potential of capturing naturalistic variations of the behaviors (Foster 2007). One useful and accessible modality that can be used to drive facial behaviors is speech. Spoken language carries important information beyond the verbal message that a CA engine should capitalize on. Therefore, this chapter focuses the discussion on data-driven frameworks relying on speech features. Head motion and speech are intrinsically connected at various levels (Busso and Narayanan 2007). As mentioned in section “Role of Head Motion in Human Interaction,” head motion conveys visual markers of intonation, defining syntactic boundaries and stressed segments. As a result, speech features and head motion sequences are highly correlated. Several studies have reported a strong correlation between speech prosody features and head movements. Munhall et al. (2004) analyzed the correlation between head motion and the prosodic features including fundamental frequency and RMS energy. The study focused on recordings from a single subject. The correlation at the sentence level between head motion and the fundamental frequency was ρ ¼ 0:63 , and between head motion and the RMS energy was ρ ¼ 0:324. Kuratate et al. (1999) showed a correlation of ρ ¼ 0:88 between head motion and the fundamental frequency for an American English speaker. We also reported similar results in Busso et al. (2005) using pitch, intensity, and their first and second order derivatives as speech prosodic features. We evaluate the relationship between head and speech features using canonical correlation analysis (CCA). CCA projects two modalities with similar or different dimensions into a common space where their correlation is maximized. The CCA for head and speech features was ρ ¼ 0:7 at the sentence level highlighting the strong connection between them. The study was further extended in Busso and Narayanan (2007), observing similar results. Studies have also shown co-occurrence of head movements and speech prosody events. Graf et al. (2002) showed that although the amplitude and direction of the movements may vary according to idiosyncratic characteristics, semantic content of the message, and affective state of the speaker, there is a common synchrony of the timings between pitch accents and head events. Mcclave (2000) reported that there is


Table 2  Perception of CAs synthesized with and without head motion reported in previous studies, on scales from 1 (bad) to either 5 or 7 (great). Some of these values are approximated from figures in their corresponding publications

Study                        Criterion             With head movement  Without head movement
Busso et al. (2007a)         Naturalness (1–5)     3.61                3.10
Mariooryad and Busso (2012)  Naturalness (1–5)     ~2.90               2.32
Welbergen et al. (2015)      Warmth (1–7)          ~5.10               ~4.55
                             Competence (1–7)      ~6.10               ~5.90
                             Human-likeness (1–7)  ~4.50               ~4.50

For instance, head shakes happen during expressions of inclusivity and intensification. Lee and Marsella (2009) proposed a hidden Markov model (HMM) classifier to detect head nods based on features selected from speech, including part of speech (PoS) tags (e.g., conjunction, proper noun, adverb, and interjection), dialog acts (e.g., backchannel, inform, suggest), phrases, and verb boundaries. Their classifier showed high performance, indicating a close connection between head nods, dialog acts, and the timing of the uttered words.
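To make the correlation analysis discussed above concrete, the following minimal sketch computes a first canonical correlation between prosodic features and head rotation trajectories with scikit-learn. The random arrays are stand-ins for real, frame-aligned features; the feature choices and dimensions are illustrative assumptions rather than the exact setup of the cited studies.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# stand-ins for frame-aligned features extracted from real recordings:
# prosody: T x 6 (F0, RMS energy, and their first/second derivatives)
# head:    T x 3 (Euler rotation angles per frame)
prosody = np.random.randn(500, 6)
head = np.random.randn(500, 3)

# project both modalities into a shared space that maximizes correlation
cca = CCA(n_components=1)
u, v = cca.fit_transform(prosody, head)
rho = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
print(f"first canonical correlation: {rho:.2f}")
```

On real motion capture data, this sentence-level statistic corresponds to the ρ values reported in the studies above; on random data it is close to zero.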

Head Movement Synthesis

It is important to generate head movements for CAs, given their role in conveying the intended message, the emotion, and the level of rapport displayed by speakers. Table 2 lists studies that have compared facial animations synthesized with and without head motion in terms of naturalness, warmth, competence, and human-likeness. These studies show the clear benefit of using head motion. Head motion has three degrees of freedom (DOF) for rotation and three DOF for translation. While some methods consider all six DOF, most studies rely only on the three rotation angles (Fig. 2). Studies on the synthesis of head motion are usually based on rule-based or data-driven methods. Rule-based systems define rules about the shape and timing of the head movements and use a predefined, handcrafted dictionary of head gestures to synthesize them. While these gestures are usually selected based on the meaning of the message, their variations are limited to the list of gestures defined in the system. Also, local synchronization and timing of these gestures with speech is challenging. Data-driven methods utilize prerecorded motion databases. These methods usually concatenate the prerecorded motions to create a new realization, or create new motions by sampling from models trained on the recordings. Due to the correlation between head movements and speech prosody, these methods usually consider speech prosody features in generating the movements.


Fig. 2 Three degrees of freedom for head motion rotation. Some studies also include three degrees of freedom for head translation

This approach also facilitates the synchronization between speech and gestures, capturing subtle timing relations between prosody and head motion. These methods also have the potential to capture the range of motions seen in real recordings. Their main drawback, however, is that they disregard the meaning of the message while creating the movements. Therefore, the movements are not constrained to convey the same meaning as the speech and may even contradict the message (e.g., head nods for negations). Foster (2007) compared rule-based and data-driven generation of head movements. The evaluation showed that people preferred facial animations generated with data-driven methods over rule-based methods, although the difference was not statistically significant. The study also concluded that the range of displays for the data-driven method was more similar to the original recordings than the displays obtained with rule-based systems. Rule-based systems and data-driven methods each have key features that are ideal to synthesize human-like head motions. This section describes influential studies for rule-based systems (section "Rule-Based Methods") and data-driven methods (section "Data-Driven Models"). It also summarizes efforts to create hybrid approaches which leverage the benefits of both methods (section "Hybrid Approaches").

Rule-Based Methods

Rule-based methods define rules for head movements to communicate the meaning more clearly. Table 3 summarizes some of the rules defined in previous studies. One of the most influential studies on rule-based systems was presented by Cassell et al. (1994). They designed a system to generate synchronized speech, intonation, and gestures (including head motion) by defining rules. For example, they generated head nods during emphatic segments or backchannels. Their system approximated gaze with head orientation, and, therefore, all the rules for gaze behaviors involved specific head motions. For example, the CA would look up during a question, look at the listener during an answer, look away at the beginning of a long turn, and look at the listener for short turns.


Table 3  Brief summary of the rules proposed in previous studies, mapping discourse functions or intentions (lexical/emotional affiliates) to specific head movements

Study                    Mapping                                                    Head movement/pose
Cassell et al. (1994)    Backchannel                                                Head nods
                         Emphasis                                                   Head nods
                         Question                                                   Look up
                         Answer                                                     Look away
                         Beginning of turn                                          Look away
                         Turn request                                               Look up
Pelachaud et al. (1996)  Anger                                                      Forward pose
                         Sadness                                                    Downward
                         Disgust                                                    Backward and up
                         Fear                                                       Backward
                         Sadness                                                    Downward
                         Surprise                                                   Backward
Liu et al. (2012)        Backchannel                                                Head nod
                         End of question                                            Head nod
                         End of turn when giving the turn to the interlocutor       Head nod
                         Keeping a turn by a short pause                            Head nod
                         Thinking, but keeping the turn                             Head tilt
                         Thinking and preparing the next utterance, e.g., "uhmm"    Head tilt
Gratch et al. (2006)     Lowering of pitch of interlocutor                          Head nod
                         Raised loudness of interlocutor                            Head nod
                         Speech disfluency of interlocutor                          Posture/gaze shift
                         Posture/gaze shift of interlocutor                         Mimic
                         Nods or shakes of interlocutor                             Mimic
Marsella et al. (2013)   Affirmation                                                Big nod, tilt left nod, tilt right nod
                         Negation                                                   Shake, small shake
                         Contrast                                                   Tilt right, tilt left
                         Mental state                                               Tilt half nod left
                         Emphasis                                                   Small nod

The extension of this framework resulted in REA, a CA which responded with gestures to different discourse functions (Cassell et al. 1999). DeCarlo et al. (2004) presented RUTH, a platform architecture for embodied CAs. The inputs of this platform are enriched transcriptions with prosodic and gestural markers at the word level. The prosodic markers correspond to the tones and break indices (ToBI), which define pitch accents and boundary tones (Silverman et al. 1992).


The gestural markers are predefined behaviors. For head motion, they defined 14 types of head motions: variations of head nods (upward, downward, upward with some rightward, upward with some leftward, downward with some rightward, and downward with some leftward), head tilts (clockwise, counterclockwise, clockwise with downward nodding, counterclockwise with downward nodding), and gestures to move (forward, backward) or turn (to the right or to the left) the head. These gestures are then rendered, synchronizing the behaviors at the points specified by the tags. Some studies have attempted to incorporate head motions conveying emotions using rule-based systems. Pelachaud et al. (1996) developed a system to generate expressive facial and head movements, as well as eye movements. As input, they used transcriptions tagged with accents, the desired emotional state, and its intensity. They used head and eye movements as regulators, which facilitate the communication between the speaker and listener, in a rule-based manner. They also defined rules specifying the head direction depending on the target emotional state, following the results of previous psychological studies. The emotional head motion rules include moving forward during anger, downward during sadness, and backward during surprise. Marsella et al. (2013) proposed an emotionally aware rule-based system. Their framework relies on syntactic and acoustic analysis of the input speech, which consists of either natural or synthesized speech. They used the acoustic analysis to find the emotional state and word emphasis, and the syntactic analysis to find the appropriate category of behaviors to be synthesized for each communicative goal. Their proposed system also handles co-articulation for consecutive behaviors that are close in time, leading to novel realizations while transitioning from one gesture to another. To define the rules, some studies have investigated video recordings of human interaction, aiming to identify consistent patterns between movements (including head motion) and discourse features. Kipp (2003) analyzed human gestures in 23 clips of a TV show and developed ANVIL, a video annotation toolkit, to annotate the gestures. He found a common set of 15 gestures occurring across the two speakers and defined gesture profiles, including the position and orientation of the head and hands during the gestures. For synthesis, the transcription is automatically annotated with words, PoS tags, what the utterance is about (theme, rheme, and focus), and discourse relations (opposition, repetition, and listing). Carefully designed rules map these tags to a set of semantic tags. Using these semantic tags and the statistics derived from the annotated corpus, the system chooses the most probable gesture considering local and global constraints. Following a similar approach, Liu et al. (2012) proposed a rule-based approach to generate head tilts and head nods, where the rules were derived by observing and analyzing human interaction data. First, they annotated the phrases in their database with a list of dialog acts, along with head nods and tilts. Second, they created a mapping between dialog acts and the corresponding head movements. They found frequent occurrences of head nods during backchannels and the last syllable of strong phrase boundaries.
They also found head tilts during weak phrase boundaries and segments when the individual was either thinking or embarrassed.


They exploited these relations in the generation of head nods and tilts for human-robot interaction (HRI). They used a fixed shape trajectory for head nods and tilts, driven by the rules learned from their corpus, and used perceptual evaluations to measure the perceived naturalness of the head movements visualized on robots. Their results showed improved naturalness when both head nods and tilts were incorporated in the system, compared with using only head nods or the original sequences. Generating head motion for CAs is important not only while speaking but also while listening. Showing rapport is one of the aspects that needs to be considered for generating believable CAs. Gratch et al. (2006) proposed a virtual listener, called virtual rapport, which aims to create a sense of rapport with the user by defining heuristic rules from previous psychological studies. For example, the CA nods whenever it senses that the users lower their pitch or raise their loudness. It also nods when the speaker nods. Similar rule-based systems include the work of Rickel and Johnson (1998) (Steve), André et al. (1996) (the DFKI Persona), Beskow and McGlashan (1997) (Olga), Lester et al. (1999) (pedagogical agents), and Smid et al. (2004). An important drawback of rule-based systems is that they cannot easily capture the rich and complex variability observed in natural recordings, often resulting in repetitive behaviors (Foster 2007).
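As an illustration of the rule-based paradigm, the sketch below encodes a handful of discourse-function-to-head-movement rules in the spirit of Table 3. The specific rule set and function names are illustrative assumptions, not a reproduction of any published system.

```python
# illustrative discourse-function-to-head-movement rules (cf. Table 3)
RULES = {
    "backchannel": "head_nod",
    "emphasis": "head_nod",
    "question": "look_up",
    "affirmation": "big_nod",
    "negation": "head_shake",
    "thinking": "head_tilt",
}

def plan_head_movement(discourse_function: str, default: str = "rest") -> str:
    """Return the head movement prescribed for a discourse function."""
    return RULES.get(discourse_function, default)

print(plan_head_movement("negation"))  # -> head_shake
```

The fixed lookup makes the core limitation visible: any input outside the dictionary falls back to a default, and every occurrence of a function maps to the same behavior, which is exactly the repetitiveness noted above.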

Data-Driven Models

The second category of approaches to generate head motion corresponds to data-driven frameworks. Data-driven approaches usually utilize motion capture recordings of head motion trajectories. Table 4 summarizes some of these frameworks, highlighting the input that drives each approach. Studies have created head movements by blending segments of recorded motion capture data. Chuang and Bregler (2005) designed a system to create emotional facial gestures. For head motion, they stored segments of pitch contours and their corresponding head motions. During synthesis, they searched their stored libraries for the best matching combination of pitch contours. Then, they connected the corresponding sequence of head motion segments, re-sampling the sequence to match the timing of the input. Deng et al. (2004) proposed a similar approach using K nearest neighbors (KNN). They stored the training audio and head motion trajectories indexed by the audio features and used dynamic programming to search for the most appropriate set of motion segments using seven nearest neighbors. They allowed the user to specify head poses for key frames, which were added as constraints in the dynamic programming search. They considered smoothness of the trajectory as one of the factors in their optimization process, avoiding sudden transitions. Le et al. (2012) proposed a framework based on Gaussian mixture models (GMMs) to generate head motion driven by prosodic features (loudness and fundamental frequency).


Table 4  Brief summary of data-driven methods proposed in previous studies, listing the input used during testing and the approach used to synthesize the head motion sequences

Study                        Input                                  Method
Chuang and Bregler (2005)    Pitch, target expressive style         KNN, path searching
Deng et al. (2004)           Pitch, five formants, 13-MFCC, 12-LPC  KNN, path optimization
Busso et al. (2005)          Pitch, intensity                       HMMs
Busso et al. (2007a)         Pitch, intensity, emotion              HMMs
Sargin et al. (2008)         Pitch, intensity                       PHMMs
Mariooryad and Busso (2012)  Pitch, intensity                       DBNs
Chiu and Marsella (2011)     Pitch, intensity                       CRBMs
Levine et al. (2010)         Pitch, intensity, syllable length      HCRFs, reinforcement learning
Le et al. (2012)             Pitch, intensity                       GMMs

Their framework learns three separate joint GMMs, modeling the relation between speech prosody features and (1) head poses, (2) the velocity of head motion, and (3) the acceleration of head motion. They approximate the overall joint distribution by assuming that these three models are independent (taking the product of the probabilities provided by the GMMs). Given the head poses at the two previous frames and the prosodic features for the current frame, they find the current head pose by maximizing the final joint distribution using gradient descent. A key advantage of this approach is that it can run online, facilitating real-time implementations.
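The following sketch illustrates the per-frame generation step of this GMM-based approach. It assumes, as described above, three joint GMMs over (prosody, pose), (prosody, velocity), and (prosody, acceleration); here the models are fit on random toy data so the example runs, and a general-purpose quasi-Newton optimizer stands in for the gradient descent of the original work. All variable names and dimensions are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.mixture import GaussianMixture

def neg_log_joint(pose, prosody, p1, p2, gmm_pose, gmm_vel, gmm_acc):
    """Negative log of the (assumed independent) joint over pose, velocity,
    and acceleration, each modeled jointly with the current prosody frame."""
    vel = pose - p1            # finite-difference velocity
    acc = pose - 2 * p1 + p2   # finite-difference acceleration
    return -(gmm_pose.score_samples(np.hstack([prosody, pose])[None, :])[0]
             + gmm_vel.score_samples(np.hstack([prosody, vel])[None, :])[0]
             + gmm_acc.score_samples(np.hstack([prosody, acc])[None, :])[0])

def next_pose(prosody, p1, p2, gmms):
    # start the search from the previous pose and descend the
    # negative log joint likelihood
    res = minimize(neg_log_joint, p1, args=(prosody, p1, p2, *gmms))
    return res.x

# toy training data (2-D prosody + 3-D pose/velocity/acceleration = 5-D),
# purely to make the sketch executable
X = np.random.randn(200, 5)
gmms = tuple(GaussianMixture(n_components=2, random_state=0).fit(X)
             for _ in range(3))
print(next_pose(np.zeros(2), np.zeros(3), np.zeros(3), gmms))
```

Because each frame depends only on the two previous poses and the current prosody, the optimization can be run incrementally, which is what enables the online use noted above.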

Other data-driven studies use probabilistic modeling strategies that capture the temporal dynamics of head motion. Examples of these frameworks include HMMs and dynamic Bayesian networks (DBNs). For example, we presented an HMM-based framework for modeling the relationship between speech prosodic features and head motion (Busso et al. 2005). We used vector quantization to discretize the space of head motion and designed HMMs to learn the joint representation of head movements and speech prosodic features. Figure 3 gives the block diagram of this approach, where the RMS energy, the fundamental frequency, and their first and second order derivatives were used to create a 6D feature vector. The HMMs represent head poses, with transitions learned during training. The HMMs decode the most likely sequence of head poses given a speech signal, where common head pose transitions are rewarded and uncommon transitions are penalized. Given the discrete representation of head poses used in this study, we smoothed the angular trajectories of the generated head poses. This framework was very effective in generating head motion sequences that are temporally aligned with prosodic information. Following this study, we extended the HMM approach to incorporate the relationship between prosody and head motion under different emotional states (Busso et al. 2007a, b). The results showed that the models were able to generate expressive head motions accompanying speech. Other studies have also proposed speech-driven models to synthesize head motion.

Fig. 3  Block diagram of the approach proposed by Busso et al. (2005), where HMMs are used to synthesize head motion sequences driven by speech. The pipeline comprises feature extraction, vector quantization, an HMM sequence generator, noise generation, and spherical cubic interpolation

Fig. 4  (a) PHMM proposed by Sargin et al. (2008) to model the relationship between head movement primitives and speech prosody, (b) the DBN model proposed by Mariooryad and Busso (2012) to jointly model the relationship between head and eyebrow movements and speech prosody features, and (c) the CRBM proposed by Taylor et al. (2006) to model human motion

Sargin et al. (2008) used parallel HMMs (PHMMs) to jointly model speech and head movements by simultaneously clustering and segmenting the two modalities. The PHMM consists of several left-to-right HMMs, where each branch models a head motion primitive automatically extracted from the data (see Fig. 4a). The PHMM jointly solves the segmentation and clustering of head motion sequences. In their study, they found the most probable state sequence and the corresponding head motion values for a given speech signal. DBNs are another suitable framework to capture the relation between head motion and speech. A DBN is a generative model that provides the flexibility to impose different structures by introducing nodes representing variables and directed links representing conditional dependencies between the variables. Therefore, it can model the dependencies between two temporal sequences in a principled way. Notice that the HMM is a particular type of DBN. We have demonstrated the potential of DBNs to model the relation between head motion and speech. Mariooryad and Busso (2012) designed several DBN structures to capture the joint representation of speech with not only head movements but also eyebrow movements.


Figure 4b shows an example, where the Head&Eyebrow node represents a joint discrete state describing eyebrow and head motion. During training, all the variables are available. During synthesis, the Head&Eyebrow node is not available, but it is approximated by propagating the evidence from the Speech node. The animations generated with this method were compared using subjective and objective metrics, demonstrating the need to jointly model eyebrow and head motion. There are also data-driven methods relying on conditional restricted Boltzmann machines (CRBMs) (Chiu and Marsella 2011). The CRBM provides an efficient nonlinear tool for modeling the global dynamics and local constraints of a temporal signal (see Fig. 4c). Given N + 1 frames, the model learns the mapping between the visible units and hidden layers, which reconstructs missing observations during synthesis. For synthesis, the model takes the first N frames, aiming to estimate the (N + 1)th frame. In addition, the autoregressive connections between the previous frames and the current frame learn the temporal constraints of the data. During synthesis, the model generates the (N + 1)th sample using contrastive divergence, based on the previous N frames. These properties make the CRBM very useful for predicting and generating temporal sequences. Taylor et al. (2006) demonstrated the benefits of using this framework for modeling human motion trajectories (e.g., walking). They used a CRBM with autoregressive connections to predict the human pose for the next frame, given the previous N frames. Following this study, Taylor and Hinton (2009) proposed adding an extra variable to constrain the CRBM's generation to specific stylized walk sequences such as drunk, strong, and graceful. The success of this framework in this domain motivated Chiu and Marsella (2011) to use a variation of the CRBM to generate head motion sequences. They proposed a hierarchical factored conditional RBM (HFCRBM), which predicts the current head pose based on the previous two poses, conditioned on speech prosody features. The aforementioned studies utilized either concatenation approaches or statistical models to learn the relation between head motion and speech. Levine et al. (2010) combined both strategies by using hidden conditional random fields (HCRFs) to model the relationship between a set of kinematic features of joint movements and speech prosodic features. The premise of the study is that prosody is related to the head motion kinematics rather than the actual head motion. For synthesis, they inferred the kinematic features from the prosodic features. Next, they searched through the recordings in an online manner (forward pass), using a cost function that incorporates the inferred kinematic features. They used a Markov decision process (MDP) to ensure smoothness in the head motion trajectories.
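The sketch below illustrates CRBM-style generation as described above: dynamic biases are computed from the N-frame history, and a few alternating sampling and mean-field steps produce the next frame. The weights are random stand-ins (a real model would learn them with contrastive divergence), and Gaussian visible units with binary hidden units are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, N = 3, 32, 2                           # visible dim, hidden dim, history frames
W = 0.01 * rng.standard_normal((V, H))       # visible-hidden weights
A = 0.01 * rng.standard_normal((N * V, V))   # autoregressive history -> visible
B = 0.01 * rng.standard_normal((N * V, H))   # history -> hidden
bv, bh = np.zeros(V), np.zeros(H)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_next(history, steps=30):
    """history: (N, V) previous frames; returns a predicted next frame."""
    u = history.reshape(-1)
    bv_dyn = bv + u @ A                      # dynamic visible bias from history
    bh_dyn = bh + u @ B                      # dynamic hidden bias from history
    v = history[-1].copy()                   # initialize from the last frame
    for _ in range(steps):
        h = sigmoid(v @ W + bh_dyn) > rng.random(H)  # sample binary hiddens
        v = h @ W.T + bv_dyn                 # mean-field update of Gaussian visibles
    return v

print(generate_next(np.zeros((N, V))))
```

Conditioning on speech, as in the HFCRBM above, would add prosody features to the conditioning vector u; this sketch keeps only the motion history for brevity.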

Hybrid Approaches

Combining rule-based and data-driven approaches to exploit the benefits of each method results in an enhanced system. Several studies have focused on bridging the gap between these two methods (Huang et al. 2011; Sadoughi and Busso 2015; Sadoughi et al. 2014; Stone et al. 2004).


Fig. 5  Dynamic Bayesian networks of the proposed constrained models (Sadoughi and Busso 2015; Sadoughi et al. 2014). The systems generate behaviors constrained by (a) the underlying discourse function or (b) the target gesture

We describe some of these studies in this section. Stone et al. (2004) proposed a system to generate head and body movements by concatenating prerecorded audio and motion units. The key aspect of the approach is that the units are associated with communicative functions or intents. Therefore, the system can decompose any new utterance into units and solve a dynamic search through the data to find the best combination matching the intended communicative function. Since the approach uses a concatenative framework, the dynamic search also smooths transitions between speech segments and between motion sequences. It also synchronizes emphasis points across speech and gestures. To achieve this goal, they annotated the emphasis segments in their recordings. During testing, they provide tags describing emphasis on the transcriptions, coordinating the emphasis on motion sequences to start and end at the corresponding frames. The intermediate frames, which are the frames between the emphatic points, are derived by interpolation. The limitation of this work is that the variations of speech and motion sequences are limited to the indexed phrases found in the recordings. Other studies have combined rule-based and data-driven approaches by adding meaningful constraints to their models. We have designed a speech-driven model to synthesize eyebrow and head movements constrained by discourse functions (Sadoughi and Busso 2015; Sadoughi et al. 2014). Figure 5a describes the structure of our first model (Sadoughi et al. 2014), built upon the DBN model proposed by Mariooryad and Busso (2012) (see Fig. 4b). In this structure, the Constraint node is added as a child of the hidden state Hh&e, which controls the dependency between speech and head and eyebrow motion. During training and synthesis, the Constraint node is given as input, which dictates the behaviors generated by the system. This study used the IEMOCAP database (Busso et al. 2008), which consists of dyadic interactions between two actors. We manually annotated two discourse functions corresponding to affirmation and question. The models were trained and tested with data from a single subject. The results showed that evaluators preferred the constrained models for questions; for affirmation, the results were not conclusive.


Fig. 6 MSP-AVATAR, a corpus designed to generate behaviors constrained by the communicative function of the message. The figure shows the placement of the reflective markers, the skeleton used to reconstruct the data, and the setting of the recordings

A challenge in creating models constrained by the semantic meaning of the sentence is the lack of motion capture databases with appropriate annotations for discourse functions. To overcome this limitation, we recorded the MSP-AVATAR corpus (Sadoughi et al. 2015), a motion capture corpus of dyadic interactions capturing facial expressions and upper body motion, including head motion (Fig. 6). In each session, two actors improvised scenarios carefully designed to include a set of the following communicative functions: contrast, affirmation, negation, question, uncertainty, suggest, warn, and inform. We also considered scenarios that include iconic gestures for words such as large and small, and deictic gestures for pronouns such as you and I. This corpus contains the audio and video of both actors and motion capture recordings from one of them. Figure 6 shows the placement of the markers, the marker skeleton, and the setting of the recordings. Using this corpus, we are currently extending our framework to combine rule-based and data-driven models by considering these discourse functions. An alternative way to bridge rule-based and data-driven models is to generate the behaviors dictated by predefined rules using data-driven models. To understand how this framework works, consider the SAIBA framework proposed by Kopp et al. (2006). SAIBA is a behavior generation framework for embodied conversational agents (ECAs) composed of three layers: intent planning, behavior planning, and behavior realization. The first two layers define the intent of the message and the gestures required to convey the communicative goal. We envision rule-based systems creating these layers. The last layer generates the intended behavior by setting the amplitude and timing constraints. We envision data-driven models creating this layer, generating novel realizations of the specific gestures defined by the behavior planning layer. We have explored this approach for head and hand gestures (Sadoughi and Busso 2015). Figure 5b illustrates the proposed system, where the Constraint node is placed as a parent of the hidden state Hh&e.


Fig. 7  Overall block diagram of the method proposed by Sadoughi and Busso (2015) to retrieve arbitrary prototypical head movements. The approach requires only a few examples of the target behaviors

The key novelty is that the constraints correspond to specific behaviors. For head motion, we only considered head nods and head shakes, but the system is flexible enough to incorporate other behaviors. Notice that several examples of the target behaviors are needed to train the proposed model. We addressed this key problem with a semi-supervised approach to retrieve examples of the target behaviors from the database. This framework, illustrated in Fig. 7, requires only a few samples for training, which are used to automatically retrieve similar examples from the database. The first step searches for possible matches using a one-class support vector machine (SVM). We use temporal reduction and multiscale windows to handle similar gestures with different durations. The classifiers are fast and are set to identify many candidate segments conveying the target gesture. The second step uses a dynamic time alignment kernel (DTAK) to improve the precision of the system by removing samples that are not similar to the given examples. We use the retrieved samples to train the speech-driven framework described in Fig. 5b, generating novel data-driven realizations of the target behaviors (e.g., head shakes and head nods).
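A minimal sketch of this first retrieval step is given below: a one-class SVM trained on a few seed examples of a target gesture flags candidate windows in a long unannotated recording. The windowing, features, and parameters are illustrative; the multiscale windows, temporal reduction, and DTAK refinement of the published system are not shown.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def window_features(track, size=30, hop=10):
    """Slice a (T, 3) rotation track into flattened fixed-size windows."""
    return np.array([track[s:s + size].ravel()
                     for s in range(0, len(track) - size + 1, hop)])

seeds = np.random.randn(10, 30, 3)    # a few annotated examples of the target gesture
database = np.random.randn(2000, 3)   # long unannotated head rotation recording

clf = OneClassSVM(kernel="rbf", nu=0.5, gamma="scale")
clf.fit(seeds.reshape(len(seeds), -1))

candidates = window_features(database)
hits = clf.predict(candidates) == 1   # +1 means "similar to the seed examples"
print(f"{int(hits.sum())} candidate segments retrieved")
```

Tuning nu trades recall against precision: a permissive setting keeps many candidates for the second, more expensive DTAK stage to filter, mirroring the two-step design described above.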


Another interesting domain in which to combine rule-based and data-driven systems is the generation of behaviors while the CA is listening. As mentioned in section "Rule-Based Methods," Gratch et al. (2006) proposed a rule-based system to generate virtual rapport. Their team extended this framework, entitled virtual rapport 2.0, by shifting towards a more data-driven approach (Huang et al. 2011). The approach relies on an interesting data collection design, described in Huang et al. (2010), to analyze human responses during an interaction. They collected data from subjects watching a storyteller in a prerecorded video. The subjects' task was to press a key each time they felt a backchannel (verbal or nonverbal feedback, such as head nods, "uh-huh," or "OK") was appropriate. The subjects were informed of the interaction goal, which was to promote rapport. Multiple subjects were collected under the same interaction to separate idiosyncratic responses from essential responses. Using these recordings, they trained a conditional random field (CRF) model, which uses the recorded videos to predict when and how to generate backchannels. The input of their system includes pauses and the user's eye gaze, generating different types of nodding as output. For predicting the end of the speaking turn, they defined rules using the verbal and nonverbal cues observed in their data. To make the CA system friendlier, they embedded a smile detector, smiling whenever the system detects that the user smiles. This version created a higher sense of rapport in the speakers.
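The sketch below shows sequence labeling of listener backchannels in the spirit of the CRF model described above, using the sklearn-crfsuite package. The per-frame features (pause, gaze) and labels are toy data invented for illustration, not the features of the original system.

```python
import sklearn_crfsuite

# one toy training dialogue: per-frame feature dicts and nod/no-nod labels
X_train = [[{"pause": 1.0, "gaze_at_agent": 0.0},
            {"pause": 0.0, "gaze_at_agent": 1.0},
            {"pause": 1.0, "gaze_at_agent": 1.0}]]
y_train = [["no_nod", "no_nod", "nod"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # per-frame backchannel decisions
```

Because the CRF models label transitions as well as per-frame evidence, it can learn, for example, that nods cluster after pauses rather than firing on every pause frame independently.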

Open Challenges

Generating meaningful head motion sequences that convey the range of behaviors observed during human interaction is an important problem. This area offers interesting challenges that future research should address. We describe some of these challenges in this section.

Speech-Driven Models Using Synthetic Speech

An important limitation of speech-driven methods is the assumption that natural speech is available to synthesize head motion. Having prerecorded audio for every sentence spoken by the CA is not realistic in many domains. Instead, text-to-speech (TTS) systems provide the flexibility to scale the system beyond prerecorded sentences. An advantage of rule-based systems is that the rules are generally derived from transcriptions instead of speech features; therefore, they can easily handle CAs using synthetic speech. Speech-driven frameworks, however, rely on acoustic features derived from natural speech, and using synthetic speech is a major limitation. Very few studies have addressed this problem. Welbergen et al. (2015) provided a framework to generate head movements for a CA driven by synthetic speech, using the probabilistic model proposed by Le et al. (2012). They tested their framework with synthetic speech, performing subjective evaluations to assess the warmth, competence, and human-likeness of their animations. The results showed that adding head movements through an online implementation of their framework increases the perceived level of these social attributes. Although the approach proposed by Welbergen et al. (2015) uses synthetic speech, their system has a mismatch between train and test conditions. During training, the models are built with original speech; during synthesis, the models are driven by features extracted from synthetic speech. Features extracted from synthetic speech do not have the same dynamic range as features derived from original speech. Given these clear differences in the feature space between natural and synthetic speech, this mismatch produces a very limited range of behaviors. We are investigating systematic approaches to address this problem by using adaptation techniques that reduce the mismatch between train and test conditions and increase the range of behaviors generated by the models (Sadoughi and Busso 2016). Solving this problem can dramatically increase the application domains where speech-driven animation can be used.
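One simple adaptation strategy consistent with this discussion is to map synthetic-speech features to the global statistics of the natural-speech training features, as sketched below. This mean-and-variance matching is a generic normalization technique and is not necessarily the adaptation method used in Sadoughi and Busso (2016); the toy feature values are invented for illustration.

```python
import numpy as np

def match_statistics(synthetic, natural):
    """Shift/scale synthetic features to natural-speech global statistics."""
    z = (synthetic - synthetic.mean(axis=0)) / (synthetic.std(axis=0) + 1e-8)
    return z * natural.std(axis=0) + natural.mean(axis=0)

# toy F0/energy features: synthetic speech with a narrower dynamic range
natural = np.random.randn(1000, 2) * [20.0, 5.0] + [200.0, 60.0]
synthetic = np.random.randn(300, 2) * [8.0, 2.0] + [180.0, 55.0]

adapted = match_statistics(synthetic, natural)
print(adapted.mean(axis=0), adapted.std(axis=0))  # close to natural statistics
```

Restoring the dynamic range of the input features is a plausible first step toward restoring the range of generated behaviors, since the models above respond to variation in the driving features.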


Exploring Entrainment

As discussed in section "Role of Head Motion in Human Interaction," head motion conveys the emotional state of the message (Busso et al. 2007a). An open challenge is to identify effective frameworks to generate head motion sequences that elicit a target emotion. While predefined rules can be used (Pelachaud et al. 1996), data-driven frameworks may provide more realistic sequences (Busso et al. 2007a). Systems that are able to convey expressive behaviors open opportunities to explore entrainment effects between the user and the CA. Entrainment is the phenomenon during human interaction whereby interlocutors mirror each other's behaviors. This phenomenon, which affects lexical, prosodic, and gestural cues, has also been observed during human-computer and human-robot interaction (Bell et al. 2003; Breazeal 2002). Interestingly, we have also observed entrainment effects on emotional behaviors during dyadic interactions (Mariooryad and Busso 2013). Can a CA manipulate its emotional reactions to affect the affective state of the user? The head movement of the CA can be modulated with appropriate emotional cues to increase the emotional entrainment with the user. Attempting to capture this subtle communicative aspect can lead to more effective CAs with better rapport with the user. Although Huang et al. (2011) proposed to make the CA friendlier by producing smiles in response to smiles from users, this area offers opportunities to systematically design interfaces beyond that, leveraging the findings from entrainment studies (Jakkam and Busso 2016). The first step in this direction requires the investigation of emotional entrainment in human conversations (Mariooryad and Busso 2013; Xiao et al. 2015). This investigation can be used to create appropriate affective cues for CAs, increasing emotional entrainment by developing models that incorporate the relevant factors.

Modeling Personality

Since individual differences play an important role in the range of head motion shown during human interaction (Youssef et al. 2013), models to synthesize head motion sequences should carefully consider personality and idiosyncratic differences. Note that personality and emotional displays are interconnected; for instance, an introvert and an extrovert will express their emotions differently under the same circumstances. Some studies have proposed frameworks to incorporate personality traits in their CAs, especially for rule-based methods. Using a rule-based strategy, Poggi et al. (2005) proposed to modulate the goals of the message according to the personality traits of the ECA. Kipp (2003) investigated the gesture profiles displayed by two speakers and found that their gestures differed in important ways. Some gestures were used by only one of the speakers, and there were important differences in the frequency of the gestures, the timing patterns, and the mapping functions used to link semantic tags to actual gestures. All these aspects were used to personalize the animated characters. However, this was a limited study, and more effort is required to generalize the models to a broader range of personalities.


Joint Models to Integrate Head Motion with Other Gestures

Another remaining challenge is how to integrate the generated head movements with the movements of other parts of the body. There is high synchrony between head movements and facial gestures. For example, we have reported a high CCA between head and eyebrow movements (ρ = 0.89) (Mariooryad and Busso 2012). When separate speech-driven models are used to synthesize individual behaviors, the relationship between these behaviors may not be preserved. For example, we can perfectly capture the timing relationship between speech and head motion and between speech and eyebrow motion; however, the generated head and eyebrow motion may not be perceived as realistic when rendering the CA, since these behaviors may fail to capture the relation between head and eyebrow motion. In Mariooryad and Busso (2012), we proposed to jointly model head and eyebrow motion in a speech-driven framework. The results showed that people preferred the animations synthesized by the joint model over those where the behaviors were independently generated. Capturing these subtle dependencies is important not only for generating realistic behaviors but also for conveying synthesized behaviors that increase speech intelligibility (Munhall et al. 2004). A data-driven model which incorporates all these relations will result in more convincing animations. The challenge is that modeling more modalities increases the complexity of the model. Extending data-driven approaches without significantly increasing their complexity is an open challenge.

Conclusions

This chapter gave an overview of studies relevant to head motion generation. We started by reviewing the importance of head motion in human interaction. Head movements play an important role in face-to-face communication: they provide semantic and syntactic cues while speaking, we use them as backchannels while listening to others, and they convey personality and emotional traits. These functions are important for communication, so realistic CAs should have well-designed head motion sequences that are precisely synchronized with speech. The chapter overviewed different methods to generate head movements, which can be categorized into two main approaches: rule-based and data-driven frameworks. Rule-based methods rely on heuristic rules to generate head movements based on the underlying communicative goal of the message. Data-driven methods rely on recorded head motion sequences to generate new instances. Within data-driven methods, we focused the review on speech-driven frameworks, which leverage the close relationship between prosody and head motion. Rule-based and data-driven methods have their own advantages and disadvantages; we reviewed hybrid approaches which have attempted to bridge the gap between these methods, overcoming their limitations.


There are still open challenges in generating realistic head motion sequences. We discussed opportunities which we believe can result in head motion sequences that are more effective and engaging. Previous studies have built the foundation for a better understanding of the role of head motion and have provided convincing frameworks to generate human-like head motion sequences. They offer an excellent platform for future studies to advance this research area even further.

Acknowledgments This work was funded by the National Science Foundation under grant IIS-1352950.

References

André E, Müller J, Rist T (1996) The PPP persona: a multipurpose animated presentation agent. In: Workshop on advanced visual interfaces, Gubbio, pp 245–247
Arellano D, Varona J, Perales FJ, Bee N, Janowski K, André E (2011) Influence of head orientation in perception of personality traits in virtual agents. In: The 10th international conference on autonomous agents and multiagent systems – volume 3, Taipei, pp 1093–1094
Arya A, Jefferies L, Enns J, DiPaola S (2006) Facial actions as visual cues for personality. Comput Anim Virtual Worlds 17(3–4):371–382
Bell L, Gustafson J, Heldner M (2003) Prosodic adaptation in human-computer interaction. In: 15th international congress of phonetic sciences (ICPhS 03), Barcelona, pp 2453–2456
Beskow J, McGlashan S (1997) Olga – a conversational agent with gestures. In: Proceedings of the IJCAI 1997 workshop on animated interface agents: making them intelligent, Nagoya
Breazeal C (2002) Regulation and entrainment in human-robot interaction. Int J Robot Res 21(10–11):883–902
Busso C, Narayanan S (2007) Interrelation between speech and facial gestures in emotional utterances: a single subject study. IEEE Trans Audio Speech Lang Process 15(8):2331–2347
Busso C, Deng Z, Neumann U, Narayanan S (2005) Natural head motion synthesis driven by acoustic prosodic features. Comput Anim Virtual Worlds 16(3–4):283–290
Busso C, Deng Z, Grimm M, Neumann U, Narayanan S (2007a) Rigid head motion in expressive speech animation: analysis and synthesis. IEEE Trans Audio Speech Lang Process 15(3):1075–1086
Busso C, Deng Z, Neumann U, Narayanan S (2007b) Learning expressive human-like head motion sequences from speech. In: Deng Z, Neumann U (eds) Data-driven 3D facial animations. Springer-Verlag London Ltd, Surrey, pp 113–131
Busso C, Bulut M, Lee C, Kazemzadeh A, Mower E, Kim S, Chang J, Lee S, Narayanan S (2008) IEMOCAP: interactive emotional dyadic motion capture database. J Lang Resour Eval 42(4):335–359
Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Bechet T, Douville B, Prevost S, Stone M (1994) Animated conversation: rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In: Computer graphics (Proc. of ACM SIGGRAPH'94), Orlando, pp 413–420
Cassell J, Bickmore T, Billinghurst M, Campbell L, Chang K, Vilhjalmsson H, Yan H (1999) Embodiment in conversational interfaces: Rea. In: International conference on human factors in computing systems (CHI-99), Pittsburgh, pp 520–527
Chiu C-C, Marsella S (2011) How to train your avatar: a data driven approach to gesture generation. In: Intelligent virtual agents, Reykjavik, pp 127–140
Chiu C-C, Morency L-P, Marsella S (2015) Predicting co-verbal gestures: a deep and temporal modeling approach. In: Intelligent virtual agents, Delft, pp 152–166


Chuang E, Bregler C (2005) Mood swings: expressive speech animation. ACM Trans Graph 24(2):331–347
DeCarlo D, Stone M, Revilla C, Venditti JJ (2004) Specifying and animating facial signals for discourse in embodied conversational agents. Comput Anim Virtual Worlds 15(1):27–38
Deng Z, Busso C, Narayanan S, Neumann U (2004) Audio-based head motion synthesis for avatar-based telepresence systems. In: ACM SIGMM 2004 workshop on effective telepresence (ETP 2004). ACM Press, New York, pp 24–30
Foster ME (2007) Comparing rule-based and data-driven selection of facial displays. In: Workshop on embodied language processing, Association for Computational Linguistics, Prague, pp 1–8
Graf HP, Cosatto E, Strom V, Huang FJ (2002) Visual prosody: facial movements accompanying speech. In: Proceedings of IEEE international conference on automatic face and gesture recognition, Washington, DC, pp 396–401
Gratch J, Okhmatovskaia A, Lamothe F, Marsella S, Morales M, van der Werf R, Morency L (2006) Virtual rapport. In: 6th international conference on intelligent virtual agents (IVA 2006), Marina del Rey
Hadar U, Steiner TJ, Grant EC, Rose FC (1983) Kinematics of head movements accompanying speech during conversation. Hum Mov Sci 2(1):35–46
Heylen D (2005) Challenges ahead: head movements and other social acts in conversation. In: Artificial intelligence and simulation of behaviour (AISB 2005), social presence cues for virtual humanoids symposium, p 8, Hertfordshire
Huang L, Morency L-P, Gratch J (2010) Parasocial consensus sampling: combining multiple perspectives to learn virtual human behavior. In: Proceedings of the 9th international conference on autonomous agents and multiagent systems – volume 1, Toronto, pp 1265–1272
Huang L, Morency L-P, Gratch J (2011) Virtual rapport 2.0. In: Intelligent virtual agents, Reykjavik, pp 68–79
Ishi CT, Ishiguro H, Hagita N (2014) Analysis of relationship between head motion events and speech in dialogue conversations. Speech Commun 57:233–243
Jakkam A, Busso C (2016) A multimodal analysis of synchrony during dyadic interaction using a metric based on sequential pattern mining. In: IEEE international conference on acoustics, speech and signal processing (ICASSP 2016), Shanghai, pp 6085–6089
Kipp M (2003) Gesture generation by imitation: from human behavior to computer character animation. PhD thesis, Universität des Saarlandes, Saarbrücken
Kopp S, Krenn B, Marsella S, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: the behavior markup language. In: International conference on intelligent virtual agents (IVA 2006), Marina del Rey, pp 205–217
Kuratate T, Munhall KG, Rubin PE, Vatikiotis-Bateson E, Yehia H (1999) Audio-visual synthesis of talking faces from speech production correlates. In: Sixth European conference on speech communication and technology, Eurospeech 1999, Budapest, pp 1279–1282
Lance B, Marsella SC (2007) Emotionally expressive head and body movement during gaze shifts. In: Intelligent virtual agents, Paris, pp 72–85
Le BH, Ma X, Deng Z (2012) Live speech driven head-and-eye motion generators. IEEE Trans Vis Comput Graph 18(11):1902–1914
Lee J, Marsella S (2006) Nonverbal behavior generator for embodied conversational agents. Intell Virtual Agents 4133:243–255
Lee JJ, Marsella S (2009) Learning a model of speaker head nods using gesture corpora. In: Proceedings of the 8th international conference on autonomous agents and multiagent systems – volume 1, Budapest, pp 289–296
Lester J, Stone B, Stelling G (1999) Lifelike pedagogical agents for mixed-initiative problem solving in constructivist learning environments. User Model User-Adap Inter 9(1–2):1–44
Levine S, Krähenbühl P, Thrun S, Koltun V (2010) Gesture controllers. ACM Trans Graph 29(4):1–124


Liu C, Ishi CT, Ishiguro H, Hagita N (2012) Generation of nodding, head tilting and eye gazing for human-robot dialogue interaction. In: 7th ACM/IEEE international conference on human-robot interaction (HRI 2012), Boston, pp 285–292
Mariooryad S, Busso C (2012) Generating human-like behaviors using joint, speech-driven models for conversational agents. IEEE Trans Audio Speech Lang Process 20(8):2329–2340
Mariooryad S, Busso C (2013) Exploring cross-modality affective reactions for audiovisual emotion recognition. IEEE Trans Affect Comput 4(2):183–196
Marsella S, Xu Y, Lhommet M, Feng A, Scherer S, Shapiro A (2013) Virtual character performance from speech. In: ACM SIGGRAPH/Eurographics symposium on computer animation (SCA 2013), Anaheim, pp 25–35
Marsi E, van Rooden F (2007) Expressing uncertainty with a talking head. In: Workshop on multimodal output generation (MOG 2007), Aberdeen, pp 105–116
McClave EZ (2000) Linguistic functions of head movements in the context of speech. J Pragmat 32(7):855–878
Moubayed SA, Beskow J, Granström B, House D (2010) Audio-visual prosody: perception, detection, and synthesis of prominence. In: COST 2102 training school, pp 55–71
Munhall KG, Jones JA, Callan DE, Kuratate T, Vatikiotis-Bateson E (2004) Visual prosody and speech intelligibility: head movement improves auditory speech perception. Psychol Sci 15(2):133–137
Pelachaud C, Badler N, Steedman M (1996) Generating facial expressions for speech. Cognit Sci 20(1):1–46
Poggi I, Pelachaud C, de Rosis F, Carofiglio V, de Carolis B (2005) Greta: a believable embodied conversational agent. In: Stock O, Zancanaro M (eds) Multimodal intelligent information presentation, text, speech and language technology. Springer Netherlands, Dordrecht, pp 3–25
Rickel J, Johnson WL (1998) Task-oriented dialogs with animated agents in virtual reality. In: Workshop on embodied conversational characters, Tahoe City, pp 39–46
Sadoughi N, Busso C (2015) Retrieving target gestures toward speech driven animation with meaningful behaviors. In: International conference on multimodal interaction (ICMI 2015), Seattle, pp 115–122
Sadoughi N, Busso C (2016) Head motion generation with synthetic speech: a data driven approach. In: Interspeech 2016, San Francisco, pp 52–56
Sadoughi N, Liu Y, Busso C (2014) Speech-driven animation constrained by appropriate discourse functions. In: International conference on multimodal interaction (ICMI 2014), Istanbul, pp 148–155
Sadoughi N, Liu Y, Busso C (2015) MSP-AVATAR corpus: motion capture recordings to study the role of discourse functions in the design of intelligent virtual agents. In: 1st international workshop on understanding human activities through 3D sensors (UHA3DS 2015), Ljubljana
Sargin ME, Yemez Y, Erzin E, Tekalp AM (2008) Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation. IEEE Trans Pattern Anal Mach Intell 30(8):1330–1345
Silverman K, Beckman M, Pitrelli J, Ostendorf M, Wightman C, Price P, Pierrehumbert J, Hirschberg J (1992) ToBI: a standard for labelling English prosody. In: 2nd international conference on spoken language processing (ICSLP 1992), Banff, pp 867–870
Smid K, Pandzic I, Radman V (2004) Autonomous speaker agent. In: IEEE 17th international conference on computer animation and social agents (CASA 2004), Geneva, pp 259–266
Stone M, DeCarlo D, Oh I, Rodriguez C, Stere A, Lees A, Bregler C (2004) Speaking with hands: creating animated conversational characters from recordings of human performance. ACM Trans Graph (TOG) 23(3):506–513
Taylor GW, Hinton GE (2009) Factored conditional restricted Boltzmann machines for modeling motion style. In: Proceedings of the 26th annual international conference on machine learning, Montreal, pp 1025–1032
Taylor GW, Hinton GE, Roweis ST (2006) Modeling human motion using binary latent variables. Adv Neural Inf Process Syst, pp 1345–1352


Welbergen H, Ding Y, Sattler K, Pelachaud C, Kopp S (2015) Real-time visual prosody for interactive virtual agents. In: Intelligent virtual agents, Delft, pp 139–151
Xiao B, Georgiou P, Baucom B, Narayanan S (2015) Modeling head motion entrainment for prediction of couples' behavioral characteristics. In: Affective computing and intelligent interaction (ACII 2015), Xi'an, pp 91–97
Youssef AB, Shimodaira H, Braude DA (2013) Head motion analysis and synthesis over different tasks. Intell Virtual Agents 8108:285–294

Hand Gesture Synthesis for Conversational Characters

Michael Neff

Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
Gesture Generation Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
Gesture Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
Gesture Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
Additional Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   8
Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   8
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   9

Abstract

This chapter focuses on the generation of animated gesticulations, co-verbal gestures that are designed to accompany speech. It begins with a survey of research on human gesture, discussing the various forms of gesture, their structure, and their timing requirements relative to speech. The two main problems in synthesizing gesture animation are determining what gestures a character should perform (the specification problem) and then generating appropriate motion (the animation problem). The specification problem has used a range of inputs, including speech prosody, spoken text, and communicative intent. Both rule-based and statistical approaches are employed to determine gestures. Animation has likewise used a range of procedural, physics-based, and data-driven approaches to satisfy a significant set of expressive and coordination requirements. Fluid gesture animation must also reflect the context and include listener behavior and floor management. The chapter concludes with a discussion of future challenges.

M. Neff (*) Department of Computer Science & Program for Cinema and Digital Media, University of California – Davis, Davis, CA, USA e-mail: [email protected]
© Springer International Publishing Switzerland 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_5-1



Keywords

Gesture • Character animation • Nonverbal communication • Virtual agents • Embodied conversational agents

Introduction

Do gestures communicate? Yes, they do. This has been the conclusion of several meta-studies on the impact of gesture (Goldin-Meadow 2005; Hostetter 2011; Kendon 1994). It is also one of the distinguishing features of gestures in animation. While all movement communicates to some degree, gestures often play a role that is explicitly communicative. Another distinguishing feature of the gestures we are most often interested in is that they are co-verbal. That is, they occur with speech, and they are inextricably linked to that speech in both content and timing. McNeill argues that gestures and language are not separate; gestures are part of language (McNeill 2005). There are different forms of movement that can broadly be called "gesture." Building on the categories of Kendon (1988), McNeill defined "Kendon's Continuum" (McNeill 1992, 2005) to capture the range of gesture types people employ:

• Gesticulation: gestures that convey a meaning related to the accompanying speech.
• Speechlike gestures: gestures that take the place of a word(s) in a sentence.
• Emblems: conventionalized signs, like a thumbs-up.
• Pantomime: gestures that tell a story and are produced without speech.
• Sign language: signs that are lexical words.

As one moves along the continuum, the degree to which speech is obligatory decreases, and the degree to which gestures themselves have the properties of a language increases. This chapter focuses on gesticulations, gestures that co-occur with speech, as they are most relevant to conversational characters. Synthesis of the whole spectrum, however, presents worthwhile animation problems. Emblems and pantomimes are useful in situations where speech may not be possible. Sign languages are the native language of many members of the deaf community, and sign synthesis can increase their access to computational resources. The problems of gesticulations are unique, however, since they are co-present with speech and do not have linguistic structure on their own. Kendon introduced a three-level hierarchy to describe the structure of gestures (Kendon 1972). The largest structure is the gesture unit. Gesture units start in a retraction or rest pose, continue with a series of gestures, and then return to a rest pose, potentially different from the initial rest pose. A gesture phrase encapsulates an individual gesture in this sequence. Each gesture phrase can in turn be broken down into a sequence of gesture phases. A preparation is a motion that takes the hands to the required position and orientation for the start of the gesture stroke.


A prestroke hold is a period of time in which the hands are held in this configuration. The stroke is the main meaning-carrying movement of the gesture and has the most focused energy. It may be followed by a poststroke hold, in which the hands are held at the end position. The final phase is a retraction that returns the hands to a rest pose. All phases are optional except the stroke. There are some gestures in which the stroke does not involve any movement (e.g., a raised index finger). These are variously called an independent hold (Kita et al. 1998) or a stroke hold (McNeill 2005). The pre- and poststroke holds were proposed by Kita (1990) and act to synchronize the gesture with speech. The prestroke hold delays the gesture stroke until the corresponding speech begins, and the poststroke hold occurs while the corresponding speech is completing. Much as they allow mental processing in humans, they can be used in synthesis systems to allow time for planning or other processing to take place. The existence of gesture units is important for animation systems, as it indicates a potential need to avoid generating a sequence of singleton gestures that return to a rest pose after each gesture. While this would offer the simplest synthesis solution, people are quite sensitive to the structure of gestural communication. A study (Kipp et al. 2007) showed that people found a character that used multiple-phrase gesture units more natural, friendly, and trustworthy than a character that performed singleton gestures, which was viewed as more nervous. These significant differences in appraisal occurred despite only 1 of 25 subjects being able to actually identify the difference between the multiphrase g-unit clips and single-phrase g-unit clips. This illustrates what appears to be a common occurrence in gesture research: people react to differences in gesture performance without being consciously aware of what those differences are. Gestures are synchronized in time with their co-expressive speech. About 90% of the time, the gesture occurs slightly before the co-expressive speech (Nobe 2000) and rarely occurs after (Kendon 1972). Research on animated characters does indicate a preference for this slightly earlier timing of gesture, but also suggests that people may not be particularly sensitive to errors in timing, at least within a +/−0.6 second range (Wang and Neff 2013). A number of categorizations of gesture have been proposed. One of the best known is from McNeill and Levy (McNeill 1992; McNeill and Levy 1982) and contains the classes iconics, metaphorics, deictics, and beats. Iconic gestures create images of concrete objects or actions, such as illustrating the size of a box. Metaphorics create images of the abstract. For instance, a metaphoric gesture could make a cup shape with the hand but refer to holding an idea rather than an actual object. Metaphoric gestures are also used to locate ideas spatially, for instance, putting positive things on the left and negative things on the right, and then using this space to categorize future entities in the conversation. Deictics locate objects and entities in space, as with pointing, creating a reference and context for the conversation. They are often performed with a hand that is closed except for an extended index finger, but can be performed with a wide range of body parts. Deixis can be abstract or concrete.
A number of categorizations of gesture have been proposed. One of the best known is from McNeill and Levy (McNeill 1992; McNeill and Levy 1982) and contains the classes iconics, metaphorics, deictics, and beats. Iconic gestures create images of concrete objects or actions, such as illustrating the size of a box. Metaphorics create images of the abstract. For instance, a metaphoric gesture could make a cup shape with the hand, but refer to holding an idea rather than an actual object. Metaphoric gestures are also used to locate ideas spatially, for instance, putting positive things on the left and negative things on the right and then using this space to categorize future entities in the conversation. Deictics locate objects and entities in space, as with pointing, creating a reference and context for the conversation. They are often performed with a hand that is closed except for an extended index finger, but can be performed with a wide range of body parts. Deixis can be abstract or concrete. Concrete deixis points to an existing reference (e.g., an object or person) in space, whereas abstract deixis creates a reference point in space for an idea or concept.


Beats are small back-and-forth or up-and-down movements of the hand, performed in rhythm to the speech. They serve to emphasize important sections of the speech. In later work, McNeill (2005) argued that it is inappropriate to think of gesture in terms of categories; the categories should instead be considered dimensions. This reflects the fact that any individual gesture may combine several of these properties (e.g., deixis and iconicity). He suggests additional dimensions of temporal highlighting (the function of beats) and social interactivity, which helps to manage turn taking and the flow of conversation.

State of the Art

Generation of conversational characters has achieved substantial progress, but the bar for success is extremely high. People are keen observers of human motion and will make judgments based on subtle details. By way of analogy, people distinguish good actors from bad, and actors who are effective in one role but not in another; and actors are human, with all the capacity for naturalness and expressivity that comes with that. The bar for conversational characters is that of a good actor, effectively performing a particular role. The field remains a long way from being able to do this automatically, for a range of different characters and over prolonged interactions with multiple subjects.

Gesture Generation Tasks

Gesture Specification

When generating virtual conversational characters, one of the primary challenges is determining what gestures a character should perform. Different approaches have trade-offs in terms of the type of input information they require, the amount of processing time needed to determine a gesture, and the quality of the gesture selection, both in accurately reflecting a particular character's personality and in being appropriate for the co-expressed utterance.

One approach is to generate gestures based on prosody variations in the spoken audio signal. Prosody includes changes in volume and pitch. Such approaches have been applied to head nods and movement (Morency et al. 2008), as well as gesture generation (Levine et al. 2009, 2010). A main advantage of the approach is that good-quality audio can be highly expressive, and using it as an input for gesture specification allows the gestures to match the expressive style of the audio. Points of emphasis in the audio appear to be good landmarks for placing gesture, and their use will provide uniform emphasis across the channels. Prosody-based approaches have been used to generate gesture in real time as a user speaks (Levine et al. 2009, 2010).
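As a toy illustration of the prosody-driven idea (this is not the cited systems' actual algorithm), the sketch below computes a short-time energy envelope from a mono audio signal with plain NumPy and proposes candidate beat-stroke times at local energy peaks:

```python
import numpy as np

def beat_candidates(audio, sr, frame=1024, hop=512, threshold=0.5):
    """Return candidate stroke times (seconds) at local peaks of the
    normalized RMS energy envelope, a crude stand-in for prosodic emphasis."""
    n = 1 + (len(audio) - frame) // hop
    rms = np.array([np.sqrt(np.mean(audio[i * hop:i * hop + frame] ** 2))
                    for i in range(n)])
    rms /= rms.max() + 1e-8
    # A frame is a peak if it exceeds the threshold and both neighbors.
    peaks = [i for i in range(1, n - 1)
             if rms[i] > threshold and rms[i] >= rms[i - 1] and rms[i] > rms[i + 1]]
    return np.array(peaks) * hop / sr

# Example: one second of quiet tone with two loud bursts of energy.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = 0.1 * np.sin(2 * np.pi * 220 * t)
audio[3000:4000] *= 8.0
audio[11000:12000] *= 8.0
print(beat_candidates(audio, sr))   # two candidates, near 0.19 s and 0.67 s
```

A real prosody-driven system would use pitch as well as energy and would map the result onto full gesture motion rather than bare timestamps.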


The drawback of using only prosody is that it does not capture semantics, so the gestures will likely not match the meaning of the audio, and certainly will not supplement the underlying meaning being conveyed in the utterance with information not present in the audio. This concern can be at least partially addressed by also parsing the spoken text (Marsella et al. 2013). It is believed that in human communication the brain co-plans the gesture and the utterance (McNeill 2005), so approaches that do not use future information about the planned utterance may be unlikely to match the sophistication of human gesture-speech coordination.

Another approach generates gesture based on the text of the dialogue that is to be spoken. A chief benefit of these techniques is that text captures much of the information being conveyed, so they can generate gestures that aid the semantics of the utterance. Text can also be analyzed for emotional content and rhetorical style, providing a rich basis for gesture generation. Rule-based approaches (Cassell et al. 2001; Lee and Marsella 2006; Lhommet and Marsella 2013; Marsella et al. 2013) can determine both the gesture locations and the type of gestures to be performed. Advantages of these techniques are that they can handle any text covered by their knowledge bases and are extensible in flexible and straightforward ways. Disadvantages include that some amount of manual work is normally required to create the rules, and it is difficult to know how to author the rules to create a particular character, so behavior tends to be generic. Other work uses statistical approaches to predict the gestures that a particular person would employ (Bergmann et al. 2010; Kipp 2005; Neff et al. 2008). These techniques support the creation of individualized characters, which are essential for many applications, such as anything involving storytelling. Individualized behavior may also outperform averaged behavior (Bergmann et al. 2010), as would be contained in generic rules. These approaches, however, are largely limited to reproducing characters like the subjects modeled, and creating arbitrary characters remains an open challenge. Recent work has begun applying deep learning to the mapping from text and prosody to gesture (Chiu et al. 2015). This is a potentially powerful approach, but it requires a large quantity of data, and ways to produce specific characters must be developed. While the divide between prosody-driven and rule-based approaches is useful for understanding techniques, current approaches increasingly rely on a combination of text and prosody information (e.g., Lhommet and Marsella 2013; Marsella et al. 2013).

Techniques based on generating gesture from text are limited to ideas expressed in the text. The information we convey through gesture is sometimes redundant with speech, although expressed in a different form, but often expresses information that is different from that in speech (McNeill 2005). For example, I might say "I saw a [monster.]," with the square brackets indicating the location of a gesture that holds my hand above my head, with my fingers bent 90° at the first knuckle and then held straight. The gesture indicates the height of the monster, information completely lacking from the verbal utterance. Evidence suggests that gestures are most effective when they are nonredundant (Goldin-Meadow 2006; Hostetter 2011; Singer and Goldin-Meadow 2005). This implies the need to base gesture generation on a deeper notion of a "communicative intent," which may not be contained solely in the text and which describes the full message to be delivered.
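To make the rule-based text-driven specification discussed above concrete, here is a toy keyword-to-gesture mapping (the rules and gesture labels are invented for illustration; systems such as BEAT or the Nonverbal Behavior Generator cited above perform far richer linguistic analysis):

```python
import re

# Each rule pairs a word pattern with the gesture class it suggests.
RULES = [
    (re.compile(r"\b(this|that|there|here)\b", re.I), "deictic"),
    (re.compile(r"\b(big|small|huge|tiny|tall)\b", re.I), "iconic_size"),
    (re.compile(r"\b(everything|all|whole)\b", re.I), "metaphoric_container"),
]

def specify_gestures(utterance):
    """Scan the utterance and propose (word index, gesture class) pairs.
    Words with no matching rule get nothing; a fuller system might fall
    back to beats on stressed words."""
    proposals = []
    for i, word in enumerate(utterance.split()):
        for pattern, gesture in RULES:
            if pattern.search(word):
                proposals.append((i, gesture))
                break
    return proposals

print(specify_gestures("I saw a huge monster over there"))
# [(3, 'iconic_size'), (6, 'deictic')]
```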


The SAIBA (situation, agent, intention, behavior, animation) framework represents a step toward establishing a computational architecture to tackle the fundamental multimodal communication problem of moving from a communicative intent to output across the various agent channels of gesture, text, prosody, facial expressions, and posture (SAIBA Working Group website 2012). The approach defines stages in production and markup languages to connect them. The first stage is planning the communicative intent. This is communicated using the Function Markup Language (Heylen et al. 2008) to the behavior planner, which decides how to achieve the desired functions using the agent modalities available. The final behavior is then sent to a behavior realizer for generation using the Behavior Markup Language (Kopp et al. 2006; Vilhjalmsson et al. 2007). Such approaches echo, at least at the broad conceptual level, theories of communication like McNeill's growth point hypothesis, which argues that gesture and language emerge in a shared process from a communicative intent (McNeill 2005). Recent work has sought to develop cognitive (Kopp et al. 2013) and combined cognitive and linguistic models (Bergmann et al. 2013) to explore the distribution of communicative content across output modalities.
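To make the final stage of the pipeline concrete, the sketch below emits a BML-style block from Python. The element and attribute names follow the general shape of BML, a speech block with an embedded sync point and a gesture whose stroke is bound to it, but this is an illustrative approximation, not a validated BML 1.0 document:

```python
import xml.etree.ElementTree as ET

def plan_to_bml(text_before, text_after, gesture_lexeme):
    """Emit a BML-like block that binds a gesture stroke to a sync
    point embedded in the speech text."""
    bml = ET.Element("bml", id="bml1")
    speech = ET.SubElement(bml, "speech", id="s1")
    txt = ET.SubElement(speech, "text")
    txt.text = text_before
    sync = ET.SubElement(txt, "sync", id="tm1")
    sync.tail = text_after                      # text following the sync point
    # The stroke of gesture g1 is synchronized with sync point s1:tm1.
    ET.SubElement(bml, "gesture", id="g1",
                  lexeme=gesture_lexeme, stroke="s1:tm1")
    return ET.tostring(bml, encoding="unicode")

print(plan_to_bml("I saw a ", "monster", "ICONIC_SIZE"))
```

A behavior realizer receiving such a block is then responsible for choosing the actual motion and honoring the synchronization constraint.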

Gesture Animation

Generation of high-quality gesture animation must satisfy a rich set of requirements:

• Match the gesture timing to that of the speech.
• Connect individual gestures into fluent gesture units.
• Adjust the gesture to the character's context (e.g., to point to a person or object in the scene).
• Generate appropriate gesture forms for the utterance (e.g., show the shape of an object, mime an action being performed, point).
• Vary the gesture based on the personality of the character.
• Vary the gesture to reflect the character's current mood and the tone of the speech.

While a wide set of techniques have been used for gesture animation, the need for precise agent control, especially in interactive systems, has often favored the use of kinematic procedural techniques (e.g., Chi et al. 2000; Hartmann et al. 2006; Kopp and Wachsmuth 2004). For example, Kopp and Wachsmuth (2004) present a system that uses curves derived from neurophysiological research to drive the trajectory of gesturing arm motions. Procedural techniques allow full control of the motion, making it easy to adjust the gesture to the requirements of the speech, both for matching spatial and timing demands.

While gesture is less constrained by physics than motions like tumbling, physical simulation has still been used for gesture animation and can add important nuance to the motion (Neff and Fiume 2002, 2005; Neff et al. 2008; Van Welbergen et al. 2010). These approaches generally include balance control and a basic approximation of the muscle, such as a proportional derivative controller. The balance control will add full-body movement to compensate for arm movements, and the controllers can add subtle oscillations and arm swings. These effects require proper tuning.
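The proportional derivative (PD) controller mentioned above is simple to state: the joint torque is proportional to the angle error, minus a damping term on the joint velocity. A minimal single-joint sketch (the gains, unit inertia, and integration scheme are illustrative choices, not taken from the cited systems):

```python
import numpy as np

def pd_torque(theta, theta_dot, theta_target, kp=40.0, kd=8.0):
    """PD control: torque = kp * (position error) - kd * velocity."""
    return kp * (theta_target - theta) - kd * theta_dot

# Simulate one joint with unit inertia tracking a target of 1 rad.
dt, theta, theta_dot = 0.01, 0.0, 0.0
for _ in range(300):
    tau = pd_torque(theta, theta_dot, theta_target=1.0)
    theta_dot += tau * dt      # acceleration equals torque for unit inertia
    theta += theta_dot * dt
print(round(theta, 3))         # settles near 1.0
```

With these gains the joint is underdamped, which is exactly what produces the subtle overshoot and oscillation such controllers can contribute; tuning kp and kd trades responsiveness against stiffness.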


Motion capture data has seen increasing use in an attempt to improve the realism of character motion. These techniques often employ versions of motion graphs (Arikan and Forsyth 2002; Kovar et al. 2002; Lee et al. 2002), which concatenate segments of motion to create a sequence, as in Fernández-Baena et al. (2014) and Stone et al. (2004). Motion capture data can provide very high-quality motion, but control is more limited, so it can be a challenge to adapt the motion to novel speech or to generate different characters. Gesture relies heavily on hand shape, and it can be a challenge to capture good-quality hand motion while simultaneously capturing body motion. Some techniques seek to synthesize acceptable hand motion from the body motion alone (Jörg et al. 2012). For a fuller discussion of the issues around hand animation, please refer to Wheatland et al. (2015).

As part of the SAIBA effort, several research groups have developed "behavior realizers," animation engines capable of realizing commands in the Behavior Markup Language (Vilhjalmsson et al. 2007) that is supplied by a higher level in an agent architecture. These systems emphasize control and use a combination of procedural data and motion clips (e.g., Heloir and Kipp 2009; Kallmann and Marsella 2005; Shapiro 2011; Thiebaux et al. 2008; Van Welbergen et al. 2010). The SmartBody system, for example, uses a layering approach based on a hierarchy of controllers for different tasks (e.g., idle motion, locomotion, reach, breathing). These controllers may control different or overlapping parts of the body, which creates a coordination challenge. They can be combined, or one controller may override another (Shapiro 2011).

Often gesture specification systems will indicate a particular gesture form that is required, e.g., a conduit gesture in which the hand is cupped and moves forward. Systems often employ a dictionary of gesture forms that can be used in synthesis. These gestures have been encoded using motion capture clips, hand animation, or numerical spatial specifications. Some techniques (Kopp et al. 2004) have sought to generate the correct forms automatically, for example, based on a description of the image the gesture is trying to create.

Gesture animation is normally deployed in scenarios where it is desirable for the characters to portray clear personalities and show variations in emotion and mood. For these reasons, controlling expressive variation of the motion has been an important focus. A set of challenges must be solved. These include determining how to parameterize a motion to give expressive control, understanding what aspects of motion must be varied to generate a desired impact, ensuring consistency over time, determining how to expose appropriate control structures to the user or character control system, and, finally, synthesizing the motion to contain the desired properties. Chi et al. (2000) use the Effort and Shape components of Laban Movement Analysis to provide an expressive parameterization of motion. Changing any of the four Effort qualities (Weight, Space, Time, and Flow) or the Shape qualities (Rising-Sinking, Spreading-Enclosing, Advancing-Retreating) will vary the timing and path of the gesture, along with the engagement of the torso.


Hartmann et al. (2006) use tension, continuity, and bias splines (Kochanek and Bartels 1984) to control arm trajectories and provide expressive control through parameters for activation, spatial and temporal extent, fluidity, and repetition. Neff and Fiume (2005) develop an extensible set of movement properties that can be varied, and a system that allows users to write character sketches that reflect a particular character's movement tendencies and then layer additional edits on top.

While gestures are often largely thought of as movements of the arms and hands, and are often represented this way in computational systems, they can indeed use the whole body. A character can nod its head, gesture with its toe, etc. More importantly, while the arms are the dominant appendages for a motion, engaging the entire body can lead to clearer and more effective animation. Lamb called this engagement of the whole body during gesturing Posture-Gesture Merger and argued that it leads to more fluid and attractive motion (Lamb 1965).
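The tension, continuity, and bias splines used by Hartmann et al. follow Kochanek and Bartels (1984): each key gets incoming and outgoing Hermite tangents shaped by the three parameters. A compact 1D sketch is given below; how expressive parameters such as fluidity map onto T, C, and B is the cited authors' contribution and is not reproduced here.

```python
import numpy as np

def tcb_tangents(p0, p1, p2, t, c, b):
    """Outgoing and incoming tangents at key p1 (Kochanek-Bartels 1984)."""
    d_out = ((1-t)*(1+b)*(1+c)/2)*(p1 - p0) + ((1-t)*(1-b)*(1-c)/2)*(p2 - p1)
    d_in  = ((1-t)*(1+b)*(1-c)/2)*(p1 - p0) + ((1-t)*(1-b)*(1+c)/2)*(p2 - p1)
    return d_out, d_in

def tcb_eval(keys, i, s, t=0.0, c=0.0, b=0.0):
    """Evaluate the spline between keys[i] and keys[i+1] at s in [0, 1];
    requires 1 <= i <= len(keys) - 3."""
    p = np.asarray(keys, dtype=float)
    d_out, _ = tcb_tangents(p[i-1], p[i], p[i+1], t, c, b)
    _, d_in = tcb_tangents(p[i], p[i+1], p[i+2], t, c, b)
    h00 = 2*s**3 - 3*s**2 + 1        # cubic Hermite basis functions
    h10 = s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 = s**3 - s**2
    return h00*p[i] + h10*d_out + h01*p[i+1] + h11*d_in

keys = [0.0, 1.0, 3.0, 2.0, 0.0]
# Raising tension shortens the tangents, giving tighter, "tenser" arcs.
print(tcb_eval(keys, 1, 0.5), tcb_eval(keys, 1, 0.5, t=0.9))  # 2.125 2.0125
```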

Additional Considerations

Conversations are interactions between people, and this must be reflected in the animation. Both the speaker(s) and listener(s) have roles to play. Visual attention must be managed through appropriate gaze behavior to indicate who is paying attention and how actively, along with indicating who is thinking or distracted. Attentive listeners will provide back-channel cues, like head nods, to indicate that they are listening and understanding. These must be appropriately timed with the speaker's dialogue. Holding the floor is also actively managed. Speakers may decide to yield their turn to another. Listeners may interrupt, and the speaker may yield in response or refuse to do so. Floor management relies on both vocal and gestural cues. Proxemics is also highly communicative to an audience and must be managed appropriately. This creates additional animation challenges in terms of small-scale locomotion in order to fluidly manage character placement.

Gestural behavior must adapt to the context. Gestures will be adjusted based on the number of people in the conversation and their physical locations relative to one another. As characters interact, they may also begin to mirror each other's behavior and postures. Gestures are also often used to refer to items in the environment and hence must be adapted based on the character's location. Finally, characters will engage in conversations while simultaneously performing other activities, such as walking, jogging, or cleaning the house. The gesture behavior must be adapted to the constraints of this other behavior; for example, gestures performed while jogging tend to be done with more bent arms and are less frequent than standing gestures (Wang et al. 2016).

Future Directions

While significant progress has been made, the bar for conversational gesture animation is very high. We are a long way from being able to easily create synthetic characters that match the expressive quality, range, and realism of a skilled actor, and applications that rely on synthetic characters are impoverished by this gap.


Some of the key issues to address include:

Characters with large gesture repertoires: It currently takes a great deal of work to build a movement set for a character, generally involving recording, cleaning, and retargeting motion capture or hand animating movements. This places a practical limitation on the number of gestures that characters can perform. Methods that allow large sets of gestures to be rapidly generated are needed. A particular challenge is being able to synthesize novel gestures on the fly to react to the character's current context.

Motion quality: While motion quality has improved, it remains well short of photo-realism, particularly for interactive characters. Hand motion remains a particular challenge, as is appropriate full-body engagement. Most systems focus on standing characters, whereas people engage in a wide range of activities while simultaneously gesturing. A significant challenge is correctly orchestrating a performance across the various movement modalities (breath, arm movements, body movements, facial expressions, etc.), especially when the motion diverges from playback of a recorded or hand-animated sequence.

Planning from communicative intent: Systems that can represent an arbitrary communicative intent and distribute it across various communication modes, and do so in different ways for different speakers, remain a long-term goal. This will likely require both improved computational models and a more thorough understanding of how humans formulate communication.

Customization for characters and mood: While people tend to have their own, unique gesturing style, it is a challenge to imbue synthetic characters with this expressive range without an enormous amount of manual labor. It is also a challenge to accurately reflect a character's current mood: anger, sadness, irritation, excitement, and so on.

Authoring controls: If a user wishes to create a particular character with a given role, personality, etc., there must be tools to allow this to be authored. Substantial work is required to allow authors to go from an imagined character to an effective realization.

References

Arikan O, Forsyth DA (2002) Interactive motion generation from examples. ACM Trans Graph 21(3):483–490
Bergmann K, Kopp S, Eyssel F (2010) Individualized gesturing outperforms average gesturing – evaluating gesture production in virtual humans. In: International conference on intelligent virtual agents. Springer, Berlin/Heidelberg, pp 104–117
Bergmann K, Kahl S, Kopp S (2013) Modeling the semantic coordination of speech and gesture under cognitive and linguistic constraints. In: Intelligent virtual agents. Springer, Berlin/Heidelberg, pp 203–216
Cassell J, Vilhjálmsson H, Bickmore T (2001) BEAT: the behavior expression animation toolkit. In: Proceedings of SIGGRAPH 2001. ACM, New York, NY, pp 477–486
Chi DM, Costa M, Zhao L, Badler NI (2000) The EMOTE model for effort and shape. In: Proceedings of SIGGRAPH 2000. ACM, New York, NY, pp 173–182


Chiu C-C, Morency L-P, Marsella S (2015) Predicting co-verbal gestures: a deep and temporal modeling approach. In: International conference on intelligent virtual agents. Springer, Cham, pp 152–166
Fernández-Baena A, Montaño R, Antonijoan M, Roversi A, Miralles D, Alías F (2014) Gesture synthesis adapted to speech emphasis. Speech Comm 57:331–350
Goldin-Meadow S (2005) Hearing gesture: how our hands help us think. Harvard University Press, Massachusetts
Goldin-Meadow S (2006) Talking and thinking with our hands. Curr Dir Psychol Sci 15(1):34–39
Hartmann B, Mancini M, Pelachaud C (2006) Implementing expressive gesture synthesis for embodied conversational agents. In: Proc. Gesture Workshop 2005, vol 3881 of LNAI. Springer, Berlin/Heidelberg, pp 45–55
Heloir A, Kipp M (2009) EMBR – a realtime animation engine for interactive embodied agents. In: Intelligent virtual agents 09. Springer, Berlin/Heidelberg, pp 393–404
Heylen D, Kopp S, Marsella SC, Pelachaud C, Vilhjálmsson H (2008) The next step towards a function markup language. In: International workshop on intelligent virtual agents. Springer, Berlin/Heidelberg, pp 270–280
Hostetter AB (2011) When do gestures communicate? A meta-analysis. Psychol Bull 137(2):297
Jörg S, Hodgins J, Safonova A (2012) Data-driven finger motion synthesis for gesturing characters. ACM Trans Graph 31(6):189
Kallmann M, Marsella S (2005) Hierarchical motion controllers for real-time autonomous virtual humans. In: Proceedings of the 5th international working conference on intelligent virtual agents (IVA'05), Kos, Greece, 12–14 September 2005, pp 243–265
Kendon A (1972) Some relationships between body motion and speech. Stud Dyadic Commun 7(177):90
Kendon A (1988) How gestures can become like words. Cross-Cult Perspect Nonverbal Commun 1:131–141
Kendon A (1994) Do gestures communicate? A review. Res Lang Soc Interact 27(3):175–200
Kipp M (2005) Gesture generation by imitation: from human behavior to computer character animation. Universal-Publishers, Boca Raton, FL
Kipp M, Neff M, Kipp K, Albrecht I (2007) Towards natural gesture synthesis: evaluating gesture units in a data-driven approach to gesture synthesis. In: Proceedings of intelligent virtual agents (IVA07), vol 4722 of LNAI. Springer, Berlin/Heidelberg, pp 15–28
Kita S (1990) The temporal relationship between gesture and speech: a study of Japanese-English bilinguals. MS Dep Psychol Univ Chic 90:91–94
Kita S, Van Gijn I, Van Der Hulst H (1998) Movement phase in signs and co-speech gestures, and their transcriptions by human coders. In: Proceedings of the international gesture workshop on gesture and sign language in human-computer interaction. Springer, Berlin/Heidelberg, pp 23–35
Kochanek DHU, Bartels RH (1984) Interpolating splines with local tension, continuity, and bias control. Comput Graph 18(3):33–41
Kopp S, Wachsmuth I (2004) Synthesizing multimodal utterances for conversational agents. Comput Anim Virtual Worlds 15:39–52
Kopp S, Tepper P, Cassell J (2004) Towards integrated microplanning of language and iconic gesture for multimodal output. In: Proceedings of the 6th international conference on multimodal interfaces. ACM, New York, NY, pp 97–104
Kopp S, Krenn B, Marsella S, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: the behavior markup language. In: International workshop on intelligent virtual agents. Springer, Berlin/Heidelberg, pp 205–217
Kopp S, Bergmann K, Kahl S (2013) A spreading-activation model of the semantic coordination of speech and gesture. In: Proceedings of the 35th annual conference of the cognitive science society (CogSci 2013). Cognitive Science Society, Austin


Kovar L, Gleicher M, Pighin F (2002) Motion graphs. ACM Trans Graph 21(3):473–482
Lamb W (1965) Posture and gesture: an introduction to the study of physical behavior. Duckworth, London
Lee J, Marsella S (2006) Nonverbal behavior generator for embodied conversational agents. In: Intelligent virtual agents. Springer, Berlin/Heidelberg, pp 243–255
Lee J, Chai J, Reitsma PSA, Hodgins JK, Pollard NS (2002) Interactive control of avatars animated with human motion data. ACM Trans Graph 21(3):491–500
Levine S, Theobalt C, Koltun V (2009) Real-time prosody-driven synthesis of body language. ACM Trans Graph 28(5):1–10
Levine S, Krahenbuhl P, Thrun S, Koltun V (2010) Gesture controllers. ACM Trans Graph 29(4):1–11
Lhommet M, Marsella SC (2013) Gesture with meaning. In: Intelligent virtual agents. Springer, Berlin/Heidelberg, pp 303–312
Marsella S, Xu Y, Lhommet M, Feng A, Scherer S, Shapiro A (2013) Virtual character performance from speech. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics symposium on computer animation. ACM, New York, NY, pp 25–35
McNeill D (1992) Hand and mind: what gestures reveal about thought. University of Chicago Press, Chicago
McNeill D (2005) Gesture and thought. University of Chicago Press, Chicago
McNeill D, Levy E (1982) Conceptual representations in language activity and gesture. In: Jarvella RJ, Klein W (eds) Speech, place, and action. Wiley, Chichester, pp 271–295
Morency L-P, de Kok I, Gratch J (2008) Predicting listener backchannels: a probabilistic multimodal approach. In: International workshop on intelligent virtual agents. Springer, Berlin/Heidelberg, pp 176–190
Neff M, Fiume E (2002) Modeling tension and relaxation for computer animation. In: Proc. ACM SIGGRAPH symposium on computer animation 2002. ACM, New York, NY, pp 81–88
Neff M, Fiume E (2005) AER: aesthetic exploration and refinement for expressive character animation. In: Proceedings of ACM SIGGRAPH/Eurographics symposium on computer animation 2005. ACM, New York, NY, pp 161–170
Neff M, Kipp M, Albrecht I, Seidel H-P (2008) Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Trans Graph 27(1):5:1–5:24
Nobe S (2000) Where do most spontaneous representational gestures actually occur with respect to speech. Lang Gesture 2:186
SAIBA Working Group website (2012) http://wiki.mindmakers.org/projects:saiba:main
Shapiro A (2011) Building a character animation system. In: International conference on motion in games. Springer, Berlin/Heidelberg, pp 98–109
Singer MA, Goldin-Meadow S (2005) Children learn when their teacher's gestures and speech differ. Psychol Sci 16(2):85–89
Stone M, DeCarlo D, Oh I, Rodriguez C, Stere A, Lees A, Bregler C (2004) Speaking with hands: creating animated conversational characters from recordings of human performance. ACM Trans Graph 23(3):506–513
Thiebaux M, Marshall A, Marsella S, Kallman M (2008) SmartBody: behavior realization for embodied conversational agents. In: Proceedings of the 7th international conference on autonomous agents and multiagent systems (AAMAS 2008). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp 151–158
Van Welbergen H, Reidsma D, Ruttkay Z, Zwiers J (2010) Elckerlyc – a BML realizer for continuous, multimodal interaction with a virtual human. J Multimodal User Interfaces 4(2):97–118
Vilhjalmsson H, Cantelmo N, Cassell J, Chafai NE, Kipp M, Kopp S, Mancini M, Marsella S, Marshall A, Pelachaud C et al (2007) The behavior markup language: recent developments and challenges. In: Intelligent virtual agents. Springer, Berlin/New York, pp 99–111
Wang Y, Neff M (2013) The influence of prosody on the requirements for gesture-text alignment. In: Intelligent virtual agents. Springer, Berlin/New York, pp 180–188


Wang Y, Ruhland K, Neff M, O'Sullivan C (2016) Walk the talk: coordinating gesture with locomotion for conversational characters. Comput Anim Virtual Worlds 27(3–4):369–377
Wheatland N, Wang Y, Song H, Neff M, Zordan V, Jörg S (2015) State of the art in hand and finger modeling and animation. Comput Graphics Forum 34(2):735–760

Depth Sensor-Based Facial and Body Animation Control

Yijun Shen, Jingtian Zhang, Longzhi Yang, and Hubert P. H. Shum

Contents

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Extracting Facial and Body Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Facial Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Body Posture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Human Environment Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Dealing with Noisy Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Face Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Posture Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Prior Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Depth Camera-Based Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Abstract

Depth sensors have become one of the most popular means of generating human facial and posture information in the past decade. By coupling a depth camera with computer vision-based recognition algorithms, these sensors can detect human facial and body features in real time. Such a breakthrough has fueled many new research directions in animation creation and control, and has also opened up new challenges. In this chapter, we explain how depth sensors obtain human facial and body information. We then discuss the main challenge of depth sensor-based systems, which is the inaccuracy of the obtained data, and explain how the problem is tackled. Finally, we point out the emerging applications in the field, in which human facial and body feature modeling and understanding is a key research problem.

Y. Shen (*) • J. Zhang (*) • L. Yang (*) • H.P.H. Shum (*)
Northumbria University, Newcastle upon Tyne, UK
e-mail: [email protected]; [email protected]; [email protected]; [email protected]
# Springer International Publishing Switzerland 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_7-1


Keywords

Depth sensors • Kinect • Facial features • Body postures • Reconstruction • Machine learning • Computer animation

Introduction

In the past decade, depth sensors have become a very popular means of generating character animation. In particular, since these sensors can obtain human facial and body information in real time, they are used heavily in real-time graphics and games. While it is expensive to use human motion to interact with computer applications via a traditional motion capture system, depth sensors provide an affordable alternative. Due to their low cost, high robustness, and easy setup, depth sensors can be applied across a wide range of application domains. Apart from popular applications such as motion-based gaming, depth sensors are also applied in emerging areas such as virtual reality, sport training, serious games, and smart environments.

In order to work with depth sensors, it is important to understand their working principle, as well as their strengths and weaknesses. In this chapter, we provide comprehensive information on how depth sensors track human facial and body features using computer vision and pattern recognition techniques, and identify their strengths in computational cost and robustness. Then, we focus on the major weakness of depth sensors, namely the low accuracy that occurs during occlusion, and explain possible solutions to improve recognition quality in detail. In particular, we discuss in depth machine learning-based reconstruction methods that utilize prior knowledge to correct corrupted data obtained by the sensors. Finally, we give some examples of depth sensor-based applications, especially in the field of animation creation, to show how these sensors can improve existing methods in human-computer interaction.

In the rest of this chapter, we review the state of the art in section "State of the Art." We explain in more detail how depth sensors obtain and process human facial and body movement information in section "Extracting Facial and Body Information." We then discuss the main challenge of depth sensor-based systems, the relatively low accuracy of the obtained data, and explain how this challenge can be tackled in section "Dealing with Noisy Data." We finally point out various emerging applications developed with depth sensors in section "Depth Camera-Based Applications" and conclude this chapter in section "Conclusion."

State of the Art

Typical depth sensors utilize a depth camera to obtain a depth image. The main advantage of the depth camera over traditional color cameras is that instead of obtaining color information, it estimates the distance of the objects seen by the camera using an infrared sensor.


The images taken from a depth camera are called depth images. In these images, the pixels represent distance instead of color.

The nature of depth images provides a huge advantage for automatic recognition using computer vision and machine learning algorithms. With traditional color images, recognizing objects requires segmenting them based on color information. This is challenging in situations where the background has a similar color to the foreground objects (Fernandez-Sanchez et al. 2013). Moreover, color values are easily affected by lighting conditions, which reduces the robustness of object recognition (Kakumanu et al. 2007). On the contrary, with depth images, since the pixel value represents distance, automatic object segmentation becomes independent of the color of the object. As long as the object is geometrically separated from the background, accurate segmentation can be performed. Such an improved segmentation process in turn supports an improved object recognition system, which identifies the nature of the objects using accurate geometric features. This advancement in accuracy and robustness has allowed depth sensors to become popular commercial products that lead to many new applications.

The Microsoft Kinect (https://developer.microsoft.com/en-us/windows/kinect), which utilizes both color and depth cameras, is one of the most popular depth sensors. By combining the two cameras, the Kinect can create a 3D point cloud from the obtained images. Figure 1 shows the images obtained by the two cameras, as well as two views of the corresponding point cloud.

Kinect gaming usually involves players controlling the gameplay with body movement. Virtual characters in the game are then synthesized on the fly based on the movement information obtained. This kind of application involves different domains of research. First, computer vision and machine learning techniques are applied to analyze the depth images obtained by the depth sensor. This typically involves recognizing different human features, such as the human body parts (Shotton et al. 2012). Then, human-computer interaction research is applied to translate the body movement into gameplay control signals. Computer graphics and animation algorithms are used to create real-time rendering, which usually includes character animation synthesized from the movement of the player. In some situations, virtual reality (Kyan et al. 2015) or augmented reality (Vera et al. 2011) research is adopted to enhance the immersiveness of the game.

However, depth sensors are not without their weaknesses.

Fig. 1 (From left to right) The color and depth images obtained by a Microsoft Kinect, as well as two views of 3D point cloud rendered by combing the color and depth information


Compared with traditional capturing devices such as accelerometers, the accuracy of depth sensors is considerably lower. This is mainly because these sensors usually consist of a single depth camera. When occlusions occur, the sensors cannot obtain information from the shielded area. This results in a significant drop in recognition accuracy. While it is possible to utilize multiple depth cameras to obtain better results, one has to deal with cross-talk, the interference of infrared signals among multiple cameras (Alex Butler et al. 2012). It also forfeits the advantages of depth sensors in terms of easy setup and efficient capture. Therefore, it is preferable to enhance the sensor accuracy using software algorithms instead of introducing more hardware.

To enhance the quality of the obtained data, machine learning approaches using prior knowledge of face and body features have shown great success (Shum et al. 2013). The main idea is to apply prior knowledge to the tracked data and correct the less reliable parts, or to introduce more details into the data. Such knowledge can either be defined manually or learned from examples. The key is to represent the prior knowledge in a way that is efficient and effective to use during run-time.

Extracting Facial and Body Information

There is a large body of research on obtaining facial and body information from depth cameras. In this section, we explain some of the main methods and discuss their performance.

Facial Feature

Facial feature detection usually involves face segmentation and landmark detection. The former segments the face from the background, while the latter detects key regions and feature points. To segment the face area from the background and the rest of the human body, one can detect the skin color and perform segmentation (Bronstein et al. 2005). However, such a method is easily affected by illumination. Using the histogram of depth information from the depth image can improve the system robustness (Segundo et al. 2010).

Since human faces have the same topology, it is possible to apply geometric rules to identify landmarks on the face. A simple example is to approximate the face with an ellipse and divide the ellipse into different slices based on predefined angles (Segundo et al. 2010). For each slice, corresponding features can be searched for based on the 3D height of the face. For example, the eyes are the lowest point on the corresponding slice, while the nose is the highest. Similarly, it is possible to use local curvature to represent different features on the face, so as to determine different facial regions (Chang et al. 2006). For example, the eye regions usually form a valley and can be represented by specific values of mean curvature and Gaussian curvature. The disadvantage of these methods is that manually defined geometric rules may not be robust for different users, especially for users coming from different countries.
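A minimal NumPy sketch of the depth-based segmentation and geometric-rule ideas above (the band width and the synthetic image are invented for illustration): the face is taken to be the nearest band of valid depth values, and the nose is then the segmented pixel closest to the camera.

```python
import numpy as np

def segment_face_by_depth(depth, band=150.0):
    """Segment the nearest depth band (mm) as the face region and return
    the mask plus a nose estimate (the closest pixel to the camera)."""
    valid = depth > 0                       # zero means no depth reading
    nearest = depth[valid].min()
    mask = valid & (depth < nearest + band)
    nose = np.unravel_index(np.where(mask, depth, np.inf).argmin(), depth.shape)
    return mask, nose

# Synthetic scene: a 2 m background with a face-like bump centered at (24, 32).
depth = np.full((48, 64), 2000.0)
yy, xx = np.mgrid[0:48, 0:64]
r2 = (yy - 24) ** 2 + (xx - 32) ** 2
depth[r2 < 150] = 650.0 + 0.5 * r2[r2 < 150]    # nose tip at about 650 mm
mask, nose = segment_face_by_depth(depth)
print(mask.sum(), nose)                         # face pixel count, (24, 32)
```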


Fig. 2 3D facial landmark identified by Kinect overlapped on 2D color images

A better solution is to apply a data-driven approach. For example, one can construct a database with segmented facial regions and train a random forest that can automatically identify facial regions on a face (Kazemi et al. 2014).

Another direction for representing facial features is to use a predefined facial template (Li et al. 2013; Weise et al. 2011). Such a template is a high-quality 3D mesh with controllable parameters. During run-time, the system deforms the 3D template to align it with the geometric structure of the face segmented from the depth image. Such a deformation process is usually done by numerical optimization due to the high degree of freedom. Upon successful alignment, the system can understand the observed face in the depth image via the deformed template. It can also represent the face with a set of deformation parameters so as to control animation in real time. Microsoft Kinect also provides support for 3D landmark detection, as shown in Fig. 2. Different expressions can be identified based on the arrangement of the 3D landmarks. Such understanding of facial orientation and expression is useful for real-time animation control.
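The deformation parameters mentioned above are often blendshape-like weights. One common formulation (sketched here with random stand-in data; the cited systems solve a richer, regularized problem) expresses the face as a neutral shape plus a weighted sum of deformation bases and recovers the weights from observed landmarks by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 20, 4                                  # landmarks, deformation bases
neutral = rng.normal(size=(L, 3))             # neutral 3D landmark positions
bases = rng.normal(size=(K, L, 3))            # per-parameter displacements

def fit_weights(observed):
    """Solve min_w || neutral + sum_k w_k * bases[k] - observed ||^2."""
    A = bases.reshape(K, -1).T                # (3L, K) design matrix
    b = (observed - neutral).ravel()
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

true_w = np.array([0.8, -0.3, 0.0, 0.5])
observed = neutral + np.tensordot(true_w, bases, axes=1)
print(np.round(fit_weights(observed), 3))     # recovers [0.8, -0.3, 0., 0.5]
```

The recovered weights can then be applied directly to the corresponding controls of a virtual character's face.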

Body Posture

The mainstream approach of depth sensor-based body recognition systems is to apply pattern recognition and machine learning techniques to identify the human subject. By training a classifier that can identify how an individual body part appears in the depth image, one can recognize these parts using real-time depth camera input (Girshick et al. 2011; Shotton et al. 2012; Sun et al. 2012). There are several major challenges in this approach. Chief among them is the availability of training data. In order to train a classifier, a large number of depth images with annotations indicating the body parts are needed. Since body parts appear differently based on the viewing angle, the training database should capture such parts from different viewpoints. Moreover, since users of different body sizes appear differently in depth images, training a robust classifier that can handle all users requires training images that cover such body variation.
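The classifiers in this line of work split on very cheap depth-comparison features. The sketch below implements a feature in the spirit of Shotton et al. (2012); the offsets, background value, and toy image are our own choices. Dividing the probe offsets by the depth at the pixel makes the feature roughly invariant to how far the subject stands from the camera:

```python
import numpy as np

def depth_feature(depth, y, x, u, v, background=10000.0):
    """f = d(p + u/d(p)) - d(p + v/d(p)): compare two depth probes whose
    offsets shrink with distance, so the pattern is depth-normalized."""
    d = depth[y, x]
    def probe(off):
        py, px = int(y + off[0] / d), int(x + off[1] / d)
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return background              # off-image probes read as background
    return probe(u) - probe(v)

# Toy depth map: a person-shaped block at 2000 mm on a distant background.
depth = np.full((100, 100), 10000.0)
depth[30:90, 40:60] = 2000.0
# Probing far above the pixel vs. at the pixel itself gives a large
# response near the top of the body, a cue a tree might use for "head".
print(depth_feature(depth, 35, 50, u=(-30000.0, 0.0), v=(0.0, 0.0)))  # 8000.0
```

A decision forest learns thousands of such offset/threshold splits from the annotated training images.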


Fig. 3 (Left) The 3D skeleton obtained by Microsoft Kinect with the corresponding depth and color images. (Middle and Right) Two views of the 3D point cloud together with the obtained 3D skeleton

Meeting these requirements takes hundreds of thousands of annotated depth images, which exceeds what human labor can generate. To solve the problem, it has been proposed to synthesize depth images using different humanoid models and 3D motion capture data. Since the body part information of these humanoid models is known in advance, it becomes possible to automatically annotate the position of the body parts in the synthesized depth images. With these training images, one can train a decision forest to classify depth pixels into the corresponding body parts. Different designs of decision forest have resulted in different levels of success, and they are all capable of identifying body parts in real time. Microsoft Kinect also applies a pattern recognition approach to recognize body parts in the depth images (Shotton et al. 2012). Figure 3 shows the results of Kinect posture recognition, shown as the yellow skeleton. By overlapping the skeleton with the 3D point cloud, it can be observed that the Kinect performs reasonably accurately under normal circumstances.

Another stream of methods for body identification and modeling takes advantage of the geometry of the human body and utilizes a body template model (Liu et al. 2016a; Zhang et al. 2014a). First, the pixels in the depth image that belong to the human body are extracted. Since the pixel value represents distance, one can project them into 3D space and create a point cloud of the human body. Then, the system fits a 3D humanoid mesh model to this point cloud so as to estimate the body posture. This process involves deforming the 3D mesh model such that the surface of the model aligns with the point cloud. Since the template model contains human information such as body parts, deforming the model to fit the point cloud identifies the corresponding body information in the point cloud. The main challenge in this method is to deform the mesh properly to avoid unrealistic postures and over-deformed surfaces, which is still a challenging research problem. Physics-based motion optimization can ensure the physical correctness of the generated postures (Zhang et al. 2014a). Utilizing a simplified, intermediate template for deformation optimization can enhance the optimization performance (Liu et al. 2016a). This method can potentially provide richer body information depending on the template used. However, a major drawback of such an optimization-based approach is the higher run-time computational cost, making it inefficient for real-time systems.


Human Environment Interaction

The captured depth images contain information not only about the user but also about the surrounding environment. Therefore, it is possible to identify high-level information about how the user interacts with the environment. Unlike the human body, the environment does not have a uniform structure, and therefore it is not possible to fit a predefined template or apply prior knowledge. Geometric information based on planes and shapes becomes the next available information to extract. The RANdom SAmple Consensus (RANSAC) algorithm can be used to identify planar objects in the scene, such as walls and floors, which can help to understand how the human user moves around in the open areas (Mackay et al. 2012). It is also possible to compare successive depth images to identify the moving parts, in order to understand how the user interacts with external objects (Shum 2013).

Depth cameras can be used for 3D scanning in order to obtain surface information of the environment or even of the human user. While one depth image only provides information about a partial surface, which we call a 2.5D point cloud, multiple depth images taken from different viewing angles can be combined to form a full 3D surface. One of the most representative systems in this area is KinectFusion (Newcombe et al. 2011). Such a system requires the user to carry a Kinect and capture depth images continuously over a static environment. Real-time registration is performed to recover the 3D translation and rotation of the depth camera. This allows alignment of multiple depth images to form a complete 3D surface. Apart from scanning the environment, it is possible to scan the face and body of a human user (Cui et al. 2013) and apply real-time posture deformation on the Kinect-tracked skeleton (Iwamoto et al. 2015). Finally, because single-view depth cameras suffer from the occlusion problem, it has been proposed to capture how human users interact with objects by combining KinectFusion, color cameras, and an accelerometer-based motion capture system (Sandilands et al. 2012, 2013).

Since depth sensors can obtain both environment and human information, they support the argument that human information can enhance the understanding of unstructured environments (Jiang and Saxena 2013; Jiang et al. 2013). Take a chair as an example. A chair can come in different shapes and designs, which makes recognition extremely difficult. However, the general purpose of a chair is for a human to rest on. Therefore, with the human movement information obtained by depth cameras, we can identify a chair not just by its shape but also by the way the human interacts with it. Similarly, human movement may sometimes be ambiguous. Understanding the environment helps us to identify the correct meaning of the human motion. Depth sensors open up new directions in recognition by considering human and environment information together.
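A compact RANSAC plane detector over a point cloud, of the kind used for finding walls and floors above (the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_plane(points, iters=200, threshold=0.02, rng=None):
    """Fit the dominant plane: repeatedly pick 3 points, form the plane
    through them, and keep the candidate with the most inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                       # degenerate (collinear) sample
        n = n / norm
        dist = np.abs((points - p0) @ n)   # point-to-plane distances
        inliers = dist < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, p0)
    return best_plane, best_inliers

# Synthetic floor (z near 0) plus scattered clutter above it.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500),
                         rng.normal(0.0, 0.005, 500)])
clutter = rng.uniform(0.2, 1.0, (100, 3))
(normal, _), inliers = ransac_plane(np.vstack([floor, clutter]))
print(np.round(np.abs(normal), 2), inliers.sum())   # about [0. 0. 1.], ~500
```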

Dealing with Noisy Data

The main problem of using depth sensors is dealing with the noisy data obtained. In particular, most depth sensor-based applications rely on a single point of view to obtain the depth image.


Fig. 4 An overview of depth sensors data enhancement

As a result, the detected face and posture are of low resolution and suffer heavily from occlusion. It is possible to apply machine learning algorithms to enhance the quality of the data. The idea is to introduce a quality enhancement process that considers prior knowledge of the human body, typically a database of high-quality faces or postures, as shown in Fig. 4. In this section, we discuss how body and facial information can be reconstructed from noisy data.

Face Enhancement

While depth sensors can obtain facial features, due to the relatively low resolution, the quality of the features is not always satisfactory. The 3D face obtained is usually missing details and looks unrealistic. In this section, we explain how to enhance the quality of 3D faces obtained from depth sensors.

Since a single depth image is usually noisy and of low resolution, the 3D facial surface generated from it is rough. By obtaining high-quality 3D faces through 3D scanners together with their corresponding color textures, one can construct a face database and extract the corresponding prior knowledge (Liang et al. 2014; Wang et al. 2014). The faces in the database are divided into patches such as eyes, nose, etc. Since color texture is available, one can take advantage of color features to enhance the segmentation accuracy. Given the low-quality depth and color images of a face obtained from the sensors, facial regions are obtained at run-time. For each region obtained, a set of similar patches is found in the database. The region is then approximated by a weighted sum of the database patches. By replacing different parts of the run-time face image with their corresponding approximations, a high-quality 3D face surface can be generated. This method depends heavily on the quality and variety of the faces in the database, as well as the way those faces are abstracted to represent the one observed by the depth sensor at run-time.
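A schematic version of this patch-based enhancement (the database, descriptors, and weighting are invented stand-ins for the cited pipelines): each degraded facial region is matched against high-quality database patches and replaced by a distance-weighted blend of its nearest neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
db_desc = rng.normal(size=(200, 16))        # descriptors of database patches
db_patch = rng.normal(size=(200, 32, 32))   # high-quality depth patches

def enhance_region(desc, k=5, eps=1e-6):
    """Replace a degraded region by a weighted sum of the k database
    patches whose descriptors are closest to the observed one."""
    dist = np.linalg.norm(db_desc - desc, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + eps)             # closer patches weigh more
    w /= w.sum()
    return np.tensordot(w, db_patch[idx], axes=1)

observed_desc = rng.normal(size=16)
print(enhance_region(observed_desc).shape)  # (32, 32) reconstructed patch
```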


Constructing a database for prior knowledge is costly. It has therefore been proposed to scan the face of the user from different angles and apply such a face to enhance the run-time detected face (Zollhöfer et al. 2014). The system first requests the user to rotate around a depth sensor to obtain a higher-quality 3D mesh, using registration methods similar to the KinectFusion approach mentioned in the last section (Newcombe et al. 2011). Then, given a lower-quality run-time depth image of the face, the system deforms the high-quality 3D mesh such that it aligns with the depth image pixels. As a result, a high-quality mesh with the run-time facial expression can be generated. The core problem here is to deform the high-quality facial mesh nicely and avoid generating visual artifacts. It has been shown that deformation quality can be improved by dividing the face into multiple facial regions to strengthen the feature correspondence (Kazemi et al. 2014).

Posture Enhancement

The body tracked by depth sensors may contain inaccurate body parts due to different types of error. Simple sensor error can be caused by the geometric shape of body parts and the viewing angle. It has been proposed to apply a Butterworth filter (Bailey and Bodenheimer 2012) or a simple low-pass filter (Fernández-Baena et al. 2012) to smooth out the vibration of tracked positions due to this type of error. However, when occlusions occur, in which a particular body part is shielded from the camera, the tracked body position contains a large amount of error. Simple filters are not sufficient to correct these postures.

As a solution, it has been proposed to utilize accurately captured 3D human motion as prior knowledge and reconstruct the inaccurate postures from the depth sensor. In this method, a motion database is constructed using carefully captured 3D motion, usually from optical motion capture systems. Given a depth sensor posture, one can search for a similar posture in the database. The missing or erroneous body parts from the depth sensor can then be replaced by those of the corresponding database posture (Shum and Ho 2012). However, such a naive method cannot perform well for complex postures, as a single posture from the database cannot always generalize the posture performed by the user, and therefore cannot effectively reconstruct it.

More advanced posture reconstruction algorithms utilize machine learning to generalize posture information from the motion database (Chai and Hodgins 2005; Liu et al. 2011; Tautges et al. 2011). In particular, the motion database is used to create a low-dimensional latent space via dimensionality reduction techniques. Since the low-dimensional space is generated using data from real humans, each point in the space represents a valid natural posture. Given a partially mistracked posture from a depth camera, one can project the posture into the learned low-dimensional space and apply numerical optimization to enhance the quality of the posture. The optimized result is finally back-projected into a full-body posture. Since the optimization is performed in the low-dimensional latent space, the solution found should also be a natural posture. In other words, the unnatural elements due to sensor error can be removed. The major problem of this method is that the system has no information about which part of the body posture is incorrect. Therefore, while one would expect the system to correct the erroneous parts of the posture using information from the accurate parts, the actual system may do the reverse. As a result, the optimized posture may no longer be similar to the original depth sensor input.
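A minimal version of the latent-space idea, using PCA as the dimensionality reduction (the cited systems use more powerful nonlinear models and a proper optimization; here the noisy posture is simply projected into the latent space and back, which already pulls it toward the space of natural postures):

```python
import numpy as np

rng = np.random.default_rng(0)
# Mock motion database: 500 postures of 60 DOFs lying near a 5D subspace.
latent = rng.normal(size=(500, 5))
basis_true = rng.normal(size=(5, 60))
database = latent @ basis_true + 0.01 * rng.normal(size=(500, 60))

mean = database.mean(axis=0)
_, _, Vt = np.linalg.svd(database - mean, full_matrices=False)
basis = Vt[:5]                              # learned 5 x 60 latent basis

def reconstruct(noisy_posture):
    """Encode into the latent space and decode again; components that
    fall outside the space of natural postures are suppressed."""
    z = (noisy_posture - mean) @ basis.T
    return mean + z @ basis

clean = latent[0] @ basis_true
noisy = clean + rng.normal(0, 2.0, size=60) * (rng.random(60) < 0.1)
print(np.linalg.norm(noisy - clean),                  # error before
      np.linalg.norm(reconstruct(noisy) - clean))     # smaller error after
```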


Fig. 5 Applying posture reconstruction to enhance the quality of the obtained data

To solve the problem, an optimization process that considers the reliability of individual body parts has been proposed (Shum et al. 2013). The major difference of this method compared with prior ones is that it divides the posture reconstruction process into two steps. In the first step, a procedural algorithm is used to evaluate the degree of reliability of individual body parts. This is done by assessing the behavior of a tracked body part to see if its position is inconsistent, as well as assessing the part with respect to its neighboring body parts to see if it creates inconsistent bone lengths. In the second step, posture reconstruction is performed with reference to this reliability information, such that the system relies more on the parts with higher reliability. Essentially, the reliability information helps the system to explicitly use the correct body parts and reconstruct the incorrect ones. Such a system can be further improved by using a Gaussian process to model the motion database, which helps to reduce the amount of motion data needed to reconstruct the posture (Liu et al. 2016b; Zhou et al. 2014). Better rules to estimate the reliability of the body parts can also enhance the system performance (Ho et al. 2016).

Figure 5 shows the result of applying posture reconstruction. The color and depth images show that the user is occluded by a chair and the surrounding environment. The yellow skeleton on the left is the raw posture obtained by Kinect, in which less reliable body parts are highlighted in red. The right character shows the reconstructed posture using the method proposed in Shum et al. (2013). The awkward body parts are identified and corrected using the knowledge learned from a motion database.
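The bone-length consistency test used in the first step can be sketched directly (the skeleton, calibrated rest lengths, and tolerance are illustrative):

```python
import numpy as np

BONES = {"upper_arm": ("shoulder", "elbow"), "forearm": ("elbow", "wrist")}
REST_LENGTH = {"upper_arm": 0.30, "forearm": 0.26}    # meters, calibrated

def unreliable_bones(joints, tolerance=0.15):
    """Flag bones whose tracked length deviates from the calibrated rest
    length by more than the given fraction."""
    flags = {}
    for bone, (a, b) in BONES.items():
        length = np.linalg.norm(np.asarray(joints[a]) - np.asarray(joints[b]))
        flags[bone] = abs(length - REST_LENGTH[bone]) / REST_LENGTH[bone] > tolerance
    return flags

# The wrist snapped toward the elbow during an occlusion: forearm too short.
joints = {"shoulder": (0.0, 1.4, 0.0), "elbow": (0.0, 1.1, 0.0),
          "wrist": (0.0, 1.0, 0.0)}
print(unreliable_bones(joints))   # {'upper_arm': False, 'forearm': True}
```

The reconstruction step can then down-weight the flagged joints when matching against the motion database.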

Prior Knowledge

The major research focus of face and posture enhancement is to apply appropriate prior knowledge to improve the data obtained at run-time.


In machine learning-based algorithms, such prior knowledge is usually learned from a database and represented in a format that can be used efficiently at run-time. For motion enhancement, since human motion is highly nonlinear with large variation, it is not effective to represent the database with a single model. Instead, much of the existing research applies multiple local models to represent the database, such as a mixture of Gaussian models (Liu et al. 2016b). It has also been proposed to apply deep learning to learn a set of manifolds that represents a motion database (Holden et al. 2015). Precomputing these models and manifolds is time-consuming, as it involves abstracting the whole database. Therefore, lazy learning algorithms have been adopted, in which modeling of the database is done not as a preprocess but as a run-time process using run-time information (Chai and Hodgins 2005; Shum et al. 2013). During run-time, based on the user-performed posture, the system retrieves a number of relevant postures from the database and models only that subset of postures. This method has two advantages. First, by modeling only a small number of postures that are relevant to the performed posture, one can reduce the computational cost of constructing a latent space. Second, since the postures in the subset are relatively similar, one can assume that they all lie in a locally linear space and apply simpler linear dimensionality reduction to generate the latent space. This allows real-time generation of the latent space. With improved database organization, the database search time can be further reduced and the relevancy of the retrieved results can be enhanced (Plantard et al. 2016a, b), such that real-time ergonomic and motion analysis applications can be performed (Plantard et al. 2016b).

Figure 6 visualizes how prior knowledge can be estimated from a database. Each blue circle in the figure represents a database entry, and the filling color represents its value. The prior knowledge obtained from the scattered database entries is represented by the shaded area, which enables one to understand the change of value within the considered space. The upper figure shows a traditional machine learning algorithm, in which prior knowledge is obtained as a preprocess, considering all database entries. During run-time, when a query arrives, the system uses the knowledge to estimate the corresponding value of the query. The lower figure shows the case of lazy learning, in which prior knowledge is obtained during run-time. This allows the system to extract database entries that are more similar to the query and estimate the prior knowledge with only such a subset of data.
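The run-time modeling step of lazy learning can be sketched as follows (a simplification of the cited methods): retrieve the k database postures nearest the query, then build a local linear (PCA) model from that subset only.

```python
import numpy as np

def lazy_reconstruct(query, database, k=30, dims=3):
    """Select a run-time subset of the k nearest database postures,
    build a local PCA model from it, and project the query onto it."""
    dist = np.linalg.norm(database - query, axis=1)
    local = database[np.argsort(dist)[:k]]        # run-time subset
    mean = local.mean(axis=0)
    _, _, Vt = np.linalg.svd(local - mean, full_matrices=False)
    basis = Vt[:dims]                             # local, near-linear model
    return mean + ((query - mean) @ basis.T) @ basis

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 60))            # stand-in for real postures
query = database[42] + rng.normal(0, 0.3, size=60)
print(lazy_reconstruct(query, database).shape)    # (60,) reconstructed pose
```

Because the subset is small and roughly locally linear, the SVD above is cheap enough to run per frame, which is the point of deferring the modeling to run-time.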

Depth Camera-Based Applications

With depth sensors, it becomes possible to consider the user posture as part of an animation system and create real-time animation. Here, we discuss some depth sensor-based animation control systems and point out the challenges and solutions.

Producing real-time facial animation with depth sensors is efficient. By representing the facial features with a deformed template, it is possible to drive the facial expression of virtual 3D faces (Li et al. 2013; Weise et al. 2011). Because the dimensions of the user's face and the character's face differ, directly applying facial features such as landmark locations generates suboptimal results. A common template therefore acts as a bridge to connect the two ends. Such a template is a parametric representation of the face, which is more robust against differences in dimensions. With the template, it becomes possible to retarget the user's expression onto the character's face.

Typical real-time animation systems such as games utilize a motion database to understand what the user performs and render the scenario accordingly. For example, one can compare the user's motion obtained from depth sensors with a set of motions in the database to understand the nature of the motion as well as how it should affect the real-time rendering (Bleiweiss et al. 2010). Alternatively, with an interaction database, one can generate a virtual character that acts according to the posture of the user in order to create a two-character dancing animation, which is difficult to capture directly due to hardware limitations (Ho et al. 2013).

While it is possible to utilize the posture captured by depth sensors for driving the animation of virtual characters, the generated animation may not be physically correct and dynamically plausible. On the one hand, since the depth sensors track kinematic positions only, there is no information about the forces exerted. It has been proposed to combine depth cameras with pressure sensors and estimate the internal joint torques using inverse dynamics (Zhang et al. 2014b). This allows simulating virtual characters with physically correct movement. On the other hand, while depth sensors can track body part positions, it is relatively difficult to track how the body deforms dynamically during the movement. Therefore, it has been proposed to enhance the realism of the generated character by applying real-time physical simulation to Kinect postures (Iwamoto et al. 2015). This allows the system to synthesize real-time dynamic deformation, such as the jiggling of flesh, based on the movement obtained in real-time, as shown in Fig. 7.

Fig. 7 Real-time dynamic deformed character generated from Kinect postures

Utilizing depth cameras, users can interact with virtual objects through body motion. On the one hand, predefined hand and arm gestures can be used to control virtual objects. Once the Kinect has detected a set of specific gestures, a 3D virtual object can be fitted onto the user and move according to the user's gesture (Soh et al. 2013). On the other hand, virtual objects can be attached to the user's body and move with the user's posture, such as carrying a virtual handbag (Wang et al. 2012).

Depth sensors also enable a new application known as virtual fitting, in which the shopping experience is facilitated by letting customers try on virtual clothing. This allows mix-and-match of clothes and accessories in real time without being physically present in the retail shop. The system involves building a 2D segmented clothing database indexed by the postures of the user. During run-time, the system searches for suitable database entries and overlays them on the image of the customer (Zhou et al. 2012). Another clothes fitting method is to utilize a 3D clothing database and obtain 3D models of the user with depth sensors. This allows the system to recommend items that fit the user's body (Pachoulakis and Kapetanakis 2012).

Conclusion

In this chapter, we explained how depth sensors are applied to gather human facial and body posture information for generating and controlling animations. Depth sensors obtain human information in real-time and provide a cheaper alternative for human motion capture. However, there is still room to improve the quality of the obtained data. In particular, depth sensors suffer heavily from occlusions, in which part of the human body is shielded from the sensor. Machine learning algorithms can reconstruct the data and improve the quality, but more research is needed to solve the problem. Depth sensors enable many interesting applications in the computer animation and games domains, providing real-time control over animation creation. Given the rate of new depth sensor research and applications, such technology can become an important part of daily life in the near future.

Acknowledgment This work is supported by the Engineering and Physical Sciences Research Council (EPSRC) (Ref: EP/M002632/1).

References

Alex Butler D, Izadi S, Hilliges O, Molyneaux D, Hodges S, Kim D (2012) Shake'n'sense: reducing interference for overlapping structured light depth cameras. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI'12. ACM, New York, pp 1933–1936
Bailey SW, Bodenheimer B (2012) A comparison of motion capture data recorded from a Vicon system and a Microsoft Kinect sensor. In: Proceedings of the ACM symposium on applied perception, SAP'12. ACM, New York, pp 121–121
Bleiweiss A, Eshar D, Kutliroff G, Lerner A, Oshrat Y, Yanai Y (2010) Enhanced interactive gaming by blending full-body tracking and gesture animation. In: ACM SIGGRAPH ASIA 2010 sketches, Seoul. ACM, p 34
Bronstein AM, Bronstein MM, Kimmel R (2005) Three-dimensional face recognition. Int J Comput Vision 64(1):5–30
Chai J, Hodgins JK (2005) Performance animation from low-dimensional control signals. In: SIGGRAPH'05: ACM SIGGRAPH 2005 papers. ACM, New York, pp 686–696
Chang KI, Bowyer KW, Flynn PJ (2006) Multiple nose region matching for 3d face recognition under varying facial expression. IEEE Trans Pattern Anal Mach Intell 28(10):1695–1700
Cui Y, Chang W, Nöll T, Stricker D (2013) Kinectavatar: fully automatic body capture using a single Kinect. In: Proceedings of the 11th international conference on computer vision, vol 2, ACCV'12. Springer-Verlag, Berlin/Heidelberg, pp 133–147
Fernández-Baena A, Susín A, Lligadas X (2012) Biomechanical validation of upper-body and lower-body joint movements of Kinect motion capture data for rehabilitation treatments. In: Intelligent networking and collaborative systems (INCoS), 2012 4th international conference on, pp 656–661
Fernandez-Sanchez EJ, Diaz J, Ros E (2013) Background subtraction based on color and depth using active sensors. Sensors 13(7):8895–8915
Girshick R, Shotton J, Kohli P, Criminisi A, Fitzgibbon A (2011) Efficient regression of general-activity human poses from depth images. In: Computer vision (ICCV), 2011 IEEE international conference on, Barcelona, pp 415–422
Ho ESL, Chan JCP, Komura T, Leung H (2013) Interactive partner control in close interactions for real-time applications. ACM Trans Multimedia Comput Commun Appl 9(3):21:1–21:19
Ho ES, Chan JC, Chan DC, Shum HP, Cheung YM, Yuen PC (2016) Improving posture classification accuracy for depth sensor-based human activity monitoring in smart environments. Comput Vis Image Underst 148:97–110
Holden D, Saito J, Komura T, Joyce T (2015) Learning motion manifolds with convolutional autoencoders. In: ACM SIGGRAPH ASIA 2015 technical briefs, Kobe. ACM
Iwamoto N, Shum HPH, Yang L, Morishima S (2015) Multi-layer lattice model for real-time dynamic character animation. Comput Graph Forum 34(7):99–109
Jiang Y, Saxena A (2013) Hallucinating humans for learning robotic placement of objects. In: Proceedings of the 13th international symposium on experimental robotics. Springer International Publishing, Heidelberg, pp 921–937
Jiang Y, Koppula H, Saxena A (2013) Hallucinated humans as the hidden context for labeling 3d scenes. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, CVPR'13. IEEE Computer Society, Washington, DC, pp 2993–3000
Kakumanu P, Makrogiannis S, Bourbakis N (2007) A survey of skin-color modeling and detection methods. Pattern Recogn 40(3):1106–1122
Kazemi V, Keskin C, Taylor J, Kohli P, Izadi S (2014) Real-time face reconstruction from a single depth image. In: 3D vision (3DV), 2014 2nd international conference on, vol 1, Lyon. IEEE, pp 369–376
Kinect SDK. https://developer.microsoft.com/en-us/windows/kinect
Kyan M, Sun G, Li H, Zhong L, Muneesawang P, Dong N, Elder B, Guan L (2015) An approach to ballet dance training through MS Kinect and visualization in a CAVE virtual reality environment. ACM Trans Intell Syst Technol (TIST) 6(2):23
Li H, Yu J, Ye Y, Bregler C (2013) Realtime facial animation with on-the-fly correctives. ACM Trans Graph 32(4):42:1
Liang S, Kemelmacher-Shlizerman I, Shapiro LG (2014) 3d face hallucination from a single depth frame. In: 3D vision (3DV), 2014 2nd international conference on, vol 1, Lyon. IEEE, pp 31–38
Liu H, Wei X, Chai J, Ha I, Rhee T (2011) Realtime human motion control with a small number of inertial sensors. In: Symposium on interactive 3D graphics and games, I3D'11. ACM, New York, pp 133–140
Liu Z, Huang J, Bu S, Han J, Tang X, Li X (2016a) Template deformation-based 3-d reconstruction of full human body scans from low-cost depth cameras. IEEE Trans Cybern PP(99):1–14
Liu Z, Zhou L, Leung H, Shum HPH (2016b) Kinect posture reconstruction based on a local mixture of Gaussian process models. IEEE Trans Vis Comput Graph, 14 pp. doi:10.1109/TVCG.2015.2510000
Mackay K, Shum HPH, Komura T (2012) Environment capturing with Microsoft Kinect. In: Proceedings of the 2012 international conference on software knowledge information management and applications, SKIMA'12, Chengdu
Newcombe RA, Izadi S, Hilliges O, Molyneaux D, Kim D, Davison AJ, Kohli P, Shotton J, Hodges S, Fitzgibbon A (2011) Kinectfusion: real-time dense surface mapping and tracking. In: Proceedings of the 2011 10th IEEE international symposium on mixed and augmented reality, ISMAR'11. IEEE Computer Society, Washington, DC, pp 127–136
Pachoulakis I, Kapetanakis K (2012) Augmented reality platforms for virtual fitting rooms. Int J Multimedia Appl 4(4):35
Plantard P, Shum HP, Multon F (2016a) Filtered pose graph for efficient Kinect pose reconstruction. Multimed Tools Appl, pp 1–22. doi:10.1007/s11042-016-3546-4
Plantard P, Shum HPH, Multon F (2016b) Ergonomics measurements using Kinect with a pose correction framework. In: Proceedings of the 2016 international digital human modeling symposium, DHM'16, Montreal, 8 pp
Sandilands P, Choi MG, Komura T (2012) Capturing close interactions with objects using a magnetic motion capture system and a RGBD sensor. In: Proceedings of the 2012 motion in games. Springer, Berlin/Heidelberg, pp 220–231
Sandilands P, Choi MG, Komura T (2013) Interaction capture using magnetic sensors. Comput Anim Virtual Worlds 24(6):527–538
Segundo MP, Silva L, Bellon ORP, Queirolo CC (2010) Automatic face segmentation and facial landmark detection in range images. IEEE Trans Syst Man Cybern Part B Cybern 40(5):1319–1330
Shotton J, Girshick R, Fitzgibbon A, Sharp T, Cook M, Finocchio M, . . ., Blake A (2013) Efficient human pose estimation from single depth images. IEEE Trans Pattern Anal Mach Intell 35(12):2821–2840
Shum HPH (2013) Serious games with human-object interactions using RGB-D camera. In: Proceedings of the 6th international conference on motion in games, MIG'13. Springer-Verlag, Berlin/Heidelberg
Shum HPH, Ho ESL (2012) Real-time physical modelling of character movements with Microsoft Kinect. In: Proceedings of the 18th ACM symposium on virtual reality software and technology, VRST'12. ACM, New York, pp 17–24
Shum HPH, Ho ESL, Jiang Y, Takagi S (2013) Real-time posture reconstruction for Microsoft Kinect. IEEE Trans Cybern 43(5):1357–1369
Soh J, Choi Y, Park Y, Yang HS (2013) User-friendly 3d object manipulation gesture using Kinect. In: Proceedings of the 12th ACM SIGGRAPH international conference on virtual-reality continuum and its applications in industry, VRCAI'13. ACM, New York, pp 231–234
Sun M, Kohli P, Shotton J (2012) Conditional regression forests for human pose estimation. In: Computer vision and pattern recognition (CVPR), 2012 IEEE conference on, Providence, pp 3394–3401
Tautges J, Zinke A, Krüger B, Baumann J, Weber A, Helten T, Müller M, Seidel H-P, Eberhardt B (2011) Motion reconstruction using sparse accelerometer data. ACM Trans Graph 30(3):18:1–18:12
Vera L, Gimeno J, Coma I, Fernández M (2011) Augmented mirror: interactive augmented reality system based on Kinect. In: Human-computer interaction – INTERACT 2011, Lisbon. Springer, pp 483–486
Wang L, Villamil R, Samarasekera S, Kumar R (2012) Magic mirror: a virtual handbag shopping system. In: Computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on. IEEE, Rhode Island, pp 19–24
Wang K, Wang X, Pan Z, Liu K (2014) A two-stage framework for 3d face reconstruction from RGBD images. IEEE Trans Pattern Anal Mach Intell 36(8):1493–1504
Weise T, Bouaziz S, Li H, Pauly M (2011) Realtime performance-based facial animation. ACM Trans Graph (TOG) 30(4):77
Zhang P, Siu K, Jianjie Z, Liu CK, Chai J (2014a) Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture. ACM Trans Graph 33(6):221:1–221:14
Zhang P, Siu K, Jianjie Z, Liu CK, Chai J (2014b) Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture. ACM Trans Graph (TOG) 33(6):221
Zhou Z, Shu B, Zhuo S, Deng X, Tan P, Lin S (2012) Image-based clothes animation for virtual fitting. In: SIGGRAPH Asia 2012 technical briefs, Singapore. ACM, p 33
Zhou L, Liu Z, Leung H, Shum HPH (2014) Posture reconstruction using Kinect with a probabilistic model. In: Proceedings of the 20th ACM symposium on virtual reality software and technology, VRST'14. ACM, New York, pp 117–125
Zollhöfer M, Nießner M, Izadi S, Rehmann C, Zach C, Fisher M, Wu C, Fitzgibbon A, Loop C, Theobalt C et al (2014) Real-time non-rigid reconstruction using an RGB-D camera. ACM Trans Graph (TOG) 33(4):156

Real-Time Full-Body Pose Synthesis and Editing

Edmond S. L. Ho and Pong C. Yuen

Contents
Introduction
State of the Art
Editing and Synthesizing Full-Body Pose
  Editing Poses by Inverse Kinematics
  Data-Driven Pose Synthesis
  User Interface for Full-Body Posing
Conclusion
References

Abstract

Posing characters has always played an important role in character animation and interactive applications such as computer games. However, such a task is time-consuming and labor-intensive. In order to improve the efficiency of character posing, researchers in computer graphics have been working on a wide variety of semi- or fully automatic approaches for creating full-body poses, ranging from traditional approaches such as inverse kinematics (IK) and data-driven approaches that make use of captured motion data to direct pose manipulation through intuitive interfaces. In this chapter, we introduce the aforementioned techniques and discuss their applications in animation production.

E. S. L. Ho
Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK
e-mail: [email protected]
P. C. Yuen (*)
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
e-mail: [email protected]
# Springer International Publishing Switzerland 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_8-1


Keywords

Pose synthesis • Pose editing • Inverse kinematics • Motion retargeting • Motion blending • Jacobian-based IK • Cyclic coordinate descent • Collision avoidance • Data-driven pose synthesis

Introduction

A recent study by Kyto et al. (2015) found that 61% of the time in animation production is spent on character posing by professional animators. Such a time-consuming task has motivated tremendous effort to improve the efficiency of human character posing over the last three decades.

Producing full-body poses has always played an important role in character animation. For example, a long motion sequence can be represented by key poses in keyframe animation, and the in-between poses can be interpolated from the key poses. Such an approach is still widely used in animation production today. While only a small number of poses have to be created in keyframe animation, creating a full-body pose manually (e.g., controlling the locations of all body parts) is a tedious and time-consuming task. As a result, researchers have been working on fully or semiautomatic methods for full-body pose creation. In this chapter, we review methods in character posing, including those that create new poses from scratch and those that create new poses by modifying existing motion data. In addition to introducing the pose creation techniques, their applications in animation production, such as reaching, retargeting, and foot-skate cleanup, will be discussed.
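As an illustration of the in-betweening step mentioned above, the joint rotations of two key poses can be interpolated with spherical linear interpolation (slerp). The following Python sketch is a generic textbook construction that assumes rotations are stored as unit quaternions; it is not a description of any particular production tool:

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = np.dot(q0, q1)
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: fall back to lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)             # angle between the rotations
    return (np.sin((1.0 - t) * theta) * q0 +
            np.sin(t * theta) * q1) / np.sin(theta)

def inbetween_pose(key_a, key_b, t):
    """Blend every joint rotation of two key poses at parameter t in [0, 1]."""
    return [slerp(qa, qb, t) for qa, qb in zip(key_a, key_b)]
```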

State of the Art

Jacobian-based inverse kinematics (IK) algorithms have been widely used in pose editing over the last three decades in computer animation. A recent work by Harish et al. (2016) presents a new parallel numerical IK approach for multithreaded architectures. The new approach can handle complex articulated structures (e.g., with more than 600 degrees of freedom) with multiple constraints in real time.

Besides the traditional approaches, many intuitive real-time posing approaches have been proposed. Guay et al. (2013) proposed to synthesize the 3D pose of a character by drawing a single aesthetic line called the line of action (LOA). An example is shown in Fig. 1. The main idea of the method is to edit the pose of the character such that the selected body line becomes similar to the simple curve sketched by the user. Ten body lines which connect every pair of end effectors (e.g., head to right hand, left hand to right leg) are defined, and the most appropriate one is selected according to the location of the sketched line and the viewpoint.

Fig. 1 3D poses generated by a single aesthetic line (colored in red) proposed by Guay et al. (2013) (Reproduced with permission from Guay et al. (2013))

Another interesting real-time posing approach, proposed by Rhodin et al. (2014), enables the user to control the poses of a virtual character with a wide range of input motion, such as full-body, hand, or face motion from any motion-sensing device. Instead of controlling the skeleton of the character, the method defines a vertex-to-vertex correspondence between the meshes representing the input motion and the virtual character and deforms the mesh of the character accordingly.

Furthermore, researchers have been trying to provide users with a more natural way of posing characters. In Jacobson et al. (2014), a tangible puppet-like input device is proposed for interactive pose manipulation. By measuring the relative bone orientations on the input device, the pose of the virtual character is updated accordingly. Different from previously developed puppet-like input devices (e.g., Esposito et al. 1995), the new device can be used for articulated structures with different topologies, as the device is assembled from modular, interchangeable, and hot-pluggable parts. Examples of posing are shown in Fig. 2.

Fig. 2 Posing characters with different articulated structures with the tangible input device proposed in Jacobson et al. (2014) (Reproduced with permission from Jacobson et al. (2014))

To further enhance the realism of the resultant animation, physics-based approaches have been adopted in animation production. However, due to the high computational cost of previous approaches, most of them cannot be used in interactive applications. A recent work by Hämäläinen et al. (2015) synthesizes physically valid poses of humanlike characters at interactive frame rates for a wide variety of motions such as balancing on a ball, recovering from disturbances, and reaching and juggling a ball.


Editing and Synthesizing Full-Body Pose

In this section, techniques for synthesizing and editing full-body character poses will be introduced. First, inverse kinematics (IK), a traditional posing approach widely used in computer animation and robotics, will be discussed in Section 3.1. In particular, different types of IK solvers (Sections 3.1.1 and 3.1.3) and examples of their applications (Section 3.1.2) will be presented. Second, data-driven approaches which make use of existing motion data to produce natural-looking poses will be explained in Section 3.2. Finally, direct manipulation of the pose of the character using different kinds of intuitive interfaces, such as puppet-based, natural user interface, and sketch-based approaches, will be reviewed in Section 3.3.

Editing Poses by Inverse Kinematics

With the advancement of motion capture technology, more and more motion data are available nowadays. Reusing captured or existing poses in new applications can be more efficient (in terms of production time and labor cost) than creating new poses from scratch. However, the number of captured or existing poses is limited, and it is necessary to edit the available poses according to the user's new requirements. In addition, there are demands for editing the full-body pose at runtime in interactive applications such as computer games.

In computer animation, characters are usually represented as articulated structures in which the body segments are connected at the joints. The pose of the character is then controlled by the joint parameters (e.g., joint angles) and the global translation of the root joint (e.g., the hip of a humanlike character). Given the skeletal structure (e.g., bone lengths), the joint parameters of the current pose, and the changes in the joint parameters, the changes in location or orientation of every joint can be computed:

$$\dot{x}_i = J_i \dot{\theta} \qquad (1)$$

where $\dot{x}_i$ denotes the changes in position or orientation of the i-th joint and $J_i$ is the Jacobian matrix mapping the changes in joint parameters $\dot{\theta}$ to the changes $\dot{x}_i$ of the i-th joint:

$$J_i = \frac{\partial x_i}{\partial \theta} \qquad (2)$$

However, it is a tedious task for the user to control the pose of a character by specifying all joint parameters, as a humanlike character is usually composed of more than 40 joints. Instead of editing a pose by specifying all of the joint parameters as in Eq. 1, a new pose can be produced by specifying the target location and/or orientation of the selected joint(s); the required changes in the joint parameters are then computed by inverse kinematics (IK).
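When an analytic form of the Jacobian in Eq. 2 is inconvenient, it can be approximated numerically. The following Python sketch builds the matrix by finite differences; the forward-kinematics function fk is a placeholder assumed to be supplied by the animation system:

```python
import numpy as np

def numerical_jacobian(fk, theta, eps=1e-6):
    """Finite-difference approximation of the Jacobian in Eq. 2.

    fk: maps joint parameters theta (n,) to the stacked positions of
    the controlled joints (m,). Returns the (m, n) Jacobian matrix.
    """
    x0 = fk(theta)
    J = np.zeros((x0.size, theta.size))
    for c in range(theta.size):
        d = np.zeros_like(theta)
        d[c] = eps
        J[:, c] = (fk(theta + d) - x0) / eps   # one column per parameter
    return J
```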


IK has been widely used in robotics and computer animation for controlling robots and characters. Poses of articulated characters such as human and animal figures can be edited by such an approach to produce computer animations. IK can be applied to a wide range of applications in animation production, for example, keyframe posture editing, interactive posture editing, and editing precaptured motion data. An early analytic approach proposed by Lee and Shin (1999) determines the posture based on the positions of the hands and feet relative to the positions of the shoulders and hips. High performance in pose editing has been demonstrated in their experiments. However, one of the major limitations of analytic approaches is that the solvers must be designed specifically for each individual system and cannot be applied to arbitrary articulated structures. Numerical IK solvers, on the other hand, are more general and are discussed below.

Numerical Inverse Kinematics Approaches

Numerical solvers linearize the relationship between the joint parameters and the positions and/or orientations of the end effectors around the current posture to obtain the IK solution for a new end-effector position and/or orientation close to the current one:

$$\dot{\theta} = J_i^{-1} \dot{x}_i \qquad (3)$$

where $J_i^{-1}$ is the inverse of the Jacobian matrix (Eq. 2) of the i-th joint. Since the Jacobian matrix may not be a square matrix, the pseudoinverse of the Jacobian matrix can be used for solving the IK problem. The pseudoinverse $J_i^{+}$ can be calculated by:

$$J_i^{+} = J_i^T \left( J_i J_i^T \right)^{-1} \qquad (4)$$

and the IK problem can be solved by:

$$\dot{\theta} = J_i^{+} \dot{x}_i \qquad (5)$$

There are three main advantages of numerical solvers:

• They can be applied to arbitrary chain structures.
• Various types of constraints, such as positional or planar constraints, can be handled in the same platform.
• Constraints can be easily switched on and off.

Therefore, many numerical IK approaches have been proposed. The most practical and commonly used numerical solvers are based on the least squares method (Whitney 1969). One of the major problems of the original least squares method is that it becomes unstable near singularity points, which results in large changes in the solution (i.e., the joint parameters $\dot{\theta}$). The singularity problem usually occurs when there is no physically feasible solution to the IK problem; for example, the target location of a controlled joint is unreachable, or there are multiple constraints which conflict with each other. To tackle this problem, various methodologies such as the singularity-robust (SR) inverse (Nakamura and Hanafusa 1986) have been developed to stabilize the system near such singularity postures for generating full-body postures in graphics applications (Yamane and Nakamura 2003). The main idea of the SR inverse is to introduce a weighting parameter to balance between satisfying constraints and stabilizing the changes in joint parameters. The SR inverse $J_i^{*}$ is calculated by:

$$J_i^{*} = J_i^T \left( J_i J_i^T + kI \right)^{-1} \qquad (6)$$

where k is the weighting parameter and I is the identity matrix. The larger the value of k, the larger the error in satisfying the constraints, but the more stable the solution becomes. The bottleneck of these methods is the cost of computing the pseudoinverse matrix, which grows cubically with the number of constraints. Baraff (1996) proposes a forward dynamics method for articulated body structures, which can be used for solving IK problems. Instead of calculating the pseudoinverse matrix, an equation of Lagrange multipliers is solved. Since the matrix used in his method is sparse, efficient sparse matrix solvers can be used. However, the method can only handle equality constraints, and the cost still increases cubically with the number of auxiliary constraints.
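As a concrete illustration, a single update with the singularity-robust inverse of Eq. 6 takes only a few lines of Python. The sketch below assumes the Jacobian has already been evaluated (e.g., with the finite-difference routine shown earlier); the damping value is an arbitrary placeholder:

```python
import numpy as np

def sr_inverse_step(J, dx, k=0.01):
    """One IK update using the singularity-robust inverse of Eq. 6.

    J:  (m, n) Jacobian of the constrained values with respect to the
        n joint parameters.
    dx: (m,) desired change of the constrained positions/orientations.
    k:  weighting parameter trading constraint accuracy for stability.
    """
    m = J.shape[0]
    J_sr = J.T @ np.linalg.inv(J @ J.T + k * np.eye(m))
    return J_sr @ dx   # the change in joint parameters

# Setting k = 0 recovers the plain pseudoinverse of Eq. 4, which is
# exact but unstable near singular postures; larger k stabilizes the
# step at the cost of constraint error, mirroring the trade-off above.
```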

Application of Numerical Inverse Kinematics Approaches

As numerical IK approaches can be applied to arbitrary articulated structures, a wide range of applications has been developed for real-time full-body pose creation. Four types of applications are briefly discussed below.

Footskate Cleanup
In animation production, a motion sequence can be produced by interpolating key poses or by directly using MOCAP data. However, when interpolating key poses, artifacts such as footskate can be produced. Footskate occurs when the motion cannot reproduce the footplants the animation intended to create. For example, in a walking motion, footskate occurs when the character slides over the surface of the ground. When reusing MOCAP data, footskate will occur if the motion data were not well captured due to noise or tracking error. This problem can be solved by analyzing the motion sequence (e.g., the state in a walk cycle, ankle height and rotation, etc.) and determining the position of the feet at every pose (in each frame). Then, IK can be applied to edit the poses accordingly (Kovar et al. 2002; Kulpa et al. 2005; Lee and Shin 1999; Lu and Liu 2014).

Character Retargeting
While reusing existing motions, including both MOCAP data and previously created motions, is common practice in animation production, it is a very challenging task.


Fig. 3 Examples of retargeting the original Judo poses (colored in red) to characters with different sizes using interaction mesh (Ho et al. 2010b) (Reproduced with permission from Ho et al. (2010b))

This is because the articulated character used in the existing motions may differ from the new character(s) in body segment lengths and sizes. As a result, animating new characters with existing motion data may result in loss of contacts (e.g., wrong footplants, inability to reach an object). To tackle this problem, IK can be applied at every pose (in each frame) to constrain some of the body parts (such as hands and feet) to preserve the contacts of the original motion while editing the other parts of the body accordingly (Gleicher 1998). Motion retargeting can also be applied to transferring the live performance of a human subject to the movement of a virtual character (Shin et al. 2001). Ho et al. (2010b) proposed the interaction mesh, which represents the spatial relations between closely interacting body parts. By preserving the shape of the interaction mesh while retargeting characters to new sizes, penetration-free postures can be produced. Examples of retargeting closely interacting characters to different sizes are illustrated in Fig. 3. The spatial relation-based representation can also be used for controlling the movement of humanoid robots in highly constrained environments (Ho and Shum 2013) and synthesizing a virtual partner in VR applications (Ho et al. 2013a).

Collision Avoidance in Pose Editing
Collision and interpenetration of body segments can significantly degrade the realism of the resultant character animation. Collisions may occur when the character is interacting with other objects and characters in the scene. To solve this problem, collision detection algorithms are applied to determine which body part(s) are colliding with other body parts or objects. Next, the colliding segments are moved away from each other to avoid the interpenetration by applying IK to edit the poses. Various approaches have been proposed (Kallmann 2008; Lyard and Magnenat-Thalmann 2008).

Editing Poses for Interaction Between Characters
In interactive applications such as computer games and virtual reality, virtual characters respond to the avatars or objects controlled by the user. When handling virtual scenes with close interactions between characters, such as fighting and dancing, the poses of the characters have to be edited at runtime to preserve the context of the interaction. Imagine that an attacking character is trying to punch the head of a defending character. The punching trajectory has to be edited in order to reach the target (i.e., the head) when the defending character is moving around (e.g., controlled by the user). For example, Shum et al. (2007, 2008) edit the positions of the interacting body parts at every frame, and Ho and Komura (Ho and Komura 2009, 2011; Ho et al. 2010a) edit the way the body parts tangle with each other. IK can also be used for creating reactive motion when an external perturbation is applied to the character (Komura et al. 2005).

Heuristic Inverse Kinematics Approach

One of the representative heuristic search approaches for solving IK problems is the cyclic coordinate descent (CCD) method (Wang and Chen 1991). The CCD method is a simple and fast approach that computes the joint parameters to satisfy the constraints iteratively. The IK problem is solved in cycles, in which each joint parameter is computed once per cycle. Starting with the outermost joint (i.e., closest to the end effector) (Welman 1993) in the articulated structure, the joint parameters are updated sequentially to bring the end effector E to the target location T. When editing a joint parameter of joint i, the positions of joint i, E, and T in Cartesian coordinates are projected to $P_i$, $E_a$, and $T_a$ according to the axis of rotation a. Next, the rotation $\Delta\theta_{joint_i}$ required to bring $E_a$ closer to $T_a$ can be found by calculating the angle between the two vectors originating from $P_i$ to $E_a$ and from $P_i$ to $T_a$:

$$\Delta\theta_{joint_i} = \arccos \left( \frac{(E_a - P_i) \cdot (T_a - P_i)}{\| E_a - P_i \| \, \| T_a - P_i \|} \right) \qquad (7)$$

The direction of rotation is determined by the cross product of the two vectors from $P_i$ to $E_a$ and from $P_i$ to $T_a$. By iteratively updating the joint parameters in every cycle, the difference between the positions of E and T is minimized. Since the joint parameters can be computed analytically as in Eq. 7 in each step, computationally expensive calculations such as matrix manipulations are not needed, resulting in a lower computational cost than the numerical approaches introduced in Section 3.1.1. This makes the CCD method suitable for real-time full-body pose editing applications.
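A compact sketch of one CCD cycle makes the procedure concrete. For brevity, the chain below uses free ball joints, so the rotation axis at each joint is taken as the cross product of the two vectors instead of a fixed per-joint axis; this is a simplification of the projected formulation of Eq. 7:

```python
import numpy as np

def ccd_cycle(joints, target, eps=1e-8):
    """One CCD cycle over a chain of 3D joint positions.

    joints: (J, 3) array; joints[0] is the base and joints[-1] is the
    end effector E. Each step rotates the sub-chain below joint i so
    that E moves toward the target T.
    """
    for i in range(len(joints) - 2, -1, -1):   # outermost joint first
        to_e = joints[-1] - joints[i]          # vector P_i -> E
        to_t = target - joints[i]              # vector P_i -> T
        axis = np.cross(to_e, to_t)            # rotation direction
        if np.linalg.norm(axis) < eps:
            continue                           # already aligned
        axis /= np.linalg.norm(axis)
        cos_a = np.dot(to_e, to_t) / (np.linalg.norm(to_e) *
                                      np.linalg.norm(to_t))
        ang = np.arccos(np.clip(cos_a, -1.0, 1.0))   # the angle of Eq. 7
        # Rotate every joint below i about (joints[i], axis) using
        # Rodrigues' rotation formula.
        for j in range(i + 1, len(joints)):
            v = joints[j] - joints[i]
            joints[j] = joints[i] + (v * np.cos(ang)
                                     + np.cross(axis, v) * np.sin(ang)
                                     + axis * np.dot(axis, v) * (1 - np.cos(ang)))
    return joints
```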

Summary of IK Approaches

A general problem of traditional IK algorithms is the difficulty of ensuring the naturalness of the synthesized motion. This is because natural human motion involves many subtle behaviors, such as balancing and the correlation of body parts, which are difficult to model mathematically. In the next section, we introduce methodologies that use precaptured human motion to improve the solution for pose editing.


Fig. 4 Results obtained in Rose et al. (1998): the sample motions (green) and the blended motions (yellow) (Reproduced with permission from Rose et al. (1998))

Data-Driven Pose Synthesis

The idea of data-driven motion synthesis is to make use of captured motion data to create the required postures, such that natural and humanlike movement can be created by specifying a relatively small number of constraints. An early work by Rose et al. (1998) edits poses by interpolating collected poses to satisfy the constraints. This is based on an old technique in computer animation called motion blending. In Rose et al. (1998), a concept of verbs and adverbs is proposed to generate new poses from examples. In their work, verbs refer to parameterized motions constructed from sets of similar motions, and adverbs are parameters that control the verbs. For each verb, the sample motions are time aligned by manually specifying the key-times for every motion. Then, the motion clips are placed in a parameter space based on their characteristics. Motion blending is done by computing the weights of the sample motions in the corresponding verb using radial basis functions (RBF). By specifying the adverbs, a new motion is created. In addition, users can create a verb graph so that transition motions between verbs can be generated. Figure 4 shows the sample motions (green) and the blended motions (yellow) created by their method.
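The weight computation underlying such blending can be sketched as follows. This snippet uses a Gaussian kernel and omits the low-order polynomial term that Rose et al. (1998) combine with the radial bases, so it is an illustrative simplification rather than their exact formulation:

```python
import numpy as np

def rbf_coefficients(examples, sigma=1.0):
    """Precompute cardinal RBF coefficients so that blend weight i is 1
    at example i in the adverb space and 0 at every other example."""
    D = np.linalg.norm(examples[:, None] - examples[None, :], axis=2)
    Phi = np.exp(-(D ** 2) / (2 * sigma ** 2))   # kernel matrix
    return np.linalg.inv(Phi)

def blend_weights(examples, C, query, sigma=1.0):
    """Evaluate one blend weight per sample motion at an adverb setting."""
    d = np.linalg.norm(examples - query, axis=1)
    phi = np.exp(-(d ** 2) / (2 * sigma ** 2))
    return phi @ C
```

The blended pose is then the weighted combination of the time-aligned sample poses, with joint rotations blended in a suitable parameterization.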


In the approaches proposed in recent years, the collected human motions have to be analyzed first, and machine learning tasks are often required to learn a model for pose synthesis in a later stage. In the rest of this section, we roughly divide the data-driven approaches into two categories: offline training and online modeling approaches.

Offline Training Approaches

Synthesizing a natural-looking pose can be viewed as finding a solution (i.e., joint parameters) in a natural movement space created using captured motions. Grochow et al. (2004) propose to use the scaled Gaussian process latent variable model (SGPLVM) (Lawrence 2004) to create such a natural pose space. While the process of learning the pose model is done offline, the learned model can be used for real-time full-body pose synthesis. By specifying constraints such as the positions of the hands and feet, a natural-looking full-body pose can be synthesized. However, due to the complexity of the learning process, the model cannot be trained with a large number of poses. Wu et al. (2011) further propose to select a subset of distinctive postures from a large pose database for learning a natural pose space for pose synthesis. Wei and Chai (2011) solved the same problem by constructing a mixture of factor analyzers. The algorithm segments the motion database into local regions and models each of them individually. Nevertheless, the training cost and system complexity increase with the amount of source data, and the effectiveness of dimensionality reduction decreases as the variety of the motion data increases.

Online Modeling Approaches

As opposed to offline training approaches, online modeling has been shown to be effective for real-time applications with large motion datasets. The idea is to select a small subset of postures based on run-time information to synthesize the required posture. For example, Chai and Hodgins (2005) use a lazy learning approach to learn low-dimensional local linear models (via principal component analysis (PCA)) that approximate, at runtime, the high-dimensional manifold which contains the natural and valid poses. Given the current pose of the character and the target positions of the selected joint(s) as constraints, a set of postures similar to the current one is used to learn the local linear model. Natural-looking full-body motion can then be synthesized by interpolating the poses in the low-dimensional space while minimizing energy terms that ensure the synthesized pose is smooth (i.e., in joint velocities), satisfies the constraints given by the user, and follows the probability distribution of the captured motions in the training data. Liu et al. extended the idea by using a maximum a posteriori framework to reconstruct the motion, which enhanced the consistency of the movement in the temporal domain (Liu et al. 2011). The general problem of these methods is that it is difficult to ensure that the set of extracted postures is logically similar, as a kinematic metric is used. Ho et al. (2013b) also use a lazy learning approach to learn local linear models while taking into account the spatial relationship between body parts. The topology-based approach computes and represents the tangling of body parts using a subset of topology coordinates (Ho and Komura 2009). As interpolating significantly different poses can easily result in interpenetration of the body parts, their method only selects topologically similar poses to learn the local model and ensures that the changes in spatial relationship are small when editing the pose. By this, penetration-free postures can be created (Fig. 5).

Fig. 5 Posing characters with close interactions while avoiding penetration of body parts by the method proposed in Ho et al. (2013b) (Reproduced with permission from Ho et al. (2013b))
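The run-time synthesis step shared by these online methods can be sketched as a small optimization in the local latent space. The fragment below is a toy version: the forward-kinematics function, the energy weights, and the derivative-free optimizer are assumptions made for the illustration, whereas real systems use richer energy terms (smoothness, motion priors) and gradient-based solvers:

```python
import numpy as np
from scipy.optimize import minimize

def synthesize_pose(mean, basis, fk, targets, lam=0.1):
    """Search latent coordinates z so that the decoded pose
    mean + basis.T @ z places the constrained joints at 'targets'
    while staying close to the local model of retrieved postures.

    basis: (d, D) principal directions of the retrieved subset.
    fk:    maps a full pose vector (D,) to constrained joint positions.
    """
    def cost(z):
        pose = mean + basis.T @ z
        constraint = np.sum((fk(pose) - targets) ** 2)  # user constraints
        prior = lam * np.dot(z, z)                      # stay near the data
        return constraint + prior

    z0 = np.zeros(basis.shape[0])
    res = minimize(cost, z0, method="Nelder-Mead")
    return mean + basis.T @ res.x
```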

User Interface for Full-Body Posing

Besides the pose synthesis and editing approaches introduced above, another stream of research lies in providing more natural and intuitive interfaces for users to pose characters. In this subsection, three types of character posing interfaces will be introduced: puppet-based interfaces, natural user interfaces, and sketch-based interfaces.

Puppet-Based Interface

An early work by Esposito et al. (1995) provides the user with a puppet called Monkey, as shown in Fig. 6. Monkey is a humanlike puppet with 32 degrees of freedom, approximately 18 inches tall and about 6 inches wide. Rotational sensors are located at the joints of the puppet to measure the joint angles when the user manipulates it. The pose sequence produced by puppet-based input devices can also be used as an example motion to retrieve similar movements from a database (Numaguchi et al. 2011). The recently proposed tangible input device (Jacobson et al. 2014) introduced in Section 2 further enables users to create articulated characters with different topologies for real-time full-body posing.

Fig. 6 An input device, called Monkey, for interactive pose manipulation (Reproduced with permission from Esposito et al. (1995))

Natural User Interface

With the advancement in motion-sensing technologies, a wide variety of natural user interface (NUI) applications have been proposed. In human posing, a recent work by Oshita et al. (2013) captures the movements of the fingers and hands of the user manipulating an intangible puppet. The design of the puppet control is inspired by the traditional puppet controlling mechanism, in which the head/body rotation and body translation are controlled by the right hand while the legs of the character are controlled by the left hand. The hand and finger movement is captured at runtime using the Leap Motion controller (Leap Motion 2016). Unlike previous sensor-based approaches such as data glove-based methods (Isrozaidi et al. 2010; Komura and Lam 2006), no sensor or marker is required to be attached to the hand, as the Leap Motion controller tracks the finger and hand movement by emitting infrared (IR) light and analyzing the reflected IR light to calculate the 3D positions of different parts of the hand(s) over time (Fig. 7).

Fig. 7 Character motion control by hand manipulation using the Leap Motion controller (Reproduced with permission from Oshita et al. (2013))

Sketch-Based Interface

Another type of popular intuitive posing interface is the sketch-based approach, inspired by the pose design process in traditional 2D hand-drawn animation production. Mapping a 2D sketch to a 3D pose is an under-constrained problem. An early work by Igarashi et al. (1999) enables the user to sketch the 2D silhouette of a character, from which the corresponding 3D mesh model is generated. In addition, posing 3D characters using sketches of the skeleton as 2D stick figures (Choi et al. 2012; Davis et al. 2003; Wei and Chai 2011) is also an active research topic. In Lin et al. (2012), the user can sketch the sitting pose of a stick figure in 2D. By taking into account the interaction between the sketched pose and the environment and preserving physical correctness such as the balance of the character, the 3D pose is produced at an interactive rate (with a GPU implementation). To further simplify the input from the user, highly abstracted sketches such as the line of action approach (Guay et al. 2013) introduced in Section 2 have been proposed. A recent work by Hahn et al. (2015) further allows the user to define a custom sketch abstraction, for example, by sketching the outline or the skeleton; the system then maps the sketch to the rigging parameters to edit the pose of the 3D character by deforming the mesh model.

Conclusion

In this chapter, various kinds of real-time full-body posing approaches have been discussed. Traditional posing approaches such as IK automatically create a new pose according to a small number of constraints, which reduces the workload of animators in posing characters. Data-driven approaches produce natural-looking poses by constraining the produced poses to lie in the natural pose space. Finally, a wide range of intuitive interfaces has been developed for directly manipulating the pose of the character to further simplify the posing process. While the research interest in full-body posing has been shifting from traditional methods to more intuitive controls, we believe IK will continue to play an important role in character posing.

References

Baraff D (1996) Linear-time dynamics using Lagrange multipliers. In: SIGGRAPH '96: Proceedings of the 23rd annual conference on computer graphics and interactive techniques. ACM, New York, pp 137–146. doi:10.1145/237170.237226
Chai J, Hodgins JK (2005) Performance animation from low-dimensional control signals. In: SIGGRAPH '05: ACM SIGGRAPH 2005 papers. ACM, New York, pp 686–696. doi:10.1145/1186822.1073248
Choi MG, Yang K, Igarashi T, Mitani J, Lee J (2012) Retrieval and visualization of human motion data via stick figures. Comput Graph Forum 31(7pt1):2057–2065. doi:10.1111/j.1467-8659.2012.03198.x
Davis J, Agrawala M, Chuang E, Popović Z, Salesin D (2003) A sketching interface for articulated figure animation. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '03. Eurographics Association, Aire-la-Ville, pp 320–328. http://dl.acm.org/citation.cfm?id=846276.846322
Esposito C, Paley WB, Ong J (1995) Of mice and monkeys: a specialized input device for virtual body animation. In: Proceedings of the 1995 symposium on interactive 3D graphics, I3D '95. ACM, New York, p 109–ff. doi:10.1145/199404.199424
Gleicher M (1998) Retargeting motion to new characters. In: SIGGRAPH '98: Proceedings of the 25th annual conference on computer graphics and interactive techniques. ACM Press, New York, pp 33–42. doi:10.1145/280814.280820
Grochow K, Martin SL, Hertzmann A, Popović Z (2004) Style-based inverse kinematics. ACM Trans Graph 23(3):522–531. doi:10.1145/1015706.1015755
Guay M, Cani MP, Ronfard R (2013) The line of action: an intuitive interface for expressive character posing. ACM Trans Graph 32(6):205:1–205:8. doi:10.1145/2508363.2508397
Hahn F, Mutzel F, Coros S, Thomaszewski B, Nitti M, Gross M, Sumner RW (2015) Sketch abstractions for character posing. In: Proceedings of the 14th ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '15. ACM, New York, pp 185–191. doi:10.1145/2786784.2786785
Hämäläinen P, Rajamäki J, Liu CK (2015) Online control of simulated humanoids using particle belief propagation. ACM Trans Graph 34(4):81:1–81:13. doi:10.1145/2767002
Harish P, Mahmudi M, Callennec BL, Boulic R (2016) Parallel inverse kinematics for multithreaded architectures. ACM Trans Graph 35(2):19:1–19:13. doi:10.1145/2887740
Ho ESL, Komura T (2009) Character motion synthesis by topology coordinates. In: Dutré P, Stamminger M (eds) Computer graphics forum (Proceedings of Eurographics 2009), Munich, vol 28, pp 299–308
Ho ESL, Komura T (2011) A finite state machine based on topology coordinates for wrestling games. Comput Animat Virtual Worlds 22(5):435–443. doi:10.1002/cav.376
Ho ESL, Shum HPH (2013) Motion adaptation for humanoid robots in constrained environments. In: Robotics and automation (ICRA), 2013 IEEE international conference on, pp 3813–3818. doi:10.1109/ICRA.2013.6631113
Ho ESL, Komura T, Ramamoorthy S, Vijayakumar S (2010a) Controlling humanoid robots in topology coordinates. In: Intelligent robots and systems (IROS), 2010 IEEE/RSJ international conference on, pp 178–182. doi:10.1109/IROS.2010.5652787
Ho ESL, Komura T, Tai CL (2010b) Spatial relationship preserving character motion adaptation. ACM Trans Graph 29(4):1–8. doi:10.1145/1778765.1778770
Ho ESL, Chan JCP, Komura T, Leung H (2013a) Interactive partner control in close interactions for real-time applications. ACM Trans Multimed Comput Commun Appl 9(3):21:1–21:19. doi:10.1145/2487268.2487274
Ho ESL, Shum HPH, Cheung YM, Yuen PC (2013b) Topology aware data-driven inverse kinematics. Comput Graph Forum 32(7):61–70. doi:10.1111/cgf.12212
Igarashi T, Matsuoka S, Tanaka H (1999) Teddy: a sketching interface for 3d freeform design. In: Proceedings of the 26th annual conference on computer graphics and interactive techniques, SIGGRAPH '99. ACM Press/Addison-Wesley, New York, pp 409–416. doi:10.1145/311535.311602
Isrozaidi N, Ismail N, Oshita M (2010) Data glove-based interface for real-time character motion control. In: ACM SIGGRAPH ASIA 2010 posters, SA '10. ACM, New York, p 5:1. doi:10.1145/1900354.1900360
Jacobson A, Panozzo D, Glauser O, Pradalier C, Hilliges O, Sorkine-Hornung O (2014) Tangible and modular input device for character articulation. ACM Trans Graph 33(4):82:1–82:12. doi:10.1145/2601097.2601112
Kallmann M (2008) Analytical inverse kinematics with body posture control. Comput Animat Virtual Worlds 19(2):79–91
Komura T, Lam WC (2006) Real-time locomotion control by sensing gloves. Comput Animat Virtual Worlds 17(5):513–525. doi:10.1002/cav.114
Komura T, Ho ESL, Lau RW (2005) Animating reactive motion using momentum-based inverse kinematics: motion capture and retrieval. J Vis Comput Animat 16(3–4):213–223. doi:10.1002/cav.v16:3/4
Kovar L, Schreiner J, Gleicher M (2002) Footskate cleanup for motion capture editing. In: SCA '02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 97–104. doi:10.1145/545261.545277
Kulpa R, Multon F, Arnaldi B (2005) Morphology-independent representation of motions for interactive human-like animation. Comput Graph Forum 24(3):343–351. doi:10.1111/j.1467-8659.2005.00859.x
Kyto M, Dhinakaran K, Martikainen A, Hamalainen P (2015) Improving 3d character posing with a gestural interface. IEEE Comput Graph Appl. doi:10.1109/MCG.2015.117
Lawrence ND (2004) Gaussian process latent variable models for visualisation of high dimensional data. In: Advances in neural information processing systems (Proceedings of NIPS 2003). MIT Press, Cambridge, MA, pp 329–336
Leap Motion (2016) https://www.leapmotion.com/
Lee J, Shin SY (1999) A hierarchical approach to interactive motion editing for human-like figures. In: Proceedings of the 26th annual conference on computer graphics and interactive techniques, SIGGRAPH '99. ACM Press/Addison-Wesley Publishing, New York, pp 39–48. doi:10.1145/311535.311539
Lin J, Igarashi T, Mitani J, Liao M, He Y (2012) A sketching interface for sitting pose design in the virtual environment. IEEE Trans Vis Comput Graph 18(11):1979–1991. doi:10.1109/TVCG.2012.61
Liu H, Wei X, Chai J, Ha I, Rhee T (2011) Realtime human motion control with a small number of inertial sensors. In: Symposium on interactive 3D graphics and games, I3D '11. ACM, New York, pp 133–140. doi:10.1145/1944745.1944768
Lu J, Liu X (2014) Foot plant detection for motion capture data by curve saliency. In: Computing, communication and networking technologies (ICCCNT), 2014 international conference on, pp 1–6. doi:10.1109/ICCCNT.2014.6963001
Lyard E, Magnenat-Thalmann N (2008) Motion adaptation based on character shape. Comput Animat Virtual Worlds 19(3–4):189–198. doi:10.1002/cav.v19:3/4
Nakamura Y, Hanafusa H (1986) Inverse kinematics solutions with singularity robustness for robot manipulator control. J Dyn Syst Meas Control 108:163–171
Numaguchi N, Nakazawa A, Shiratori T, Hodgins JK (2011) A puppet interface for retrieval of motion capture data. In: Proceedings of the 2011 ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '11. ACM, New York, pp 157–166. doi:10.1145/2019406.2019427
Oshita M, Senju Y, Morishige S (2013) Character motion control interface with hand manipulation inspired by puppet mechanism. In: Proceedings of the 12th ACM SIGGRAPH international conference on virtual-reality continuum and its applications in industry, VRCAI '13. ACM, New York, pp 131–138. doi:10.1145/2534329.2534360
Rhodin H, Tompkin J, Kim KI, Kiran V, Seidel HP, Theobalt C (2014) Interactive motion mapping for real-time character control. Comput Graph Forum (Proc Eurograph) 33(2):273–282. doi:10.1111/cgf.12325
Rose C, Cohen MF, Bodenheimer B (1998) Verbs and adverbs: multidimensional motion interpolation. IEEE Comput Graph Appl 18:32–40. doi:10.1109/38.708559
Shin HJ, Lee J, Shin SY, Gleicher M (2001) Computer puppetry: an importance-based approach. ACM Trans Graph 20(2):67–94. doi:10.1145/502122.502123
Shum HPH, Komura T, Yamazaki S (2007) Simulating competitive interactions using singly captured motions. In: Proceedings of ACM virtual reality software and technology 2007, pp 65–72
Shum HPH, Komura T, Yamazaki S (2008) Simulating interactions of avatars in high dimensional state space. In: ACM SIGGRAPH symposium on interactive 3D graphics (i3D) 2008, pp 131–138
Wang LCT, Chen CC (1991) A combined optimization method for solving the inverse kinematics problems of mechanical manipulators. IEEE Trans Robot Autom 7(4):489–499. doi:10.1109/70.86079
Wei XK, Chai J (2011) Intuitive interactive human-character posing with millions of example poses. IEEE Comput Graph Appl 31:78–88. doi:10.1109/MCG.2009.132
Welman C (1993) Inverse kinematics and geometric constraints for articulated figure manipulation. Master's thesis, Simon Fraser University
Whitney D (1969) Resolved motion rate control of manipulators and human prostheses. IEEE Trans Man-Machine Syst 10(2):47–53. doi:10.1109/TMMS.1969.299896
Wu X, Tournier M, Reveret L (2011) Natural character posing from a large motion database. IEEE Comput Graph Appl 31(3):69–77. doi:10.1109/MCG.2009.111
Yamane K, Nakamura Y (2003) Natural motion animation through constraining and deconstraining at will. IEEE Trans Vis Comput Graph 9(3):352–360. doi:10.1109/TVCG.2003.1207443

Real-Time Full Body Motion Control

John Collomosse and Adrian Hilton

Abstract

This chapter surveys techniques for interactive character animation, exploring data-driven and physical simulation-based methods. Interactive character animation is increasingly data driven, with animation produced through the sampling, concatenation, and blending of pre-captured motion fragments. The chapter therefore begins by surveying commercial technologies and academic research into performance capture. Physically based simulations for interactive character animation are briefly surveyed, with a focus upon techniques proven to run in real time. The chapter then focuses upon concatenative synthesis approaches to animation, particularly upon motion graphs and their parametric extensions for planning skeletal and surface motion for interactive character animation.

Keywords

Animation • 3D motion capture • Real-time motion • Virtual reality • Augmented reality • 4D mesh calculation • Parametric motion

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Commercial Technologies for Performance Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marker-Less Human Motion Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Interactive Character Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Real-Time Physics-Based Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Concatenative Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


J. Collomosse (*) • A. Hilton Centre for Vision Speech and Signal Processing (CVSSP), University of Surrey, Surrey, UK e-mail: [email protected]; [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_9-1


Introduction

Compelling visuals and high-quality character animation are cornerstones of modern video games and immersive experiences. Yet character animation remains an expensive process. It can take a digital artist weeks to skin a character model (design its 3D surface representation) and then rig it with a skeleton to facilitate full body control and animation. Animation is often expedited by retargeting human performance capture data to drive the character's movement. Yet creativity and artistic input remain in the loop, blending handcrafted animation with motion capture data which itself may be an amalgam of multiple takes (e.g., it is common for separate passes to be used for the face, head, and hands). Performance capture itself is expensive; equipment hire, operation, and studio/actor time can approach millions of US dollars on a high-end production. The recent resurgence of virtual and augmented reality (VR/AR) experiences, in which character interaction takes place at very close quarters, is further driving up expectations of visual realism. Creating believable interactive digital characters is therefore a trade-off between project budget and quality. Better tool support inevitably leads to efficiency and so a rebalancing toward higher quality.

In this chapter, we survey state-of-the-art technologies and algorithms (as of 2015) for efficient interactive character animation. While a common goal is a drive toward increased automation, which in some cases can produce interactive characters with near-complete automation, one should not lose sight of the fact that these are tools only, and the creative artist in the loop remains essential to reach the high-quality bar demanded by modern production. As such this chapter takes a practical view on animation, first surveying the commercial technologies and academic research into performance capture and then surveying the two complementary approaches to real-time animation – physically based approaches (examined further in chapter C-2) and data-driven approaches. Although character animation is frequently used within other domains within the Creative Industries (movies, broadcast), its use within games requires new animation sequences to be generated on the fly, responding in real time to user interaction and game events. This places some design restrictions on the underpinning algorithms (efficient data structures, no temporal look-ahead for kinematics). This chapter therefore focuses upon algorithms for interactive, rather than more general offline, character animation covered elsewhere in this book.

State of the Art

Historically character animation has been underpinned by meticulous observations of movement in nature, for example, the gait cycles of people or animals. This link has been made explicit by contemporary character animation, which is trending toward a data-driven process in which sampled physical performance is the basis for synthesizing realistic movement in real time. This chapter therefore begins by surveying commercial technologies, and state-of-the-art Computer Vision algorithms, for capturing human motion data.


Commercial Technologies for Performance Capture

Motion Capture (mocap) technology was initially developed within the Life Sciences for human movement analysis. The adoption of mocap for digital entertainment, commonly referred to as performance capture (PC), is now widespread. PC accounts for 46% of the total 3D motion capture system market, which is growing annually at a rate of around 10% and is expected to reach 142.5 million US dollars by 2020 (Rohan 2015). Indeed many of the innovations in mocap (e.g., marker-less capture) are now being developed within the Creative Industries and transferred back into domains such as biomechanics and healthcare.

PC systems enable sequences of skeletal joint angles to be recorded from one or several actors. The key distinction between PC systems is the kind of physical marker or wearable device (if any) required to be attached to the actors. The predominant form of PC in the Creative Industries is marker based, using passive markers that are tracked visually using synchronized multiple viewpoint video (MVV). Popular systems for passive marker PC are manufactured by Vicon (UK) and Optitrack (US), which require the actor to wear retroreflective spheres (approximately 20–30 are typically used for full body capture). A region of the studio (the capture volume) is surrounded by several infrared (IR) cameras in known locations and illuminated by several diffuse IR light sources. Prior to capture of performance data, a calibration process is performed to learn the relative locations (extrinsic parameters) of the cameras. This enables the world location of the markers attached to the actor to be triangulated, resulting in a 3D point cloud from which a skeletal pose is inferred using physical and kinematic constraints. Modern software (e.g., Blade or MotionBuilder) can perform this inference in real time, providing immediate availability of a pose estimate for each actor in the scene. PC service providers (e.g., The Imaginarium Studios, London UK) have begun to harness this technology to pre-visualize the appearance of digital characters for movie or game production during live actor performance. Such facilities provide immediate visual feedback to both the actor and director on set, removing trial and error and so improving efficiency in the capture process (Fig. 1, top). Other forms of passive PC in regular use include the fractal suits patented by Industrial Light and Magic for full body motion capture (Fig. 1, bottom). The suits are tracked using visible light and so are more amenable to deployment in outdoor sets where strong natural light makes IR impractical.

Active marker-based systems include offerings from CodaMotion (UK) and PhaseSpace (US). Markers are bright IR light-emitting diodes (LEDs) that pulse with unique signatures identifying each marker to one or several observing cameras. Since markers are uniquely labeled at source, automated tracking of markers is trivial, making marker confusion highly unlikely. By contrast, the labeling of triangulated markers in a passive system is performed during pose inference and may be incorrect in the presence of clutter (e.g., multiple occluding actors). Marker mislabeling causes errors in pose estimation, which can only be removed through the addition of more witness cameras (reducing the chance of occlusion) or by manually correcting the data post-capture. An advantage of active marker-based systems is therefore the need for


fewer cameras and reduced data correction. Active systems tend to perform better outdoors, again owing to obviating the need for large-area IR illumination. The disadvantage is the additional expense and time required for actor setup (wires and batteries) due to the complexity of the markers. The workflow to produce a skeletal pose estimate from active marker data is identical to passive systems, since the capture again results in a sequence of 3D point movements.

Fig. 1 Performance capture technologies. Top: Vicon IR-based system being used to pre-visualize character performance in real time within the UNREAL Games engine (EPIC). Bottom: Industrial Light and Magic's fractal suit enabling visible light-based tracking outdoors

Inertial motion capture systems use inertial measurement units (IMUs) to detect changes in joint orientation and movement, providing an alternative to visual


tracking and so removing the problem of marker occlusion. IMUs are worn on each limb (around 12–14 for full body capture) and connected wirelessly to a hub, which forwards the data for software processing. Common IMU capture solutions include AnimeZoo (UK), XSens (Netherlands), and most recently the crowd-funded PerceptionNeuron (US) system. All of these solutions again rely upon a robust back-end software product to infer a skeletal pose estimate using physical and kinematic constraints. The disadvantage of inertial capture is drift, since the IMUs output only a stream of relative joint angles. For this reason, IMU mocap is sometimes combined with a secondary modality, e.g., laser ranging or passive video, to capture the world position of the actor.

An emerging form of PC is marker-less mocap, using Computer Vision to track the actor without the need for wearables. Although the accuracy of commercial marker-less systems has yet to reach parity with marker-based solutions, the greatly reduced setup time and the flexibility to use only regular video cameras for capture make such systems a cost-effective option. For the purposes of teaching data-driven animation production, marker-less technologies are therefore attractive. Solutions include the OrganicMotion stage (US), a cube arrangement of around 20 machine vision cameras that calculates human pose using the silhouette of the performer against a uniform background from the multiple camera angles. More recently The Captury (Germany) launched a software-only product for skeletal PC that estimates pose against an arbitrary background using a possibly heterogeneous array of cameras. Yet although commercial solutions to marker-less PC remain in their infancy, academic research is making good progress as we next discuss.

Marker-Less Human Motion Estimation

Passive estimation of human pose from video is a long-standing Computer Vision challenge, particularly when visual fiducials (markers) are not present. Methods can be partitioned into those considering monocular (single-view) video or multiple viewpoint video.

Monocular Human Pose Estimation

Human pose estimation (HPE) often requires the regions of interest (ROIs) representing people to be identified within the video. This person localization problem can be solved using background (Zhao and Nevatia 2003) or motion (Agarwal and Triggs 2006) subtraction in the case of simple backgrounds. In more cluttered scenarios, supervised machine learning can be applied to detect the presence of a person within a sliding window swept over the video frame. Within each position of the window, pretrained classifiers based on Histogram of Oriented Gradients (HoG) descriptors can robustly identify the torso (Eichner and Ferrari 2009), face (Viola and Jones 2004), or entire body (Dalal and Triggs 2005). Once the subject is localized within the frame, the majority of monocular HPE algorithms attempt to infer only a 2D, i.e., apparent, pose of the performer. These adopt either (a) top-down fitting of a person model, optimizing limb parameters and


projecting to image space to evaluate correlation with image data, or (b) individually segmenting parts and integrating their positions in a bottom-up manner to produce a maximum likelihood pose. Bottom-up approaches dominated early research into HPE, over one decade ago. Srinivasan and Shi (2007) used an image segmentation algorithm (graph cut) to parse a subset of salient shapes from an image and group these into a shape resembling a person using a set of learned rules. However the approach was limited to a single person, and background clutter was reported to interfere with the initial segmentation and so the eventual accuracy of the approach. Ren et al. proposed an alternative algorithm in which Canny edge contours were recursively split into segments, each of which was classified as a putative body part using shape cues such as parallelism (Ren et al. 2005). Ning et al. (2008) similarly attempted to label body parts individually, applying a Bag of Visual Words (BoVW) framework to learn codewords for body zone labeling – segmenting 2D body parts to infer pose. Mori et al. described the first bottom-up algorithm capable of estimating a 3D pose in world space, identifying the position of individual joints in a 2D image using scale and symmetry constraints, and then matching those 2D joint positions to a set of many "training images," each of which had been manually annotated a priori with 2D joint positions (Mori et al. 2004) and was associated also with a 3D ground truth. Once the closest training image had been identified by matching query and training joint positions in 2D, the relevant 3D pose was returned as the result. Top-down approaches, in which the entire 2D image is used as evidence to fit a model, are more contemporary. The most common form of model fitted to the image is a "pictorial structure," essentially a collection of 2D limbs (regions) articulated by springs, that can be iteratively deformed to fit to evidence in the image under an optimization process (Andriluka et al. 2009; Eichner and Ferrari 2009). However such approaches do not recover a 3D pose estimate, or where they do, the estimate is unstable due to the ambiguity of reasoning from a single image.
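As a concrete illustration of the sliding-window person localization described above, OpenCV ships a HoG descriptor with a pretrained pedestrian detector; a minimal sketch follows (Python; the window stride, padding, and scale values are illustrative choices, and the input filename is hypothetical):

```python
import cv2

# HoG descriptor paired with OpenCV's pretrained full-body (pedestrian) SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.png")
# Sweep a detection window over the frame at multiple scales
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)
for (x, y, w, h) in rects:
    # Each rectangle is a person ROI for subsequent pose estimation
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```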

Multi-View Human Pose Estimation

A 3D estimate of human pose may be inferred with less ambiguity using footage captured from multiple viewpoints. In such a setup, a configuration of cameras (typically surrounding a subject in a 180° or 360° arc) observes a capture volume within which a performance is enacted. The cameras are typically calibrated, i.e., for a subject observed by C camera views c = [1, C], the extrinsic parameters {R_c, COP_c} (camera orientation and focal point) and intrinsic parameters {f_c, ox_c, oy_c} (focal length and 2D optical center) are known. Two categories of approach exist: (a) those estimating 2D pose from each view independently and fusing these to deduce a 3D pose and (b) those inferring a 3D pose from a 3D geometric proxy of the performer recovered through volumetric reconstruction. Computer Vision has undergone a revolution in recent years, with deep convolutional neural networks (CNNs) previously popular in text recognition being extended and applied to solve many open problems including human pose


Fig. 2 Convolutional neural networks (CNNs) used for pose estimation in multi-viewpoint video. (a) Using 2D detections of body parts fused in a 3D probabilistic model (from (Elhayek et al. 2015)), (b) recognition of pose from 3D volumetric data recovered from multiple views (from (Trumble et al. 2016))

estimation. CNNs have shown particular strengths in general object detection, with some state-of-the-art networks, e.g., GoogLeNet (Google Inc.), surpassing human performance in certain scenarios. Most recently CNNs have also been used to detect human body parts in single and multiple viewpoint video and to infer human pose from these. Elhayek et al. (2015) estimate human body parts from individual video viewpoints using CNN detectors and then combine these under a probabilistic model incorporating color and motion constraints from a body part tracker to create a 3D pose estimate. The CNN detection step is robust to clutter, making the system suitable for estimation of 3D pose in complex scenes including outdoors (Fig. 2a). In volumetric approaches, a geometric proxy of the performer is built using a visual hull (Grauman et al. 2003) computed from foreground mattes extracted from each camera image I_c using a chroma key or a more sophisticated image segmentation algorithm. To compute the visual hull, the capture volume is coarsely decimated into


a set of voxels at locations V = {V_1, ..., V_m}; a resolution of 1 cm³ is commonly used for a capture volume of approximately 6 × 2 × 6 m. The probability of a voxel V_i being part of the performer in a given view c is:

$$p(V_i \mid c) = B(I_c(x[V_i], y[V_i])), \qquad (1)$$

where B(.) is a simple blue-dominance term derived from the RGB components of I_c(x, y), i.e., $1 - \frac{B}{R+G+B}$, and (x, y) is the point within I_c to which V_i projects:

$$x[V_i] = \frac{f_c v_x}{v_z} + ox_c \quad \text{and} \quad y[V_i] = \frac{f_c v_y}{v_z} + oy_c, \qquad (2)$$

where

$$(v_x, v_y, v_z) = COP_c - R_c^{-1} V_i. \qquad (3)$$

The overall probability of occupancy for a given voxel p(V_i) is:

$$p(V_i) = \prod_{c=1}^{C} \frac{1}{1 + e^{-p(V_i \mid c)}}. \qquad (4)$$

We compute p(V_i) for all V_i ∈ V to create a volumetric representation of the performer for subsequent processing. An iso-contour extraction algorithm such as marching cubes (Lorensen and Cline 1987) is used to extract a triangular mesh model from the voxel-based visual hull (Fig. 3). The result is a topologically independent 3D mesh for each frame of video. This can be converted into a so-called "4D" representation using a mesh tracking process to conform these individual meshes to a single mesh that deforms over time (Budd et al. 2013). Once obtained, it is trivial to mark up a single frame of the performance to embed a skeleton (e.g., marking each joint as an average of a subset of mesh vertices) and have the skeleton track with the performance as the mesh deforms. As we explain in subsection "Concatenative Synthesis," either the skeletal or surface representations from such a 4D performance capture may be used to drive character animation interactively. CNNs have also been applied to volumetric approaches, with a spherical histogram (c.f. subsection "Surface Motion Graphs") derived from the visual hull being fed into a CNN to directly identify human pose (Trumble et al. 2016). The system contrasts with Elhayek et al. (2015) in that the CNN operates in 3D rather than 2D space, and similarly adds robustness to visual clutter in the scene.
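To make the visual hull computation of Eqs. 1–4 concrete, the following minimal sketch (Python with NumPy; all names are illustrative, a uniform blue chroma-key background is assumed, and voxels behind a camera are not handled) evaluates the per-voxel occupancy for a block of voxels:

```python
import numpy as np

def blue_dominance(rgb):
    # B(.): 1 - B/(R+G+B), high for non-blue (foreground) pixels (Eq. 1)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 1.0 - b / np.maximum(r + g + b, 1e-6)

def voxel_occupancy(voxels, cameras, images):
    """voxels: (m, 3) world positions; cameras: list of dicts holding
    rotation R, focal point COP, focal length f, and optical center (ox, oy)."""
    p = np.ones(len(voxels))
    for cam, img in zip(cameras, images):
        # Transform voxel centers relative to the camera (Eq. 3, as given)
        v = cam["COP"] - voxels @ np.linalg.inv(cam["R"]).T
        # Perspective projection into image coordinates (Eq. 2)
        x = (cam["f"] * v[:, 0] / v[:, 2] + cam["ox"]).astype(int)
        y = (cam["f"] * v[:, 1] / v[:, 2] + cam["oy"]).astype(int)
        inside = (x >= 0) & (x < img.shape[1]) & (y >= 0) & (y < img.shape[0])
        p_vc = np.zeros(len(voxels))
        p_vc[inside] = blue_dominance(img[y[inside], x[inside]].astype(float))
        # Per-view evidence fused multiplicatively via a sigmoid (Eq. 4)
        p *= 1.0 / (1.0 + np.exp(-p_vc))
    return p
```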

Interactive Character Animation

Interactive character animation often takes place within complex digital environments, such as games, in which multiple entities (characters, moveable objects, and static scene elements) interact continuously. Since these interactions are a function of user input, they cannot be predicted or scripted a priori, and enumerating all possible


Fig. 3 4D performance capture. Multiple video views (top) are fused to create a volumetric representation of the performance which is meshed (bottom). The per-frame meshes are conformed to a single deforming mesh over time, into which a skeleton may be embedded and tracked (right)

eventualities is intractable. It is therefore necessary to plan animation in real time using fast, online algorithms (i.e., algorithms using data from the current and previous timesteps only). Two distinct categories of algorithm exist. First, algorithms drawing upon a pre-supplied database of motion for the character, usually obtained via PC and/or manual scripting. Several fragments of motion data ("motion fragments") are stitched and blended together to create a seamless piece of animation. A trivial example is a single cycle of a walk, which can be repeatedly concatenated to create a character walking forward in perpetuity. However more complex behavior (e.g., walks along an arbitrary path) can be created by carefully selecting and interpolating between a set of motion fragments (e.g., three walk cycles: one veering left, one veering right, and one straight-ahead) such that no jarring movement occurs. This form of motion synthesis, formed by concatenating (and in some cases interpolating between) several motion fragments, is referred to as "concatenative synthesis." The challenge is therefore in selecting and sequencing appropriate motion fragments to react to planning requirements (move from A to B) under environmental (e.g., occlusion) and physical (e.g., kinematic) constraints. This is usually performed via a graph optimization process, with the motion fragments and valid transitions between these encoded in the nodes and edges of a directed graph referred to as a "move tree" or "motion graph" (Kovar et al. 2002), as sketched below. The key advantages of a motion graph are predictability of movement and artistic control over the motion fragments, qualities that are challenging to embody within a physical simulation. The disadvantage is that motion cannot generalize far beyond the motion fragments, i.e., character movement obtained via PC in the studio. We discuss concatenative synthesis in detail within subsection "Concatenative Synthesis."
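As an illustration of the data structure involved, the following minimal sketch (Python; all names are illustrative and the pose representation is left abstract) encodes motion fragments and their valid transitions as a directed graph of the kind just described:

```python
from dataclasses import dataclass, field

@dataclass
class MotionFragment:
    frames: list    # sequence of skeletal poses (omitted in this sketch)
    label: str      # e.g., "walk_left", "walk_straight"

@dataclass
class MotionGraph:
    # Nodes are transition points; edges carry playable motion fragments.
    edges: dict = field(default_factory=dict)  # node -> [(next node, fragment)]

    def add_edge(self, src, dst, fragment):
        self.edges.setdefault(src, []).append((dst, fragment))

    def successors(self, node):
        return self.edges.get(node, [])

# A trivial one-node graph: a walk cycle that can repeat in perpetuity
g = MotionGraph()
walk = MotionFragment(frames=[], label="walk_cycle")
g.add_edge("foot_down", "foot_down", walk)
```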


Second, algorithms that do not require prescripted or directly captured animation but instead simulate movement under physical laws. Physics simulation is now commonly included within games engines (e.g., Havok, PhysX) but is used primarily to determine the motion of objects or particles, or the animation of secondary characteristics such as cloth attached to characters (Armstrong and Green 1985). More recently, physics-based character animation has been explored by integrating such engines into the animation loop of principal characters (Geijtenbeek et al. 2010). Physics-based simulation offers the significant advantage of generalization; characters modeled in this manner can react to any situation within the virtual world and are not bound to a database of preordained movements. Nevertheless, the high computational cost of simulation forces accuracy-performance trade-offs for real-time use. Simplifying assumptions, such as articulated rigid bodies for the skeletal structure, are very common. It is therefore inaccurate to consider physically simulated animation as being more "natural"; indeed the tendency of simulation to produce "robotic" movements lacking expressivity has limited practical uptake of these methods for interactive character animation until comparatively recently. We briefly discuss physics-based character control in the next section, restricting discussion to the context of real-time animation for interactive applications. A detailed discussion of physics-based character animation in a broader context can be found in chapter C-2.

Real-Time Physics-Based Simulation

Physically simulated characters are usually modeled as a single articulated structure of rigid limb components, interconnected by two basic forms of joint mimicking anatomy in nature. Characters modeled under physical simulation are typically humanoid (Hodgins 1991; Raibert and Hodgins 1991) or animal (Wampler and Popovic 2009), consisting predominantly of hinge joints, with hip and shoulder joints implemented as ball-socket joints. Depending on the purpose of the simulation, limbs may be amalgamated for computational efficiency (e.g., a single component for the head, neck, and torso) (Yin et al. 2008). More complex simulations can include sliding joints in place of some hinge joints that serve to model shock absorption within the ligaments of the leg (Kwon and Hodgins 2010).

Character Model Actuation

The essence of the physical simulation is to solve for the forces and torques that should be applied to each limb in order to bring about a desired motion. This solve is performed by a "motion controller" algorithm (subsection "Character Motion Control"). The locations at each limb where forces are to be applied are a further design consideration for the modeler. The most common strategy is to consider torque about each joint (degree of freedom), a method known as servo actuation. While intuitive, servo actuation is not natural – it effectively assumes each joint contains a motor capable of rotating its counterpart – and careful motion planning is necessary to guard against unnatural motion arising under this simplified model.
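A servo-actuated joint is commonly driven by proportional-derivative (PD) control; the sketch below (Python; the gain values and function name are illustrative assumptions, not a specific engine's API) computes the torque applied about a single degree of freedom:

```python
def servo_torque(theta, theta_dot, theta_target, kp=300.0, kd=30.0):
    """PD servo actuation: torque pulling a joint angle toward a target pose.

    theta, theta_dot: current joint angle (rad) and angular velocity (rad/s)
    theta_target: desired joint angle requested by the motion controller
    kp, kd: proportional and damping gains (illustrative values)
    """
    return kp * (theta_target - theta) - kd * theta_dot
```

At each simulation timestep the controller evaluates this torque per joint and feeds it back to the physics engine, closing the loop described in the following subsection.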


Biologically inspired models include simulated muscles that actuate through tendons attached to limbs, effecting a torque upon the connected joints. Muscle-actuated models are more challenging to design motion controllers for, since the maximum torque that can be applied by a muscle is limited by the turning moment of the limb, which is dependent on the current pose of the model. Furthermore, the number of degrees of freedom in such models tends to be higher than in servo-actuated models, since muscles tend to operate in an antagonistic manner, with a pair of muscles per joint enabling "push" and "pull" about the joint. Moreover, such models cannot be considered natural unless the tendons themselves are modeled as nonrigid structures, capable of stretching and compressing to store and release energy in the movement. The high computational complexity of motion controllers that solve for muscle-actuated models therefore remains a barrier to their use in real-time character animation, whose applications to digital entertainment (rather than, say, biomechanics) rarely require biologically accurate simulation. We therefore do not consider them further in this chapter.

Character Motion Control

Use cases for character animation rarely demand direct, fine-grain control of each degree of freedom in the model. Rather, character control is directed at a higher level, e.g., "move from A to B at a particular speed, in a particular style." Such directives are issued by game AI, narrative, or other higher level controllers. Motion controllers are therefore a mid-layer component in the control stack, bridging the semantic gap between high-level control and low-level actuation parameters. In interactive scenarios, simple servo-based actuation (i.e., independent, direct control over joint torques) is adopted to ensure computation of the mapping is tractable in real time. Solving for the movement is performed iteratively, over many small timesteps, each incorporating feedback supplied by the physics engine from each actuation of the model at the previous timestep under closed-loop control. This obviates the need to model complex outcomes of movements within the controller itself. Feedback comprises not only global torso position and orientation but also individual joint orientation and velocity post-simulation of the movement. It is common for controllers to reason about the stability (balance) of the character when planning movement. The center of mass (COM) of the character should correspond to the zero-moment point (ZMP), i.e., the point at which the reaction force from the world surface results in a zero net moment. When the COM and ZMP coincide, the model is stable. We outline two common strategies for motion control that are applicable to physically based real-time interactive character animation.

Control in Joint-Space via Pose Graphs

Some of the earliest animation engines comprised carefully engineered software routines, procedurally generating motion according to mechanics models embedded within kinematics solvers and key-framed poses. This approach is derived from real-time motion planning in the robotics literature. Such approaches model the desired end-position of a limb (or limbs) as a "key-pose." Using a kinematics engine, the animation rig (i.e., joint angles) is gradually


adjusted to bring the character's pose closer to that desired key-pose. The adjustment is an iterative process of actuation and feedback from the environment to determine the actual position of the character and subsequent motions. For example, the COM and ZMP as well as the physical difference between the current and intended joint positions are monitored to ensure that the intended motion does not unbalance the character unduly and that progress is not impeded by itself or other scene elements. A sequence of such key-poses is defined within a "Pose Space Graph" (PSG), where the nodes in the graph are procedurally defined poses, i.e., designed by the animator, but the movements between poses are solved using an inverse kinematics (IK) engine. A motion, such as a walk, is performed by transitioning through states in the PSG (Fig. 4 illustrates a walk cycle in a PSG). Due to physical or timing constraints, a character often will not reach a desired pose within the PSG before being directed toward the next state. Indeed it is often unhelpful for the character to decelerate and pause (i.e., obtain a ZMP of zero) and become stable at a given state before moving on to the next; a degree of perpetual instability, for example, exists within the human walk cycle. Therefore key-poses in the PSG are often exaggerated on the expectation that the system will approximate rather than interpolate the key-poses within it. The operation of PSGs is somewhat analogous to motion graphs (c.f. subsection "Skeletal Motion Graphs"), except that IK is used to plan motion under physical models, rather than pre-captured performance fragments being concatenated and played back.

Control via Machine Learned Models

Although expensive to train, machine learning approaches offer a run-time efficiency unrivaled by other real-time motion controller strategies. Most commonly, neural networks (NNs) are used to learn a mapping between high-level control parameters and low-level joint torques, rather than manually identified full body poses (subsection "Character Motion Control"). Notwithstanding the design of the fitness function of the network and its overall architecture, the training process itself is fully automatic, using a process of trial and error via feedback from the physics simulation. A further appeal of such approaches is that such training is akin to the biological process of learned actuation in nature. Networks usually adopt a shallow feed-forward architecture such as a multi-layer perceptron (MLP) (Pollack et al. 2000), although the growing popularity of deeply learned networks has prompted some recent research into their use as motion controllers (Holden et al. 2016). Training the MLP proceeds from an initially randomized (via Gaussian or "white" noise) set of network weights, using a fitness function derived from some success metric, typically the duration for which the controlled model can execute movement (e.g., walk) without destabilizing and falling over. Many thousands of networks (weight configurations) are evaluated to drive character locomotion, and the most successful networks are modified and explored further in an iterative optimization process to train the network (Sims 1994). Optimization of the NN weights is commonly performed by an evolutionary algorithm (EA) in which populations of network configurations (i.e., sets of weights) are evaluated in batches. The more successful configurations are propagated with the


Fig. 4 Physically based interactive character animation. (a) Pose Space Graph used to drive high-level goals for kinematics solvers which direct joint movements (from (Laszlo et al. 1996)); (b) ambulatory motion of a creature and person learned by optimization processes mimicking nature (from (Sims 1994; Holden et al. 2016), respectively)

subsequent batch and spliced with other promising configurations, to produce batches of increasingly fit networks (Yao 1999). In complex networks with many weights and complex movements, it can be challenging for EAs to converge during training. In such cases, weight configurations for the NN can be bootstrapped by training the same network over simpler problems. This improves upon white-noise initialization for more complex tasks. In practice, training a NN can take tens of thousands of iterations to learn an acceptable controller (Sims 1994) for even very simple movements. Yet, once learned, the controller is trivial to evaluate quickly and can be readily deployed into a real-time system. Even with bootstrapped training, however, NNs could not learn complex movement, and it was not until the recent advent of more sophisticated (deeper) NNs that locomotion of a fully articulated body was


demonstrated using an entirely machine-learned motion controller (Holden et al. 2016).
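The evolutionary training loop described above can be sketched as follows (Python with NumPy; the simulator interface, population size, elitism fraction, and mutation scale are illustrative assumptions rather than values from any cited system):

```python
import numpy as np

def evolve_controller(n_weights, simulate, pop_size=64, generations=1000,
                      sigma=0.1):
    """Evolve NN controller weights; simulate(w) returns the fitness,
    e.g., seconds the character walks before falling."""
    # Start from white-noise initialization of the network weights
    population = [np.random.randn(n_weights) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [simulate(w) for w in population]
        # Propagate the fittest quarter of the population (elitism)
        elite = [population[i] for i in np.argsort(scores)[-pop_size // 4:]]
        children = []
        while len(children) < pop_size - len(elite):
            a = elite[np.random.randint(len(elite))]
            b = elite[np.random.randint(len(elite))]
            mask = np.random.rand(n_weights) < 0.5        # splice two parents
            child = np.where(mask, a, b) + sigma * np.random.randn(n_weights)
            children.append(child)
        population = elite + children
    return max(population, key=simulate)
```

Bootstrapping, as described above, would simply seed `population` with weights evolved on a simpler task instead of white noise.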

Concatenative Synthesis

Motion concatenation is a common method for synthesizing interactive animation without the complexity and computational expense of physical simulation. In a concatenative synthesis pipeline, short fragments of motion capture are joined (and often blended) together to create a single seamless movement. In the simplest example, a single walk cycle may be repeated with appropriately chosen in-out points to create a perpetual gait. A more complex example may concatenate walk cycles turning slightly left, slightly right, or advancing straight-ahead to create locomotion along an arbitrary path.

Skeletal Motion Graphs

Concatenative synthesis is dependent on the ability to seamlessly join together pairs of pre-captured motion fragments – subsequences of performance capture – to build complex animations. An initial step when synthesizing animation is therefore to identify the transition points within performance captured footage, at which motion fragments may be spliced together. Typically the entire capture (which may in practice consist of several movements, e.g., walking, turning) is considered as a single long sequence of t = [1, N] frames, and pairs of frames {1..N, 1..N} are identified that could be transitioned between without the viewer perceiving a discontinuity. A measure of similarity is defined, computable from and to any time instant in the sequence, and that measure thresholded to identify all potential transition points. Figure 5 visualizes both the concept and an example of such a comparison computed exhaustively over all frames of a motion capture sequence – brighter cells indicating closer matching frame pairs.

Measures of Pose Similarity

Pose similarity measures (which, in practice, often compute the dissimilarity between frames) should exhibit three important properties:

1. Be invariant to global rotation and translation – similar poses should be identified as similar regardless of the subject's position in world space at both time instants. Otherwise, few transition points will be detected.
2. Exhibit spatiotemporal consistency – poses should not only appear similar at the pair of time instants considered but also move similarly. Otherwise, motion will appear discontinuous.
3. Reflect the importance of certain joints over others. Otherwise, a difference in position of, e.g., a finger might outweigh a difference in position of a leg.

Common similarity measures include direct comparison of joint angles (in quaternion form) or, more commonly, direct comparisons of limb spatial position in 3D.

Real-Time Full Body Motion Control

15

Fig. 5 An example of an animation (top) generated by a motion graph (left) comprising four actions (hit, stand, walk, jog). A visualization of the inter-frame distance comparison used to compute a motion graph (right)

A set of 3D points p_{1..m} is computed either from limb end-points or from the vertices of a coarse mesh approximating the form of the model, and a sum of squared differences is used to evaluate the dissimilarity D(p, p′) between point sets from a pair of frames p and p′ at times t and t′, respectively:

$$D(p, p') = \min_{\theta, x_0, z_0} \sum_{i=1}^{m} \omega_i \left| p_i - M_{\theta, x_0, z_0}\, p'_i \right|^2, \qquad (5)$$

where |.| is the Euclidean distance in world space, p_i is the ith point in the set, and M is a Euclidean transformation best aligning the two point clouds, via a translation (x_0, z_0) on the ground plane and a rotation θ about the vertical (y) axis – so satisfying property (1). In order to embed spatiotemporal coherency (2), the score is computed over point sets not just from a given pair of times {t, t′} but for a k-frame window [t − k/2, t + k/2]. This is effectively a low-pass filter over time and explains the blurred appearance of Fig. 5 (right). For efficiency, pair-wise scores are computed and the resulting matrix low-pass filtered. The relative importance of each point (associated with the limb from which the point was derived) is set manually via ω_i, satisfying property (3).

Motion Graph Construction

Local thresholding is applied to the resulting similarity matrix, identifying nonadjacent frames (t, t′) that could be concatenated together to produce smooth transitions according to properties (1–3). For example, if the mocap sequence contains several cycles of a walk, it is likely that corresponding points in the walk cycles (e.g., left foot down at the start of each cycle) would be identified as transitions. Playing one walk cycle to this time t and then "seeking" forward or backward by several hundred frames to the corresponding time t′ in another walk cycle will not produce a visual discontinuity despite the nonlinear temporal playback.
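Returning briefly to Eq. 5, a direct (if unoptimized) evaluation of the dissimilarity might look as follows (Python with NumPy; the coarse grid search over the rotation angle is an illustrative simplification of the minimization):

```python
import numpy as np

def aligned_dissimilarity(p, p_prime, weights, n_angles=72):
    """Eq. 5: weighted SSD between (m, 3) point sets after the best rigid
    ground-plane alignment (rotation about y, translation in x and z)."""
    best = np.inf
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        q = p_prime @ rot_y.T
        # For a fixed rotation, the optimal translation is the weighted
        # centroid offset, restricted to the ground plane (y component zeroed)
        w = weights / weights.sum()
        offset = (w[:, None] * (p - q)).sum(axis=0)
        offset[1] = 0.0
        d = np.sum(weights * np.sum((p - (q + offset)) ** 2, axis=1))
        best = min(best, d)
    return best
```

In a full pipeline this score would additionally be averaged over the k-frame temporal window described above.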


The "in" (t) and "out" (t′) frames of these transition points are identified and represented as nodes in a graph structure (the motion graph). Edges in the graph correspond to clips of motion, i.e., motion fragments between these frames in linear time. Additional edges are introduced to connect the "in" and "out" frames of each transition. Usually the pose of the performer at the "in" and "out" points differs slightly, and so this additional edge itself comprises a short sequence of frames constructed by interpolating the poses at "in" and "out," respectively, e.g., using quaternion-based joint angle interpolation.

Motion Path Optimization

Random walks over the motion graph representation can provide a useful visualization to confirm that sensible transitions have been identified. However, interactive character control requires careful planning of routes through the motion graph, to produce movement satisfying constraints, the most fundamental of which are the desired end pose (and position in the world, p_v), the distance that the character should walk (d_v), and the time it should take the character to get there (t_v). Under the motion graph representation, this corresponds to computing the optimal path routing us from the current frame of animation (i.e., the current motion capture frame being rendered) to a frame corresponding to the target key-pose elsewhere in the graph. Since motion graphs are often cyclic, there is a potentially unbounded number of possible paths. The optimal path is the one minimizing a cost function, expressed in terms of these four animation constraints (C_trans, C_time, C_dist, and C_space):

$$C(P) = C_{trans}(P) + \omega_{time}\, C_{time}(P) + \omega_{dist}\, C_{dist}(P) + \omega_{space}\, C_{space}(P). \qquad (6)$$

Studying each of these terms in turn, the cost of a path P is influenced by C_trans, reflecting the cost of performing all animation transitions along the path P. Writing the sequence of N_f edges (motion fragments) along this path as {f_j} where j = [1, N_f], this cost is a summation of the cost of transitioning at each motion graph node along that path:

$$C_{trans}(P) = \sum_{j=1}^{N_f - 1} D(f_j \mapsto f_{j+1}), \qquad (7)$$

where D(f_j ↦ f_{j+1}) expresses the cost of transitioning from the last frame of f_j to the first frame of f_{j+1}, computed by measuring the alignment of their respective point clouds p and p′ via D(p, p′) (Eq. 5). The timing cost C_time(P) is computed as the absolute difference between the target time t_v for the animation sequence and the absolute time time(P) taken to transition along the path P:

$$C_{time}(P) = |time(P) - t_v|, \quad time(P) = N_f \cdot \Delta t, \qquad (8)$$

where Δt is the time taken to display a single frame of animation, e.g., Δt = 1/25 for 25 frames per second.


Similarly, the cost C_dist(P) is computed as the absolute difference between the target distance d_v for the character to travel and the absolute distance traveled dist(P), computed by summing the distance traveled for each frame comprising P:

$$C_{dist}(P) = |dist(P) - d_v|, \quad dist(P) = \sum_{j=1}^{N_f - 1} \left| \mathcal{P}(f_j) - \mathcal{P}(f_{j+1}) \right|, \qquad (9)$$

where 𝒫 is a 2D projection operation, projecting the 3D point clouds p and p′ corresponding to the end frame of f_j and start frame of f_{j+1}, respectively, to the 2D ground (x–z) plane and computing the centroid. The final cost C_space is computed similarly via centroid projection of the animation end-point, penalizing a large distance between the target end-point of the character and the end-point arising from the animation described by P:

$$C_{space}(P) = \left| \mathcal{P}(f_{N_f}) - p_v \right|. \qquad (10)$$

The three parameters ω_time, ω_dist, and ω_space are normalizing weights, typical values of which are ω_time = 1/10, ω_dist = 1/3, and ω_space = 1 (Arikan and Forsyth 2002). The optimal path P_opt for a given set of constraints is found by minimizing the combined cost function C(P) (Eq. 6):

$$P_{opt} = \arg\min_P C(P). \qquad (11)$$

An efficient approach using integer programming to search for the optimal path that best satisfies the animation constraints can be found in Huang et al. (2009) and is capable of running in real time for motion fragment datasets of several minutes. Note that C_trans can be precomputed for all possible motion fragment pairs, enabling run-time efficiencies – the total transition cost for a candidate path P is simply summed during search.
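The combined cost of Eqs. 6–10 is straightforward to evaluate for a candidate path. A minimal sketch follows (Python with NumPy; the fragment representation is abstract, `transition_cost` is assumed precomputed as suggested above, and the weights are the typical values just quoted):

```python
import numpy as np

def path_cost(path, transition_cost, centroid, t_v, d_v, p_v,
              dt=1/25, w_time=1/10, w_dist=1/3, w_space=1.0):
    """Eq. 6: combined cost of a candidate path (a list of hashable motion
    fragments). centroid(f) returns fragment f's 2D ground-plane centroid."""
    # Eq. 7: summed (precomputed) transition costs along the path
    c_trans = sum(transition_cost[(a, b)] for a, b in zip(path, path[1:]))
    # Eq. 8: deviation from the target duration, following time(P) as given
    c_time = abs(len(path) * dt - t_v)
    # Eq. 9: deviation from the target distance, via ground-plane centroids
    dist = sum(np.linalg.norm(centroid(a) - centroid(b))
               for a, b in zip(path, path[1:]))
    c_dist = abs(dist - d_v)
    # Eq. 10: deviation of the final position from the target end-point
    c_space = np.linalg.norm(centroid(path[-1]) - p_v)
    return c_trans + w_time * c_time + w_dist * c_dist + w_space * c_space
```

A search such as the integer programming approach of Huang et al. (2009), or a bounded graph search, would then minimize this cost over candidate paths (Eq. 11).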

Surface Motion Graphs

Surface motion graphs (SMGs) extend the skeletal motion graph concept beyond joint angles, to additionally consider the 3D volume of the performer. This is important since the movement of 3D surfaces attached to the skeleton (e.g., hair or flowing clothing) is often complex, and simple concatenation of pre-animated or captured motion fragments without considering the movement of this surface geometry can lead to visual discontinuities between motion fragments. Consideration of surfaces, rather than skeletons, requires the motion graph pipeline to change in only two main areas. First, the definition of frame similarity, i.e., Eq. 5, must be modified to consider volume rather than joint positions. Second, the algorithm for interpolating similar frames to create smooth transitions must be substituted for a surface interpolation algorithm.


Fig. 6 Visualization of a spherical histogram computed from a character volume. Multiple video views (left) are combined to produce a volumetric estimate of the character (middle) which is quantized into a spherical (long-lat) representation at multiple radii from the volume centroid

3D Shape Similarity

To construct a SMG, an alternative measure of frame similarity using 3D surface information is adopted, reflecting the same three desirable properties of similarity measures outlined in subsection "Skeletal Motion Graphs." A spherical histogram representation is calculated from the 3D character volume within the frame. The space local to the character's centroid is decimated into sub-volumes, divided by equispaced lines of longitude and latitude – yielding a 2D array that encodes the volume occupied by the character. Spherical histograms are computed over a variety of radii, as depicted in Fig. 6 (right), yielding a three-dimensional stack of 2D spherical histograms. The SMG is computed as with skeletal motion graphs, through an optimization process that attempts to align each frame to every other – resulting in a matrix of similarity measurements between frames. The similarity between the spherical histograms H_r(.) at radius r of the 3D character meshes Q_a and Q_b is computed by:

$$D(Q_a, Q_b) = \min_{\phi} \frac{1}{R} \sum_{r=1}^{R} \omega_r \left| H_r(Q_a, 0) - H_r(Q_b, \phi) \right|, \qquad (12)$$

where H_r(x, φ) indicates a spherical histogram computed over a given mesh x, rotated about the y axis (i.e., axis of longitude) by φ degrees. In practice this rotation can be performed by cycling the columns of the 2D histogram, obviating any expensive geometric transformations; an exhaustive search across φ = [0, 359] degrees is recommended in Huang et al. (2009). The use of the model centroid, followed by optimization for φ, fulfills property (1), i.e., rotational and translational invariance in the comparison. The resulting 2D matrix of inter-frame comparisons is low-pass filtered as before to introduce temporal coherence, satisfying property (2). Weightings set for each radial layer of the spherical histogram ω_r weight the importance of detail as distance increases, satisfying (3).

Transition Generation

Due to the comparatively high number of degrees of freedom on a 3D surface, it is much more likely that the start and end frames of a pair of


motion fragments f_j and f_{j+1} selected on an optimal path P_opt will not exactly match. To mitigate any visual discontinuities on playback, a short transition sequence is introduced to morph the former surface (S_j) into the latter (S_{j+1}). This transition sequence is substituted in for a small number of frames (L) before and after the transition point. Writing this time interval k = [−L, L], a blending weight α(k) is computed:

$$\alpha(k) = \frac{k + L}{2L}, \qquad (13)$$

and a nonlinear mesh blending algorithm (such as the Laplacian deformation scheme of Tejera et al. (2013)) is applied to blend S_j ↦ S_{j+1} weighted by α(k).

Parametric Motion Graphs

Parametric motion graphs (PMG) extend classical motion graphs (subsection "Skeletal Motion Graphs") by considering not only the concatenation but also the blending of motion fragments to synthesize animation. A simple example is a captured sequence of a walk and a run cycle. By blending these two motion fragments together, one can create a cycle of a walk, a jog, a run, or anything in-between. Combined with the concatenation of cycles, this leads to a flexibility and storage efficiency not available via classic methods – a PMG requires only a single example of each kind of motion fragment, whereas a classical approach would require pre-captured fragments of walks and runs at several discrete speeds. Parametric extensions have been applied to both skeletal (Heck and Gleicher 2007) and surface motion graphs (Casas et al. 2013). Provided a mechanism exists for interpolating a character model (joint angles or 3D surface) between two frames, the method can be applied. Without loss of generality, we consider surface motion graphs (SMGs) here. SMGs assume the availability of 4D performance capture data, i.e., a single 3D mesh of constant topology deforming over time to create character motion (subsection "Multi-View Human Pose Estimation"). We consider a set of N temporally aligned 4D mesh sequences Q = {Q_i(t)} for i = [1, N] motion fragments. Since vertices are in correspondence, it is possible to interpolate frames from such sequences directly by interpolating vertex positions in linear or piecewise linear form. We define such an interpolating function b(Q, w) yielding an interpolated mesh Q_B(t, w) at time t, given a vector of weights w expressing how much influence each of the meshes from the N motion fragments at time t should contribute to that interpolated mesh:

$$Q_B(t, w) = b(Q, w), \qquad (14)$$

where w = {w_i} for normalized weights w_i ∈ [0, 1], driving a mesh blending function b(.) capable of combining meshes at above 25 frames per second for interactive character animation.
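Because the sequences share vertex correspondence, a linear version of the blending function b(Q, w) reduces to a weighted sum of vertex positions. A minimal sketch follows (Python with NumPy; it assumes all meshes share topology and also covers the transition blend of Eq. 13 as the two-sequence case):

```python
import numpy as np

def blend_meshes(vertex_sets, weights):
    """Linear b(Q, w): weighted sum over corresponding vertex positions.
    vertex_sets: list of (v, 3) arrays, one per motion fragment (Eq. 14)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize blend weights
    return sum(wi * verts for wi, verts in zip(w, vertex_sets))

def transition_blend(S_j, S_j1, k, L):
    """Two-sequence case used for SMG transitions (Eq. 13)."""
    alpha = (k + L) / (2.0 * L)
    return blend_meshes([S_j, S_j1], [1.0 - alpha, alpha])
```

As the surrounding text notes, purely linear blending degrades in the presence of rotation; the piecewise linear scheme of Casas et al. (2013) would substitute precomputed nonlinearly interpolated meshes into `vertex_sets` to mitigate this at the same run-time cost.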


Fig. 7 Three animations each generated from a parametric motion graph (Casas et al. 2013). First row: walk to run (speed control). Second row: short to long horizontal leap (distance, i.e., leap length control). Third row: short to high jump (height control). Final row: animation from a parametric motion graph embedded within an outdoor scene under interactive user control (Casas et al. 2014) (time lapse view)

Several steps are necessary to deliver parametric control of the motion fragments: time warping to align pairs of mesh sequences (which may differ in length) so that they can be meaningfully interpolated, the blending function b(.) to perform the interpolation, and a mapping between high-level "user" parameters from the motion controller and low-level blend weights w. Considerations such as path planning through the graph remain, as with classical motion graphs, but must be extended since the solution space now includes arbitrary blendings of motion fragments, as well as the concatenation of those blended fragments. Exhaustively searching this solution space is expensive, motivating real-time methods to make PMGs feasible for interactive character animation.


Mesh Sequence Alignment

Mesh sequences are aligned using a continuous time warping function t = f(t_u), where the captured timebase t_u is mapped in nonlinear fashion to a normalized range t = [0, 1] so as to align poses. The technique is described in Witkin and Popovic (1995). Although coarse results are obtainable without mesh alignment, failure to properly align sequences can lead to artifacts such as foot skate.

Real-Time Mesh Blending

Several interpolation schemes can be employed to blend a pair of captured poses. Assuming a single mesh has been deformed to track throughout the 4D performance capture source data (i.e., all frames have constant topology), a simple linear interpolation between the 3D positions of corresponding vertices is a good first approximation to a mesh blend. Particularly in the presence of rotation, however, such approximations yield unrealistic results. A high-quality solution is to use differential coordinates, i.e., a Laplacian mesh blend (Botsch and Sorkine 2008); however solution of the linear system comprising a 3v × 3v matrix of vertex positions, where v is of the order 10^5, is currently impractical for interactive animation. Therefore a good compromise can be obtained using a piecewise linear interpolation (Casas et al. 2013), which precomputes offline a set of nonlinearly interpolated meshes (e.g., via Botsch and Sorkine (2008)); any requested parametric mesh is then produced by weighted linear blending of the closest two precomputed meshes. The solution produces more realistic output, in general, than linear interpolation at the same computational cost.

High-Level Control

High-level parametric control is achieved by learning a mapping function f(w) between the blend weights w and the high-level motion parameters p, e.g., from the motion controller. A mapping function w = f⁻¹(p) is learned from the high-level parameters to the blend weights required to generate the desired motion. This is necessary as the blend weights w do not provide an intuitive parameterization of the motion. Motion parameters p are high-level user-specified controls for a particular class of motions, such as speed and direction for a walk or run, and height and distance for a jump. The inverse mapping function f⁻¹ from parameters to weights can be constructed by a discrete sampling of the weight space w and evaluation of the corresponding motion parameters p (a sketch of this sampling appears after the following paragraph).

Parametric Motion Planning

PMGs dispense with the notion of precomputed transition points, since offline computation of all possible transition and blend possibilities between, e.g., a pair of mesh sequences would yield an impractical number of permutations to permit real-time path finding. We consider instead a continuous "weight-time" space, with the weight modeling the blend between one mesh sequence (e.g., a walk) and another (e.g., a run). We consider motion planning as the problem of finding a route through this space, taking us from a source time and pose (i.e., weight combination) to a target time and pose. Figure 8 illustrates such a route finding process. The requirement for smooth motion dictates we may only modify the weight or time in small steps, yielding a "fanning out" or trellis of possible paths from the source point in weight-time space.
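Returning to the high-level control mapping above, the inverse map f⁻¹ can be tabulated offline by sampling the weight space; a minimal sketch follows (Python with NumPy; the grid resolution and nearest-neighbor lookup are illustrative choices, not the method of any cited system):

```python
import numpy as np

def build_inverse_mapping(motion_params, n_samples=101):
    """Tabulate w = f^-1(p) for a two-fragment blend by sampling w in [0, 1].
    motion_params(w) evaluates the high-level parameter (e.g., speed) of the
    motion produced with blend weight w."""
    ws = np.linspace(0.0, 1.0, n_samples)
    ps = np.array([motion_params(w) for w in ws])
    def inverse(p_target):
        # Nearest sampled parameter wins; interpolation could refine this
        return ws[np.argmin(np.abs(ps - p_target))]
    return inverse

# e.g., motion_params(w) measures speed when blending a walk (w=0) into a run (w=1)
```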



Fig. 8 Real-time motion planning under a parametric motion graph. Routes are identified between trellises fanned out from the source pose Qs and end pose Qd. The possible (red) and optimal (green) paths are indicated (illustration only)

The optimal path P_opt between two parametric points in that space is that minimizing a cost function balancing mesh similarity E_S(P) and time taken, i.e., latency E_L(P), to reach that pose:

$$P_{opt} = \arg\min_{P \in \Omega} E_S(P) + \lambda E_L(P), \qquad (15)$$

where λ defines the trade-off between transition similarity and latency. The transition path P is optimized over a trellis of frames as in Fig. 8, starting at frame Q_s(t_s, w_s) and ending at Q_d(t_d, w_d), where Q_s and Q_d are interpolated meshes (Eq. 14). The trellis is sampled forward and backward in time at discrete intervals in time Δt and parameters Δw, up to a threshold depth in the weight-time space. This defines the set of candidate paths Ω comprising the transitions between each possible pair of frames in the source and target trellis. For a candidate path P, the latency cost E_L(P) is measured as the number of frames in the path P between the source and target frames. The transition similarity cost E_S(P) is measured as the similarity in mesh shape and motion at the transition point between the source and target motion space for the path P, computable via Eq. 12 for mesh data (or, if using purely skeletal mocap data, via Eq. 5). Casas et al. (2012) proposed a method based on precomputing a set of similarities between the input data and interpolating these at run-time to solve routing between the two trellises at interactive speeds. Figure 7 provides examples of animation generated under this parametric framework.
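The trellis search of Eq. 15 can be sketched as a brute-force enumeration (Python; a production system would instead precompute and interpolate similarities as in Casas et al. (2012); names and the trellis encoding are illustrative):

```python
import itertools

def plan_transition(source_trellis, target_trellis, similarity, lam=0.5):
    """Eq. 15: choose the (source frame, target frame) pair minimizing
    transition dissimilarity plus weighted latency.

    source_trellis/target_trellis: lists of (frame, depth) pairs fanned out
    from the current and goal poses; similarity(a, b) implements Eq. 12."""
    best, best_cost = None, float("inf")
    for (qs, ds), (qd, dd) in itertools.product(source_trellis, target_trellis):
        e_s = similarity(qs, qd)       # mesh shape/motion dissimilarity
        e_l = ds + dd                  # latency: frames to reach the transition
        cost = e_s + lam * e_l
        if cost < best_cost:
            best, best_cost = (qs, qd), cost
    return best
```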

Summary and Future Directions

This chapter has surveyed techniques for interactive character animation, broadly categorizing these as either data driven or physical simulation based. Arguably the major use cases for interactive character animation are video games and immersive


virtual experiences (VR/AR). In these domains, computational power is at a premium – developers must seek highly efficient real-time algorithms, maintaining high frame rates (especially for VR/AR) without compromising on animation quality. This has led interactive animation to trend toward data-driven techniques that sample, blend, and concatenate fragments of performance capture rather than spend cycles performing expensive online physical simulations. This chapter has therefore focused upon synthesis techniques that sample, concatenate, and blend motion fragments to create animation. The chapter began by surveying commercial technologies and academic research into performance capture. Although commercial systems predominantly focus upon skeletal motion capture, research in 4D performance capture is maturing toward practical solutions for simultaneous capture of skeletal and surface detail. The discussion of motion graphs focused upon their original use for skeletal data and their more recent extensions to support not only 4D surface capture but also their parametric variants that enable blending of sequences in addition to their concatenation. Physical simulation-based approaches for character animation were examined within the context of interactive animation, deferring broader discussion of this topic to chapter C-2. Open challenges remain for interactive character animation, particularly around expressivity and artistic control. Artistic directors will often request editing of animation to move in a particular style (jaunty, sad), adjustments that can be performed manually in professional tools such as Maya or MotionBuilder (Autodesk) but that cannot be applied automatically in an interactive character animation engine. While work such as Brand et al.'s Style Machines (Brand and Hertzmann 2000) enables stylization of stand-alone skeletal mocap sequences, algorithms have yet to deliver the ability to modulate animation interactively, e.g., to react to emotional context in a game. An interesting direction for future research would be to integrate stylization and other high-level behavioral attributes into the motion graph optimization process.

Cross-References

▶ Biped Controller for Character Animation
▶ Blendshape Facial Animation
▶ Data-Driven Character Animation Synthesis
▶ Data-Driven Hand Animation Synthesis
▶ Depth Sensor Based Facial and Body Animation Control
▶ Example-Based Skinning Animation
▶ Eye Animation
▶ Hand Gesture Synthesis for Conversational Characters
▶ Head Motion Generation
▶ Laughter Animation Generation
▶ Physically-Based Character Animation Synthesis
▶ Real-Time Full Body Motion Control
▶ Real-Time Full Body Pose Synthesis and Editing


▶ Video-Based Performance Driven Facial Animation
▶ Visual Speech Animation

References

Agarwal A, Triggs B (2006) Recovering 3d human pose from monocular images. IEEE Trans PAMI 28(1):44–58
Andriluka M, Roth S, Schiele B (2009) Pictorial structures revisited: people detection and articulated pose estimation. In: Proceedings of computer vision and pattern recognition, IEEE
Arikan O, Forsyth D (2002) Synthesizing constrained motions from examples. ACM Trans Graph 21(3):483–490
Armstrong W, Green M (1985) The dynamics of articulated rigid bodies for purposes of animation. Vis Comput 4(1):231–240
Botsch M, Sorkine O (2008) On linear variational surface deformation methods. IEEE Trans Vis Comput Graph 14(1):213–230
Brand M, Hertzmann A (2000) Style machines. In: Proceedings of ACM SIGGRAPH, ACM Press, pp 183–192
Budd C, Huang P, Klaudiny M, Hilton A (2013) Global non-rigid alignment of surface sequences. Int J Comput Vis 102(1):256–270
Casas D, Tejera M, Guillemaut JY, Hilton A (2012) 4d parametric motion graphs for interactive animation. In: Proceedings of symposium on interactive 3D graphics and games (I3D)
Casas D, Tejera M, Guillemaut JY, Hilton A (2013) Interactive animation of 4d performance capture. IEEE Trans Vis Comput Graph (TVCG) 19(5):762–773
Casas D, Volino M, Collomosse J, Hilton A (2014) 4d video textures for interactive character appearance. In: Computer Graphics Forum (Proceedings of Eurographics 2014)
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of computer vision and pattern recognition, IEEE, vol 3, pp 886–893
Eichner M, Ferrari V (2009) Better appearance models for pictorial structures. In: Proceedings of British machine vision conference (BMVC)
Elhayek A, Aguiar E, Tompson J, Jain A, Pishchulin L, Andriluka M, Bregler C, Schiele B, Theobalt C (2015) Efficient convnet-based marker-less motion capture in general scenes with a low number of cameras. In: Proceedings of computer vision and pattern recognition, IEEE
Geijtenbeek T, Bogert AJVD, Basten B, Egges A (2010) Evaluating the physical realism of character animations using musculoskeletal models. In: Proceedings of conference on motion in games, vol 5. Springer, Heidelberg, pp 11–22
Grauman K, Shakhnarovich G, Darrell T (2003) A Bayesian approach to image-based visual hull reconstruction. In: Proceedings of CVPR, IEEE
Heck R, Gleicher M (2007) Parametric motion graphs. In: Proceedings of symposium on interactive 3D graphics and games (I3D), pp 129–136
Hodgins J (1991) Biped gait transition. In: Proceedings of conference on robotics and automation, IEEE, pp 2092–2097
Holden D, Saito J, Komura T (2016) A deep learning framework for character motion synthesis and editing. In: Proceedings of ACM SIGGRAPH, ACM
Huang P, Hilton A, Starck J (2009) Human motion synthesis from 3d video. In: Proceedings of CVPR, IEEE
Kovar L, Gleicher M, Pighin F (2002) Motion graphs. ACM Trans Graph 21(3):473–482
Kwon T, Hodgins J (2010) Control systems for human running using an inverted pendulum model and a reference motion capture sequence. In: Proceedings of Eurographics symposium on computer animation (SCA), Blackwell
Laszlo J, van de Panne M, Fiume E (1996) Limit cycle control and its application to the animation of balancing and walking. In: Proceedings of ACM SIGGRAPH, ACM
Lorensen W, Cline H (1987) Marching cubes: a high resolution 3d surface construction algorithm. ACM Comput Graph 21(4):163–169
Mori G, Ren X, Efros A, Malik J (2004) Recovering human body configurations: combining segmentation and recognition. In: Proceedings of computer vision and pattern recognition, IEEE, pp 326–333
Ning H, Xu W, Gong Y, Huang T (2008) Discriminative learning of visual words for 3D human pose estimation. In: Proceedings of CVPR, IEEE
Pollack J, Lipson H, Ficici S, Funes P (2000) Evolutionary techniques in physical robotics. In: Proceedings of international conference on evolvable systems (ICES). Springer
Raibert M, Hodgins J (1991) Animation of dynamic legged locomotion. ACM Comput Graph 25(4):349–358
Ren X, Berg E, Malik J (2005) Recovering human body configurations using pairwise constraints between parts. In: Proceedings of international conference on computer vision, IEEE, vol 1, pp 824–831
Rohan A (2015) 3D motion capture system market – global forecast to 2020. Tech rep, Markets and Markets Inc., Vancouver
Sims K (1994) Evolving virtual creatures. In: Proceedings of ACM SIGGRAPH, ACM
Srinivasan P, Shi J (2007) Bottom-up recognition and parsing of the human body. In: Proceedings of computer vision and pattern recognition, IEEE, pp 1–8
Tejera M, Casas D, Hilton A (2013) Animation control of surface motion capture. IEEE Trans Cybern 43(6):1532–1545
Trumble M, Gilbert A, Hilton A, Collomosse J (2016) Learning markerless human pose estimation from multiple viewpoint video. In: Proceedings of ECCV workshops
Viola P, Jones M (2004) Robust real-time object detection. Int J Comput Vis 57(2):137–154
Wampler K, Popovic Z (2009) Optimal gait and form for animal locomotion. ACM Trans Graph 28(3)
Witkin A, Popovic Z (1995) Motion warping. In: Proceedings of ACM SIGGRAPH, ACM, pp 105–108
Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447
Yin K, Coros S, Beaudoin P (2008) Continuation methods for adapting simulated skills. ACM Trans Graph 27(3)
Zhao T, Nevatia R (2003) Bayesian human segmentation in crowded situations. In: Proceedings of computer vision and pattern recognition, IEEE, vol 2, pp 459–466

Physically Based Character Animation Synthesis

Jie Tan

Contents
Introduction
State of the Art
Physical Simulation
  Simulation in Maximal Coordinates
  Simulation in Generalized Coordinates
  Contact Modeling
  Simulation Software
Motion Control
  Trajectory Optimization
  Reinforcement Learning
Future Directions
  Improving Realism
  Reducing Prior Knowledge
  Bringing the Character to the Real World
References


Abstract

Understanding and synthesizing human motions is an important scientific quest. It also has broad applications in computer animation. Research on physically based character animation in the last two decades has achieved impressive advances: a large variety of human activities can be synthesized automatically in physically simulated environments. The two key components of physically based character animation are (1) physical simulation, which models the dynamics of humans and their environment, and (2) controller optimization, which optimizes the character's motions in the simulation. This approach has an inherent realism because we all live in a world that obeys physical laws, and we evolved to survive in this physical environment. In this chapter, we will review the state of the art of physically based character animation, introduce a few established methods in physical simulation and motor control, and discuss promising future directions.

J. Tan (*) Georgia Institute of Technology, Atlanta, GA, USA e-mail: [email protected]

# Springer International Publishing Switzerland 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_11-1

Keywords

Character animation • Physical simulation • Trajectory optimization • Reinforcement learning

Introduction

Mother Nature has created a diverse set of awe-inspiring motions in the animal kingdom: birds can fly in the sky, fish can swim in the water, geckos can crawl on vertical surfaces, and cats can reorient themselves in midair. Similarly, human motions exhibit efficiency (locomotion), agility (kung fu), gracefulness (ballet), and dexterity (hand manipulation). Studying these motions is not only a scientific quest that quenches our curiosity but also an important step toward synthesizing them in a way that can fundamentally change our life.

Character animation aims to faithfully synthesize the motions of humans and animals and display them to an audience for the purpose of entertainment, storytelling, and education. The synthesized motions need to appear realistic to give the audience an immersive experience. In the last few decades, we have seen tremendous advances in character animation. Some of the most breathtaking movies, such as Harry Potter, Avatar, and Life of Pi, rely heavily on computer-generated animation; nowadays, it is almost impossible for the audience to tell computer-synthesized motions apart from real footage. Behind these realistic animations, however, lie countless hours of tedious manual work by highly specialized experts. For example, producing a 100-min feature film at Pixar can take dozens of artists and engineers more than 5 years of development. In today's animation pipeline, the most popular techniques are keyframing and motion capture, both of which require artistic expertise and laborious manual work. Even worse, the knowledge and effort put into one animation sequence are not necessarily generalizable to other motions. In my view, these are not efficient or principled ways of synthesizing animation.

A principled way to synthesize character animation is to study the fundamental factors that have shaped our motions. Instead of focusing on the appearance of our motions, we need to dig deeper and understand why we move the way we do today. After understanding the root causes that have shaped our movements, we can synthesize them naturally and automatically. Our motions are shaped through millions of years of optimization (evolution) in a world that obeys physical laws. This insight has motivated a new paradigm of physically based character animation. Its two key components are physical simulation and motion control: we first build a physical simulation to model the physical world and then perform optimization to control the motions of characters so that they move purposefully, naturally, and robustly in the simulated environment.


Although we often take our motions for granted since we can perform them so effortlessly, physically based character animation is a notoriously difficult problem because our motions involve sophisticated neuromuscular control, sensory information processing, motion planning, coordinated muscle activation, and complicated interactions with the surrounding physical environment. Even though we are still far from fully understanding the underlying control mechanisms that govern our motions, two decades of research in physically based character animation has brought us new insights, effective methodologies, and impressive results. The purpose of this chapter is to review the state of the art (section “State of the Art”), introduce some of the established algorithms (sections “Physical Simulation” and “Motion Control”), and discuss promising future research directions (section “Future Directions”) in physically based character animation.

State of the Art

Starting from the seminal work of Hodgins et al. (1995), controlling a physically simulated human character has been extensively studied in computer animation. A wide variety of human activities, including walking (Yin et al. 2007), running (Kwon and Hodgins 2010), swimming (Kwatra et al. 2009; Si et al. 2014), biking (Tan et al. 2014), dressing (Clegg et al. 2015), gymnastics (Hodgins et al. 1995), reacting to perturbations (Wang et al. 2010), falling and landing (Ha et al. 2012), and manipulating objects with hands (Liu 2009; Ye and Liu 2012; Bai and Liu 2014), are realistically synthesized in physically simulated environments (Fig. 1).

Two widely used techniques in physically based character animation are trajectory optimization and reinforcement learning. Trajectory optimization formulates a constrained optimization to minimize a task-related objective function subject to physical constraints. It has been applied to control the iconic jumping Luxo Jr. lamp (Witkin and Kass 1988), humanoid characters (Liu and Popović 2002; Jain et al. 2009; Ye and Liu 2010), and characters with arbitrary morphologies (Wampler and Popović 2009). The resulting motions are physically plausible and follow animation principles such as anticipation and follow-through (Thomas and Johnston 1995). Reinforcement learning algorithms solve a Markov decision process (MDP) to find optimal actions at different states. When the MDP has moderate dimensions, (fitted) value function iteration has been successfully applied to generalize motion capture data (Treuille et al. 2007; Levine et al. 2012), to carry out locomotion tasks (Coros et al. 2009), and to manipulate objects with hands (Andrews and Kry 2013). When the dimensionality is high, policy search (Ng and Jordan 2000) can directly search for a control policy without the need to construct a value function. Many studies on locomotion control (Yin et al. 2008; Wang et al. 2009, 2012; Coros et al. 2011; Geijtenbeek et al. 2013) performed policy search on parameterized controllers. Although we have seen impressive advances over the last two decades, the gracefulness, agility, and versatility of real human motions remain unmatched.


Fig. 1 Various human activities, such as running, swimming, dressing, performing bicycle stunts, interacting with the environment, and manipulating clothes, are modeled in a physically simulated environment (Image courtesy of (Hodgins et al. 1995; Si et al. 2014; Clegg et al. 2015; Tan et al. 2014; Coros et al. 2010; Bai and Liu 2014))

There are challenges in physically based character animation that need further investigation. First, controlling balance is a key problem in synthesizing human motions in a physically simulated environment. Balance can be maintained by exerting virtual forces (Pratt et al. 2001; Coros et al. 2010), applying linear feedback (Laszlo et al. 1996; Yin et al. 2007; da Silva et al. 2008; Coros et al. 2010), using nonlinear control policies (Muico et al. 2009), planning the contact forces (Muico et al. 2009; Tan et al. 2012b), employing reduced models (Tsai et al. 2010; Kwon and Hodgins 2010; Mordatch et al. 2010; Coros et al. 2010; Ye and Liu 2010), and training in stochastic environments (Wang et al. 2010). Although the balance problem in simple locomotion tasks, such as walking and running, has been solved, maintaining balance in tasks that require agile motions remains an open problem.

Another challenge is to effectively plan the contacts. We humans can only move ourselves and other objects through contacts. However, contact events (contact breakage, sliding, etc.) introduce unsmooth forces into the dynamics, which break the control space into fragmented feasible regions. As a result, a small change in control parameters can easily generate bifurcated consequences. For this reason, many previous methods explicitly assumed that the contacts remain static while optimizing controllers (Abe et al. 2007; Jain et al. 2009; Kim and Pollard 2011). This assumption significantly restricts the effectiveness of the controller because the controller is not allowed to actively exploit contact breakage, slipping contacts, or rolling contacts to achieve control goals. Three promising research directions to tackle this challenge are contact-invariant optimization (Mordatch et al. 2012, 2013), QPCC (Tan et al. 2012b), and policy search with stochastic optimization (Wu and Popović 2010; Wang et al. 2010; Mordatch et al. 2010).

An important criterion in character animation is the realism of the synthesized motions, and there is still large room to improve the quality of physically based character animation. One possible cause of unnatural motions is the vast simplification of the human models. To improve realism, prior work has simulated the dynamics of muscles and demonstrated the complex interplay among bones, muscles, ligaments, and other soft tissues for individual body parts, including the neck (Lee and Terzopoulos 2006), upper body (Zordan et al. 2006; DiLorenzo et al. 2008; Lee et al. 2009), lower body (Wang et al. 2012), and hands (Tsang et al. 2005; Sueda et al. 2008). However, constructing such a sophisticated biological model for a full human character is computationally prohibitive. An alternative solution is to augment a physically controlled character with realistic motion capture streams (da Silva et al. 2008; Muico et al. 2009; Liu et al. 2010).

Physical Simulation

Physically based character animation consists of two parts: simulation and control. This section concentrates on simulation; the next section covers control. Although the majority of research in physically based character animation focuses on control, a good understanding of physical simulation is essential for designing effective controllers, because complex human behaviors often require sophisticated controllers that exploit the dynamics of a multi-body system.

In physically based character animation, a human character is often represented as an articulated rigid-body system (Fig. 2 left): a group of rigid bodies chained together through rotational joints. These joints can have different numbers of degrees of freedom (DOFs). For example, the shoulder is a ball joint (three DOFs), the wrist is a universal joint (two DOFs), and the elbow is a hinge joint (one DOF). In some cases, if the character's motion involves dexterous hand manipulation, a detailed hand model (Fig. 2 right) is attached to each wrist. Note that the articulated rigid-body system is a dramatic but necessary simplification, since simulating each bone, muscle, and tendon of a real human would require a prohibitively huge amount of computational resources.

In the simulation, the articulated figure is represented as a tree structure. Each node is a rigid body and each edge is a joint. One node can have multiple children but at most one parent; the root node has no parent. While any body can be selected as the root node, a common choice is the torso. In this tree structure, loops are not allowed. Although it is possible to simulate loops, such cases are rare in character animation and will not be discussed here. There are two major methods to simulate the dynamics of an articulated rigid-body system: simulation in maximal coordinates (Cartesian space) and simulation in generalized coordinates (joint space).


Fig. 2 Articulated figures in character animation to represent a human character (left) and a hand (right)

Simulation in Maximal Coordinates

In maximal coordinates, the physical states of the articulated figure are defined for each node (rigid body). Each body has six degrees of freedom: three translational and three rotational. The dynamics of each rigid body is considered independently, and a list of additional joint constraints is imposed to ensure that two adjacent bodies stick together at the joint location. The dynamic equation for each body is

$$\begin{bmatrix} m I_{3\times3} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} \dot{v} \\ \dot{\omega} \end{bmatrix} = \begin{bmatrix} mg \\ -\omega \times I\omega \end{bmatrix} + \begin{bmatrix} f \\ \tau \end{bmatrix} + \begin{bmatrix} 0 \\ \tau^a \end{bmatrix} \tag{1}$$

where m and I are the mass and the inertia tensor of the body; I_{3×3} is a 3 × 3 identity matrix; v and ω are the linear and angular velocities; [f, τ]^T are the passive forces and torques from joint constraints, contacts, gravity, and other external sources; and τ^a are the torques actively exerted by the controllers, which are the focus of section "Motion Control."

Joints that connect two rigid bodies constrain their relative motions, and a different number of constraints is imposed according to the type of joint. For example, a hinge joint has only one DOF; thus it has five constraints that eliminate all but the rotation along the hinge axis. A ball joint has three DOFs; its constraints eliminate the relative translation at the joint location. Suppose a joint connects body A and body B. The translational constraints are

$$\begin{bmatrix} I_{3\times3} & -[r_A] \end{bmatrix} \begin{bmatrix} v_A \\ \omega_A \end{bmatrix} - \begin{bmatrix} I_{3\times3} & -[r_B] \end{bmatrix} \begin{bmatrix} v_B \\ \omega_B \end{bmatrix} = 0$$

where [r] is the skew-symmetric matrix of r, the vector from the center of mass (COM) of the body to the joint location. The rotational constraints are

$$d_i^T (\omega_A - \omega_B) = 0$$

where d_i is an axis perpendicular to the rotational DOFs and i could be a subset of {0, 1, 2} depending on the type of joint.

To allow a character to actively control its motion, actuators are attached to the joints. According to Newton's third law, the two bodies connected to a common actuator receive equal and opposite joint torques:

$$\tau_A^a + \tau_B^a = 0$$

where τ_A^a and τ_B^a are the torques exerted by the actuator on body A and body B, respectively.

Although simple to understand and implement, simulating characters in maximal coordinates has a few drawbacks. First, the state representation is redundant: it is not efficient to use all six DOFs of a rigid body and then eliminate most of them with joint constraints. Second, accumulating numerical errors in the simulation cause the joint constraints to be violated, so that joints eventually dislocate and adjacent bodies drift apart. Both of these shortcomings can be overcome by simulation in generalized coordinates.

Simulation in Generalized Coordinates

In generalized coordinates, the physical states q, q̇ of the articulated figure are defined on the edges of the tree (the joint angles). Note that the root node is attached to the world space via a 6-DOF joint that can translate and rotate freely. Each DOF is one component of q, and the number of DOFs equals the dimensionality of q; in other words, there is no redundancy in this representation. The dynamic equation in generalized coordinates for an articulated rigid-body system is

$$M(q)\ddot{q} + C(q, \dot{q}) = Q + \tau^a \tag{2}$$

where q, q̇, and q̈ are the position, velocity, and acceleration in generalized coordinates; M(q) is the mass matrix; C(q, q̇) accounts for the Coriolis and centrifugal forces; Q is the external generalized force, including gravity and contact forces; and τ^a is the generalized force exerted by the controller (section "Motion Control"). This equation can be derived from Lagrangian dynamics; the detailed derivation is omitted here but can be found in Liu and Jain (2012).

When the articulated rigid bodies are simulated in generalized coordinates, it is often necessary to convert physical quantities back and forth between generalized and maximal coordinates. For example, we need to compute the velocity of a certain point on the articulated figure in Cartesian space for collision detection, and we need to convert forces from Cartesian space to generalized coordinates to apply contact forces. The Jacobian matrix J bridges these two coordinate systems:

$$J = \frac{\partial x}{\partial q} \tag{3}$$

It represents how much a point x moves in Cartesian space if the joint angles q change slightly. The two most frequently used formulas convert velocities and forces (more conversion formulas can be found in Liu and Jain (2012)):

$$v = J\dot{q}, \qquad Q = J^T f$$

Simulation in generalized coordinates is widely used in physically based character animation. Although it takes more effort to walk through the derivation, it has important advantages over simulation in maximal coordinates. The representation is more compact: there is no redundancy and thus no need for constraints that eliminate redundant states. More importantly, it ensures that the joint constraints are satisfied exactly. Two connected bodies can never drift apart, even with numerical errors, because the states of dislocated joints are not part of the state space in generalized coordinates.

Contact Modeling

Most of our daily activities, such as locomotion and hand manipulation, involve interacting with our surrounding environment through contacts. Accurately simulating contacts and computing contact forces are therefore crucial to physically based character animation. The penalty method and the linear complementarity problem (LCP) are two widely used methods to model contact.

Penalty Method

When a body A penetrates another body B, a repulsive penalty force f_c is exerted to separate the two bodies:

$$f_c = \begin{cases} k\,d\,n & \text{if } d > 0, \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where k is the stiffness, d is the penetration depth, and n is the contact normal. The penalty method is trivial to implement; however, to make it work properly, tedious manual tuning is often needed. While too small a k cannot effectively stop the penetration, too large a k leads to undesired bouncy collision responses. Even worse, when simulating with large time steps, the penalty method can make the simulation unstable. In addition, it is not clear how to accurately model static friction using penalty methods.
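As an illustration, Eq. 4 amounts to only a few lines of code; the stiffness value below is an arbitrary placeholder that would need the manual tuning discussed above:

```python
import numpy as np

def penalty_force(d, n, k=1e4):
    """Penalty contact force (Eq. 4): push the bodies apart along the
    contact normal n in proportion to the penetration depth d.
    The stiffness k is a placeholder that needs manual tuning."""
    return k * d * n if d > 0.0 else np.zeros(3)
```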

Linear Complementarity Problem

The linear complementarity problem (LCP) is a more accurate and stable method to model contacts. A contact force f_c can be decomposed into normal and tangential (frictional) components:

$$f_c = f_\perp n + B f_\parallel$$

where n is the contact normal; f_⊥ and f_∥ are the normal and tangential components, respectively; and B is a set of bases that span the tangential plane (Fig. 3). The more bases b_i are used, the more accurate the approximation of the friction cone, but the more computation is needed to solve the resulting LCP. The LCP imposes a set of constraints to satisfy the conditions of Coulomb friction:

1. In the normal direction, only repulsive forces are exerted to stop penetration.
2. In a static contact situation, the contact force lies within the friction cone.
3. In a sliding contact situation, the contact force lies at the boundary of the friction cone, and the friction direction is opposite to the sliding direction.

I will illustrate the concept of LCP using the formulation in the normal direction; the formulation in the tangential directions is beyond the scope of this chapter and can be found in the tutorials of Lloyd (2005) and Tan et al. (2012a). In a physical simulation, after the collisions are resolved, the relative velocity between the contact points of two colliding bodies can only be zero (resting) or positive (separating), but not negative (penetrating):

$$v_\perp \geq 0 \tag{5}$$

Fig. 3 A linearized friction cone used in LCP formulation. Left: a foot is in contact with the ground. Right: the friction cone at the contact point. n is the contact normal, and bi are a set of tangential bases

10

J. Tan

Similarly, the normal contact force can be zero (no force) or positive (a repulsive force), but not negative (a sticking force):

$$f_\perp \geq 0 \tag{6}$$

The repulsive normal force exists (f_⊥ > 0) if and only if the two bodies are in contact (v_⊥ = 0). In contrast, when they are separating (v_⊥ > 0), there is no contact force (f_⊥ = 0). In other words, the following complementarity condition needs to be satisfied:

$$v_\perp f_\perp = 0 \tag{7}$$

Combining the dynamic equations (Eq. 1 or 2) and the LCP constraints (Eqs. 5, 6, and 7) forms a mixed LCP problem. It can be solved efficiently by direct (Lloyd 2005) or iterative solvers (Erleben 2007; Kaufman et al. 2008; Otaduy et al. 2009).
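As a rough illustration of an iterative solver, the sketch below applies projected Gauss-Seidel to the frictionless normal-direction conditions (Eqs. 5, 6, and 7); the matrix A stands for the contact-space effective inverse mass matrix and b for the unconstrained relative normal velocities, both assumed given:

```python
import numpy as np

def pgs_normal_forces(A, b, iters=50):
    """Projected Gauss-Seidel sketch for the normal-direction LCP:
    find f >= 0 with v = A f + b >= 0 and f^T v = 0 (Eqs. 5-7).
    A is assumed symmetric positive semidefinite."""
    f = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            r = b[i] + A[i] @ f - A[i, i] * f[i]   # residual excluding f_i
            f[i] = max(0.0, -r / A[i, i])          # project onto f_i >= 0
    return f
```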

Simulation Software

There is a growing need for simulation software that can accurately simulate the complex dynamics of virtual humans and their interactions with the surrounding environment. A number of open-source physics simulators are readily available for research in physically based character animation. Popular ones include the Open Dynamics Engine (ODE) (http://www.ode.org/), Bullet (http://www.bulletphysics.org/), the Dynamic Animation and Robotics Toolkit (DART) (http://dartsim.github.io/), and MuJoCo (http://www.mujoco.org). All of them can simulate articulated rigid bodies with an LCP-based contact model in real time. These simulators allow the user to specify the structure of the articulated figure, the shape and physical properties of each body, the type of each joint, and other parameters describing the environment. Different simulators offer different features, speed, and accuracy; Erez et al. (2015) provided an up-to-date review and an in-depth comparison of these modern physics engines.
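As an example of how such a simulator is driven, here is a minimal sketch using PyBullet, the Python bindings of the Bullet engine listed above; the loaded model and the zero torques are placeholders:

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                   # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                              # ground plane
robot = p.loadURDF("r2d2.urdf", [0, 0, 1])            # any articulated model (placeholder)

for j in range(p.getNumJoints(robot)):                # free the joints so torques act
    p.setJointMotorControl2(robot, j, p.VELOCITY_CONTROL, force=0)

for _ in range(240):                                  # one simulated second at 240 Hz
    for j in range(p.getNumJoints(robot)):
        # Apply the controller torques tau^a; zeros are placeholders here.
        p.setJointMotorControl2(robot, j, p.TORQUE_CONTROL, force=0.0)
    p.stepSimulation()
p.disconnect()
```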

Motion Control

We humans can plan our motions carefully, coordinately, and purposefully, and exercise our muscles to achieve a wide variety of high-level tasks, ranging from simple locomotion and dexterous hand manipulation to highly skillful stunts. To model these in animation, simulating the passive dynamics is not enough. The key challenge that motion control tackles is to find controllers that can achieve high-level motion tasks (e.g., walk at 1 m/s, or grasp a bottle and open the cap). In character animation, a controller is the character's "algorithmic brain" that decides how much torque (τ^a in Eqs. 1 and 2) is needed at each joint to successfully fulfill the task in a way that mimics human behavior. Optimization-based motion control is the most extensively researched topic in physically based character animation.


Fig. 4 Different stages of walking in SIMBICON (Image courtesy of (Yin et al. 2007))


The optimization searches for a controller that minimizes a task-related cost function, subject to dynamical constraints. One common misunderstanding is that one can formulate a single large optimization for arbitrary tasks. Due to the complexity of human motions and the nonlinearity of the dynamics, a large optimization may have competing objectives and many local minima, and to date there are no efficient optimization algorithms that can reliably find meaningful controllers in such cases. For this reason, a common practice in this field is to decompose a high-level task into multiple lower-level subtasks and formulate a smaller optimization for each of the simpler subtasks. For example, in SIMBICON (Yin et al. 2007), a walking cycle is decomposed into multiple stages (Fig. 4). Within each stage, separate optimizations can be used for controlling the two legs, the upper body, the balance, and the style (see the feedback sketch below). After solving all the optimizations, these low-level controllers can be combined so that the character walks naturally and robustly. Controller decomposition depends on the task and requires domain knowledge; we refer the reader to the research literature to learn controller decomposition on a case-by-case basis. In this section, we will discuss two generic optimization-based methods of motion control: trajectory optimization and reinforcement learning.
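The balance feedback that SIMBICON applies within each stage is compact enough to sketch: it adjusts the swing-hip target angle using the distance d from the stance ankle to the COM and the COM velocity v. The gain values below are illustrative placeholders, not the published ones:

```python
def swing_hip_target(theta_d0, d, v, c_d=0.5, c_v=0.2):
    """SIMBICON-style balance feedback (Yin et al. 2007):
    theta_d = theta_d0 + c_d * d + c_v * v, where d is the horizontal
    distance from the stance ankle to the COM and v is the COM velocity.
    The gains c_d and c_v here are illustrative, not tuned values."""
    return theta_d0 + c_d * d + c_v * v
```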

Trajectory Optimization

Starting from the classical paper "Spacetime Constraints" (Witkin and Kass 1988), trajectory optimization has become a mainstream technique in physically based character animation. It searches for a controller that minimizes a cost function subject to physical constraints. The general form of the optimization is

$$\min_{x,u} \sum_{t=0}^{N} g(x_t, u_t) \quad \text{subject to} \quad x_{t+1} = h(x_t) + B_t u_t \tag{8}$$

where x is the physical state and u is the control. In character animation, the state is usually defined as x := [q, q̇]^T and the control as u := τ^a. g is the cost function, which is handcrafted to reflect the high-level task. For example, if the task is to walk at 1 m/s, one term in the cost function could be the distance between the current COM of the character and a desired COM position that moves at 1 m/s. The constraint usually consists of the dynamic equation h. Note that in most applications of character animation, the dynamics are nonlinear in the state x but linear in the control u (see Eqs. 1 and 2). In addition, the constraints can also include joint limits, torque limits, and other task-related requirements.

To make this more concrete, we will revisit the simple example in the original "Spacetime Constraints" paper: controlling a single particle. The task of this particle is to fly from point a to point b in T seconds using a time-varying jet force f(t). The dynamics of the particle is $m\ddot{x} - f - mg = 0$, where x is its position, m is its mass, and g is gravity. The goal of this flight is to minimize the total fuel consumption $\int_0^T |f|^2 \, dt$. After discretization along time, the optimization has the following form:

$$\min_{x,f} \sum_{t=0}^{N} |f_t|^2 \quad \text{subject to} \quad x_{t+1} = 2x_t - x_{t-1} + \frac{\Delta t^2}{m} f_t + \Delta t^2 g, \quad x_0 = a, \quad x_N = b$$

It is not too difficult to extend the above derivation to control a human character: we need to change the control force f(t) to joint torques τ^a(t), the physical constraint to the dynamic equation of articulated rigid bodies (Eq. 1 or 2), and the cost function to a relevant function specific to the task.

There are different options to solve the optimization according to the structure of the problem. Assuming the cost function and the dynamic equations are smooth, Witkin and Kass (1988) applied a generic nonlinear optimizer, sequential quadratic programming (SQP), to solve the problem. SQP is an iterative technique that solves a sequence of quadratic programs approximating the original problem, and its solution is an optimal trajectory of states x(t) and controls u(t). Note that this method produces a feedforward (open-loop) controller: a trajectory over time. It does not generalize to neighboring regions of the state space; as a result, the controller will fail the task even under a slight disturbance to the motion.
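The particle problem above is small enough to solve directly with an off-the-shelf SQP implementation. The sketch below uses SciPy's SLSQP (an SQP variant) with made-up values for the mass, time step, and endpoints; positions and forces are stacked into one decision vector:

```python
import numpy as np
from scipy.optimize import minimize

m, dt, N = 1.0, 0.1, 20                 # mass, time step, number of steps (made up)
dim = 2                                  # 2D for brevity
g = np.array([0.0, -9.8])
a, b = np.array([0.0, 0.0]), np.array([2.0, 0.0])

def unpack(z):
    x = z[:(N + 1) * dim].reshape(N + 1, dim)   # positions x_0 .. x_N
    f = z[(N + 1) * dim:].reshape(N + 1, dim)   # jet forces f_0 .. f_N
    return x, f

def fuel(z):
    _, f = unpack(z)
    return np.sum(f ** 2)                        # sum_t |f_t|^2

def constraints(z):
    x, f = unpack(z)
    res = [x[0] - a, x[N] - b]                   # boundary conditions x_0 = a, x_N = b
    for t in range(1, N):                        # discretized particle dynamics
        res.append(x[t + 1] - 2 * x[t] + x[t - 1]
                   - (dt ** 2 / m) * f[t] - dt ** 2 * g)
    return np.concatenate(res)

sol = minimize(fuel, np.zeros(2 * (N + 1) * dim), method="SLSQP",
               constraints={"type": "eq", "fun": constraints})
x_opt, f_opt = unpack(sol.x)                     # optimal trajectory and forces
```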

When the cost function is quadratic and the dynamic equation is linear,

$$\min_{x,u} \sum_{t=0}^{N} x_t^T Q_t x_t + u_t^T R_t u_t \quad \text{subject to} \quad x_{t+1} = A_t x_t + B_t u_t \tag{9}$$

the trajectory optimization is called an LQ problem. This problem can be solved very efficiently by the linear-quadratic regulator (LQR). The derivation of LQR can be found in most optimal control textbooks (Todorov 2006), so we will not repeat it here. The solution is a feedback (closed-loop) controller u_t = K_t x_t.

Although the requirement of linear dynamics seems restrictive, LQR still plays an important role in physically based character animation. One important application is to design a physically based controller that tracks motion capture data, which is an effective way to increase the realism of the synthesized motions. Given a motion capture sequence x̄, we can linearize the dynamic equation in its vicinity:

$$\Delta x_{t+1} = \frac{\partial h}{\partial x} \Delta x_t + B_t u_t + h(\bar{x}_t) - \bar{x}_{t+1}$$

where Δx = x − x̄. This gives an LQ problem that seeks a feedback controller u_t = K_t Δx_t minimizing the difference between the actual and the reference motion over the entire trajectory.

More importantly, LQR is a building block for solving the more general trajectory optimization problem (Eq. 8). Given an initial trajectory x_0, u_0, x_1, u_1, ..., u_N, x_N, we can perform the following steps iteratively:

1. Compute the LQ approximation of the original problem (Eq. 8) around the current trajectory by taking a first-order Taylor expansion of the dynamics and a second-order expansion of the cost function.
2. Use LQR to solve the LQ approximation and obtain an optimal controller.
3. Apply the current optimal controller to generate a new trajectory.
4. Go to step 1 until convergence.

This iterative-LQR process is similar to the core idea behind differential dynamic programming (DDP); we refer the interested reader to Todorov (2006) for a more thorough discussion of LQR and DDP. The key advantage of DDP is that it provides not only a feedforward trajectory but also an optimal feedback controller near that trajectory.
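For reference, the finite-horizon LQR solution used in step 2 reduces to a backward Riccati recursion. A minimal sketch, assuming time-invariant A, B, Q, R for brevity:

```python
import numpy as np

def lqr_gains(A, B, Q, R, N):
    """Finite-horizon discrete LQR backward pass.
    Returns feedback gains K_t for t = 0..N-1; the control is
    u_t = -K_t x_t (or u_t = -K_t (x_t - xbar_t) when tracking a reference)."""
    P = Q.copy()                                   # terminal cost-to-go
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)              # Riccati update
        gains.append(K)
    return gains[::-1]                             # reorder to t = 0..N-1
```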


To sum up, trajectory optimization is an effective way to synthesize character animation. The synthesized motion is not only physically correct but, more importantly, demonstrates an important animation principle: anticipation. Because the objective of trajectory optimization is to minimize a long-term cost, the character can move intelligently, giving up short-term gains to minimize the long-term cost. For example, for a jump-up task, trajectory optimization could produce a controller that first sacrifices the height of the character's COM for a much higher jump later. However, trajectory optimization has a few shortcomings. First, it often leads to a high-dimensional optimization that is expensive to solve. Another problem of high-dimensional nonlinear optimization is that the solver is more likely to get stuck at bad local minima; thus, a good initialization is extremely important. Last but not least, trajectory optimization exploits the mathematical form of the dynamic equations to design the optimal controller. If the dynamics are not smooth, too complicated, or unknown, it is not clear how to apply trajectory optimization methods.

Reinforcement Learning

Reinforcement learning is motivated by the learning process of humans. It optimizes a controller by interacting with the physical environment over numerous trials. Initially, the controller tries out random moves. If a desired behavior is observed, a reward is provided as positive reinforcement. This reward system gradually shapes the controller until it eventually fulfills the high-level task. Reinforcement learning is an active research area, with a large number of algorithms proposed every year; we refer readers to Kaelbling et al. (1996) and Kober et al. (2013) for a thorough review. We will focus on policy search in the remainder of this section. Policy search is a popular reinforcement learning algorithm in physically based character animation. It performs extremely well in this field because it can solve control problems with high-dimensional continuous state and action spaces, which is essential for controlling a human character.

Mathematically, reinforcement learning solves a Markov decision process (MDP). An MDP is a tuple (S, A, R, D, γ, P_{sa}), where S is the state space and A is the action space. The states reflect the current situation, and the actions are what a character can perform to achieve the specified task. R is the reward function, a mapping from the state-action space to a real number, R : S × A → ℝ, which evaluates how good the current state and action are. D is the distribution of the initial state s_0 ∼ D, and γ ∈ [0, 1] is the discount factor of the reward over time. P_{sa} is the transition probability: it gives the probability that the next state is s′ if action a is taken at the current state s. In physically based character animation, the transition probability is computed by physical simulation. Although most physical simulations are deterministic, random noise can be added in simulation to increase the robustness of the learned controller (Wang et al. 2010).

The solution of an MDP is a control policy π that maps the state space to the action space, π : S → A. It decides what actions to take in different situations. The return of a policy is the accumulated reward along the state trajectory starting at s_0 and following the policy π for N steps:

$$V^\pi(s_0) = \sum_{i=0}^{N} \gamma^{N-i} R(s_i, \pi(s_i))$$


The reward at earlier states can be exponentially discounted over time. The value of a policy is the expected return with respect to the random initial state s_0 drawn from D:

$$V(\pi) = E_{s_0 \sim D}\left[V^\pi(s_0)\right] \tag{10}$$
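In practice, Eq. 10 is usually approximated by Monte Carlo rollouts in the simulator. A minimal sketch, where `policy`, `sample_s0`, `step`, and `reward` are hypothetical hooks into the policy and the physics simulation:

```python
import numpy as np

def estimate_policy_value(policy, sample_s0, step, reward,
                          N=200, rollouts=20, gamma=0.99):
    """Monte Carlo estimate of V(pi) (Eq. 10): average the discounted
    return over rollouts started from random initial states s_0 ~ D."""
    returns = []
    for _ in range(rollouts):
        s, ret = sample_s0(), 0.0
        for _ in range(N + 1):
            a = policy(s)
            ret = gamma * ret + reward(s, a)   # Horner form of sum gamma^(N-i) R_i
            s = step(s, a)
        returns.append(ret)
    return float(np.mean(returns))
```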

Note that the goal of the MDP is not to maximize the short-term reward R at the next state, but the long-term value function V. Using V instead of R as the optimization target prevents the controller from applying short-sighted greedy strategies. This agrees with our ability of long-term planning when executing our motions.

To formulate an MDP, we need to design the state space S, the action space A, and the reward function R for a given task. Ideally, the state space should contain all possible states of the articulated rigid-body system, including the joint angles q, the joint velocities q̇, and the time t, and the actions should include all the joint torques τ^a. However, this means that the state and action spaces can have hundreds of dimensions, and due to the curse of dimensionality, solving an MDP in such a high-dimensional continuous space is computationally infeasible. In practice, researchers often carefully select states and actions specifically for the task in question to make the computation tractable. For example, if the task is to keep balance while standing, the state space only needs to include important features for balance, such as the character's COM and the center of the ground contact polygon. Similarly, given well-known balance strategies, such as the ankle strategy and the hip strategy, the action space can be as simple as a few torques at the lower-body joints. Using prior knowledge of the task can greatly simplify the problem, which is a common practice in physically based character animation.

A reward function for character animation usually consists of two parts: a task-related component that measures how far the current state is from the goal state, and a component that evaluates the naturalness of the motions. Designing a good reward function is essential to the success of the entire learning algorithm. A good reward should be a smooth function that gives continuous positive reinforcement whenever progress is made; mathematically, this design provides gradient information that can guide optimization solvers. In contrast, a common mistake is to give a reward only when the task is achieved, which makes the reward function a narrow spike that is flat zero elsewhere. This should be avoided because nearly all optimization algorithms have trouble finding such a spike.

To apply policy search to solve the MDP, we need to parameterize the policy. A policy can be an arbitrary function; a practical way to optimize it is to parameterize it and then search for the optimal policy parameters. Commonly used parameterizations include lookup tables, linear functions, splines, and neural networks. The parameterization determines the potential quality of the final policy. However, there is no consensus on the best way to parameterize a policy; it is decided on a case-by-case basis. Once the policy parameterization is decided, an initial policy is iteratively evaluated and improved until the optimization converges or a user-specified maximum number of iterations is reached.
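To illustrate the reward design described above, here is a hypothetical smooth reward for a "walk at 1 m/s" task; the state attributes and all weights are made up for this sketch:

```python
import numpy as np

def walk_reward(state, action, target_speed=1.0):
    """Hypothetical smooth reward for a 'walk at 1 m/s' task.
    The state attributes and the weights are illustrative placeholders."""
    speed_term = -abs(state.com_velocity_x - target_speed)     # track desired speed
    upright_term = -abs(state.com_height - state.rest_height)  # penalize falling/crouching
    effort_term = -1e-3 * float(np.sum(np.square(action)))     # prefer low torques
    return 2.0 * speed_term + upright_term + effort_term
```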


To evaluate a policy, we can execute it in the simulation for N time steps with different initial states s_0 ∼ D, accumulate the rewards, and average the returns to compute the value of the policy. Policy improvement adjusts the policy parameters to increase this value. A straightforward way is to follow the policy gradient (Ng and Jordan 2000). However, in many character animation tasks, such as locomotion and hand manipulation, contact events happen frequently, which introduces unsmooth contact forces that invalidate the policy gradient. For this reason, sample-based stochastic optimization techniques are particularly suited to physically based character animation. The covariance matrix adaptation evolution strategy (CMA-ES) (Hansen 2006) is the most frequently applied optimization method for motion control. CMA works as long as we can evaluate the value of a policy: it does not need to compute gradients and does not rely on good initializations. More importantly, CMA is a "global" search algorithm that explores multiple local optima. Although there is no guarantee that CMA will converge to the global optimum, in practice it often finds good local optima in moderately high-dimensional control spaces (e.g., 20-30 dimensions).

For the completeness of the chapter, we will briefly describe the CMA algorithm; readers can refer to the original paper (Hansen 2006) for additional details. CMA starts with an initial Gaussian distribution in the policy parameter space with a large covariance matrix. A population of samples is drawn from this distribution; because of the large covariance matrix, the first generation of samples is not biased in the parameter space. Each CMA sample represents a control policy. The policies are evaluated through simulation and sorted according to their values, and a certain percentage of the inferior samples are discarded. The underlying Gaussian distribution is updated according to the remaining good samples and is used to generate the next generation of samples. This process is performed iteratively: over iterations, the underlying distribution is shifted and narrowed, and eventually it converges to a good region of the policy space. The best CMA sample across all iterations is selected as the optimal control policy.
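The sample-rank-refit loop at the core of this process can be sketched as below. This simplified version is closer to a cross-entropy method: full CMA-ES additionally adapts the covariance with evolution paths and controls the global step size. `evaluate_policy` is a hypothetical hook that runs the simulation:

```python
import numpy as np

def evaluate_policy(theta):
    """Hypothetical hook: run the simulation with policy parameters theta
    and return the (averaged) policy value V."""
    raise NotImplementedError

def sample_rank_refit(theta0, sigma0=1.0, pop=32, elite_frac=0.25, iters=100):
    """Simplified sample-rank-refit loop in the spirit of CMA; full CMA-ES
    adds evolution paths and step-size control (Hansen 2006)."""
    mean = np.asarray(theta0, dtype=float)
    cov = (sigma0 ** 2) * np.eye(mean.size)          # start with a large covariance
    n_elite = max(2, int(pop * elite_frac))
    best_theta, best_value = None, -np.inf
    for _ in range(iters):
        samples = np.random.multivariate_normal(mean, cov, size=pop)
        values = np.array([evaluate_policy(s) for s in samples])
        if values.max() > best_value:
            best_theta, best_value = samples[values.argmax()], values.max()
        elite = samples[np.argsort(values)[-n_elite:]]   # discard inferior samples
        mean = elite.mean(axis=0)                        # shift the distribution
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(mean.size)  # narrow it
    return best_theta
```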

In summary, reinforcement learning is a generic method for motion control that can automatically learn a wide range of behaviors through simulation trials. Reinforcement learning, and policy search in particular, is becoming one of the most popular approaches to character animation synthesis. It does not assume any mathematical form of the dynamic equation: it treats the physical simulation as a black box, as long as the simulation outputs the next state given the current state and action. Thus it is not bound to a particular dynamics model, and the same learning algorithm still works even if the simulation software is upgraded. The main challenge of reinforcement learning is to design the states, the actions, the reward, and the policy parameterization. We need to inject enough prior knowledge into the design so that the search space is small enough to be computationally feasible, but not so small that it no longer contains effective policies. This requires a lot of experience and manual tweaking, especially for challenging motion tasks.

Future Directions

The research on physically based character animation has achieved stunning results in the past two decades. However, we are still far from the ultimate goal of character animation: a fully automatic system that can synthesize visually realistic motions comparable to those of real humans. Many interesting problems remain open and challenging. Here I list a few promising future research directions.

Improving Realism

One of the biggest issues of physically based character animation is its quality. The synthesized motions are not yet realistic enough for broader applications and are currently not comparable to the quality of animations hand-tuned by artists. One important reason is that the articulated rigid-body systems widely used today are a vast simplification of the real human body. A real human has 206 bones and over 600 muscles, and thus far more degrees of freedom. In addition, the joints of an articulated rigid body are controlled independently, but human joints move in coordination due to the intricate arrangement of muscles and tendons. A recent trend is to build more sophisticated human models based on biological musculotendon structures (Lee and Terzopoulos 2006; Lee et al. 2009; Wang et al. 2012), and it has been demonstrated that an accurate human model can dramatically improve the realism of the synthesized motions. As more computational power becomes available in the next decade, I expect that we will soon be able to afford highly detailed human models and synthesize character animation with high fidelity.

Another source of unnaturalness is the handcrafted objective function in motion control. Current objective functions focus mostly on the energy efficiency of motions; however, efficient motions do not equal natural motions. Although minimizing energy expenditure is one important factor that governs our motion, it is not the only one: our motions are also governed by personal habits, emotion, task, environment, and many other external factors, and it is extremely challenging to handcraft objective functions for all of them. Assuming that we have abundant motion data, which is a realistic assumption given the large number of motion sensors installed in phones and other wearable computing devices, it is promising to extract objective functions from these data using inverse reinforcement learning (Ng and Russell 2000).

Reducing Prior Knowledge

Although physically based character animation frees us from much of the manual work of traditional animation pipelines, it still requires some high-level prior knowledge to work effectively. For example, we know that regulating the COM of a character relative to the contact points is important for balance tasks. We can inject this prior knowledge by manually choosing the COM and the ground contact points as features and including them in the state space of reinforcement learning. Selecting the right features (prior knowledge) is crucial for many current control algorithms. However, good features for one task may not carry over to different tasks, and manually selecting features will not scale to more sophisticated characters, more complicated environments, or more challenging tasks. We need algorithms that can discover control strategies with less or even no prior knowledge. This reminds me of the recent success of deep learning: the way we use hand-engineered features in reinforcement learning today is analogous to using HoG or SIFT features in computer vision a few years ago. Recent advances in computer vision have demonstrated that deep neural networks, such as autoencoders (Vincent et al. 2008) or restricted Boltzmann machines (Hinton 2012), can learn features automatically. I believe that the next breakthrough in reinforcement learning will be to employ similar techniques to automatically discover important features for different motion tasks.

Bringing the Character to the Real World

Recent developments in physically based character animation have introduced a set of powerful computational tools. With these tools, natural, agile, and robust motions can be synthesized efficiently and autonomously in a simulation. However, creating lifelike robots is still an extremely challenging, trial-and-error process that is restricted to experts. The fast evolution of 3D printing technology will soon trigger a shift in the robotics industry from mass production to personalized design and fabrication, which will result in an immediate need for a faster, cheaper, and more intuitive way to design robotic controllers. I believe that the computational tools developed in physically based character animation can potentially automate and streamline this process if we can transfer the controllers from the virtual simulation to the real world.

Transferring controllers optimized in a simulation onto a real robot is a nontrivial task: an optimal controller that works in a state-of-the-art simulation often fails in a real environment. This is known as the Reality Gap. The gap is caused by various simplifications in the simulation, including inaccurate physical models, unmodeled actuator dynamics, assumptions of perfect sensing, and zero latency. To fully tap the power of these computational tools, we need to develop more accurate physical simulations that bridge the Reality Gap. Researchers in physically based character animation have started to investigate this problem (Bharaj et al. 2015; Megaro et al. 2015). I believe that with further research and development the Reality Gap will shrink rapidly, making it easier to transfer controllers from simulation to the real world. As a result, I envision that the two separate research fields of character animation and robotics will eventually merge, which will inevitably trigger a fundamental revolution in both character animation and robotics.


References
Abe Y, da Silva M, Popović J (2007) Multiobjective control with frictional contacts. In: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '07, pp 249–258
Andrews S, Kry P (2013) Goal directed multi-finger manipulation: control policies and analysis. Comput Graph 37(7):830–839
Bai Y, Liu CK (2014) Coupling cloth and rigid bodies for dexterous manipulation. In: Proceedings of the seventh international conference on motion in games, MIG '14. ACM, pp 139–145
Bharaj G, Coros S, Thomaszewski B, Tompkin J, Bickel B, Pfister H (2015) Computational design of walking automata. In: Proceedings of the 14th ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '15. ACM, pp 93–100
Clegg A, Tan J, Turk G, Liu CK (2015) Animating human dressing. ACM Trans Graph 34(4):116:1–116:9
Coros S, Beaudoin P, van de Panne M (2009) Robust task-based control policies for physics-based characters. ACM Trans Graph 28(5):170:1–170:9
Coros S, Beaudoin P, van de Panne M (2010) Generalized biped walking control. ACM Trans Graph 29(4), Article 130
Coros S, Karpathy A, Jones B, Reveret L, van de Panne M (2011) Locomotion skills for simulated quadrupeds. ACM Trans Graph 30(4):59
da Silva M, Abe Y, Popović J (2008) Interactive simulation of stylized human locomotion. In: ACM SIGGRAPH 2008 papers, SIGGRAPH '08. ACM, pp 82:1–82:10
DiLorenzo PC, Zordan VB, Sanders BL (2008) Laughing out loud: control for modeling anatomically inspired laughter using audio. In: ACM SIGGRAPH Asia 2008 papers, SIGGRAPH Asia '08, pp 125:1–125:8
Erez T, Tassa Y, Todorov E (2015) Simulation tools for model-based robotics: comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In: ICRA. IEEE, pp 4397–4404
Erleben K (2007) Velocity-based shock propagation for multibody dynamics animation. ACM Trans Graph 26(2), Article 12
Geijtenbeek T, van de Panne M, van der Stappen AF (2013) Flexible muscle-based locomotion for bipedal creatures. ACM Trans Graph 32(6)
Ha S, Ye Y, Liu CK (2012) Falling and landing motion control for character animation. ACM Trans Graph 31(6):1
Hansen N (2006) The CMA evolution strategy: a comparing review. In: Towards a new evolutionary computation. Springer, New York, pp 75–102
Hinton GE (2012) A practical guide to training restricted Boltzmann machines. In: Montavon G, Orr GB, Müller K-R (eds) Neural networks: tricks of the trade, 2nd edn. Lecture notes in computer science. Springer, New York, pp 599–619
Hodgins JK, Wooten WL, Brogan DC, O'Brien JF (1995) Animating human athletics. In: SIGGRAPH, pp 71–78
Jain S, Ye Y, Liu CK (2009) Optimization-based interactive motion synthesis. ACM Trans Graph 28(1):1–10
Kaelbling LP, Littman ML, Moore AP (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285
Kaufman DM, Sueda S, James DL, Pai DK (2008) Staggered projections for frictional contact in multibody systems. ACM Trans Graph 27:164:1–164:11
Kim J, Pollard NS (2011) Direct control of simulated non-human characters. IEEE Comput Graph Appl 31(4):56–65
Kober J, Bagnell JAD, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32:1238
Kwatra N, Wojtan C, Carlson M, Essa I, Mucha P, Turk G (2009) Fluid simulation with articulated bodies. IEEE Trans Vis Comput Graph 16(1):70–80


Kwon T, Hodgins J (2010) Control systems for human running using an inverted pendulum model and a reference motion capture sequence. In: Proceedings of the 2010 ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '10. Eurographics Association, pp 129–138
Laszlo J, van de Panne M, Fiume E (1996) Limit cycle control and its application to the animation of balancing and walking. In: Proceedings of the 23rd annual conference on computer graphics and interactive techniques, SIGGRAPH '96. ACM, pp 155–162
Lee S-H, Terzopoulos D (2006) Heads up! Biomechanical modeling and neuromuscular control of the neck. ACM Trans Graph 25(3):1188–1198
Lee S-H, Sifakis E, Terzopoulos D (2009) Comprehensive biomechanical modeling and simulation of the upper body. ACM Trans Graph 28:99:1–99:17
Levine S, Wang JM, Haraux A, Popović Z, Koltun V (2012) Continuous character control with low-dimensional embeddings. ACM Trans Graph 31(4):28:1–28:10
Liu CK (2009) Dextrous manipulation from a grasping pose. ACM Trans Graph 28(3), Article 59
Liu CK, Jain S (2012) A short tutorial on multibody dynamics. Technical report GIT-GVU-15-01-1, Georgia Institute of Technology, School of Interactive Computing
Liu CK, Popović Z (2002) Synthesis of complex dynamic character motion from simple animations. In: Proceedings of the 29th annual conference on computer graphics and interactive techniques, SIGGRAPH '02. ACM, pp 408–416
Liu L, Yin K, van de Panne M, Shao T, Xu W (2010) Sampling-based contact-rich motion control. ACM Trans Graph 29(4), Article 128
Lloyd J (2005) Fast implementation of Lemke's algorithm for rigid body contact simulation. In: Proceedings of the 2005 IEEE international conference on robotics and automation, ICRA 2005, pp 4538–4543
Megaro V, Thomaszewski B, Nitti M, Hilliges O, Gross M, Coros S (2015) Interactive design of 3d-printable robotic creatures. ACM Trans Graph 34(6):216:1–216:9
Mordatch I, de Lasa M, Hertzmann A (2010) Robust physics-based locomotion using low-dimensional planning. In: ACM SIGGRAPH 2010 papers, SIGGRAPH '10. ACM, pp 71:1–71:8
Mordatch I, Popović Z, Todorov E (2012) Contact-invariant optimization for hand manipulation. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation, SCA '12. Eurographics Association, pp 137–144
Mordatch I, Wang JM, Todorov E, Koltun V (2013) Animating human lower limbs using contact-invariant optimization. ACM Trans Graph 32(6):203:1–203:8
Muico U, Lee Y, Popović J, Popović Z (2009) Contact-aware nonlinear control of dynamic characters. In: ACM SIGGRAPH 2009 papers, SIGGRAPH '09. ACM, pp 81:1–81:9
Ng AY, Jordan M (2000) PEGASUS: a policy search method for large MDPs and POMDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, UAI '00. Morgan Kaufmann, San Francisco, pp 406–415
Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the seventeenth international conference on machine learning, ICML '00. Morgan Kaufmann, San Francisco, pp 663–670
Otaduy MA, Tamstorf R, Steinemann D, Gross M (2009) Implicit contact handling for deformable objects. Comput Graph Forum (Proc of Eurographics) 28(2):559–568
Pratt JE, Chew C-M, Torres A, Dilworth P, Pratt GA (2001) Virtual model control: an intuitive approach for bipedal locomotion. Int J Robot Res 20(2):129–143
Si W, Lee S-H, Sifakis E, Terzopoulos D (2014) Realistic biomechanical simulation and control of human swimming. ACM Trans Graph 34(1):10:1–10:15
Sueda S, Kaufman A, Pai DK (2008) Musculotendon simulation for hand animation. ACM Trans Graph 27:83:1–83:8
Tan J, Siu K, Liu CK (2012a) Contact handling for articulated rigid bodies using LCP. Technical report GIT-GVU-15-01-2, Georgia Institute of Technology, School of Interactive Computing
Tan J, Turk G, Liu CK (2012b) Soft body locomotion. ACM Trans Graph 31(4):26:1–26:11



Data-Driven Hand Animation Synthesis

Sophie Jörg

Abstract

As virtual characters are becoming more and more realistic, the need for recording and synthesizing detailed animations for their hands is increasing. Whether we watch virtual characters in a movie, communicate with an embodied conversational agent in real time, or steer an agent ourselves in a virtual reality application or in a game, detailed hand motions have an impact on how we perceive the character. In this chapter, we give an overview of current methods to record and synthesize the subtleties of hand and finger motions. The approaches we present include marker-based and markerless optical systems, depth sensors, and sensored gloves to capture and record hand motions and data-driven algorithms to synthesize movements when only the body or arm motions are known. We furthermore describe the complex anatomy of the hand and how it is being simplified and give insights on our perception of hand motions to convey why creating realistic hand motions is challenging.

Keywords

Hand motions • Fingers • Character animation • Data-driven animation • Virtual characters • Motion capture

Introduction

In recent years, character animation has made tremendous steps toward realistic virtual agents, with increasingly better solutions for motion capturing body motions, creating highly realistic facial animation, and simulating cloth and hair. With these more and more realistic components, providing plausible hand and finger motions has become highly important. We use our hands to explore our environment by touching and manipulating objects, to conduct basic tasks such as eating or writing, to handle complex tools, to create art pieces, or to play musical instruments (Napier 1980). Hand movements also play a crucial role in communicating information and can even take a main role in conveying meaning in sign languages. However, the complexity of the hand anatomy and our sensitive perception of small details in hand motions make it challenging to record or synthesize accurate hand motions. Furthermore, the difference in size between the hand and the body complicates the process of capturing both at the same time. Therefore, finger motions are typically animated manually, which is a cumbersome and time-intensive process for the animator.

This chapter describes how detailed hand movements can be recorded and synthesized. We first explain why we need detailed hand motions for virtual characters and why these subtle motions are challenging to create, giving further details on the anatomy of the hands and on our perception of hand motions. After a brief state of the art, we then delve into the different ways to record and to synthesize hand motions, focusing on optical systems, depth sensors, and sensored gloves to capture hand motions and on data-driven algorithms to synthesize hand movements depending on the motions of the body. We conclude by describing some of the next challenges in the field.

Applications

Why do we need detailed hand motions for virtual characters? Hand and finger motions are ubiquitous in our lives. They are such an integral part of our daily lives that we take those intricate motions for granted. We see and interpret them effortlessly without the need to think about them. However, even small differences in hand motions can change our interpretation of a scene (Jörg et al. 2010). Especially as the realism of virtual characters is increasing, the lack of detailed hand motions becomes disturbing. More details on our perception of hand motions are given in section “Perception of Hand Motions.” The importance of hand motions varies with the application, the task, and the type of character. We start by describing the most common applications and tasks for which we need virtual hand motions. Of course, virtual hand motions are typically required for animated characters, be it for entertainment, education, or any other application. Specific situations when hand motions are crucial are during conversations, when manipulating objects, when playing instruments, or when using American Sign Language (ASL). Approaches have been suggested to create finger motions for each of these tasks. Jörg et al. (2012) synthesize hand motions for known body movements for conversational characters with a data-driven approach. Their method is presented in more detail in section “Synthesizing Data-driven Hand Motions.” Manipulation tasks are bound by the physical constraints of the manipulated objects. Physics-based and data-driven approaches and combinations of both are very effective in that area
(Liu 2009; Pollard and Zordan 2005; Ye and Liu 2012). Algorithms have been developed to allow virtual characters to play the guitar (ElKoura and Singh 2003) or the piano (Zhu et al. 2013), and techniques have been refined to improve how to capture ASL (Huenerfauth and Lu 2010; Lu and Huenerfauth 2009). For the previously listed applications, hand and finger animations are mostly created without a time constraint. Real-time applications present additional challenges. For embodied conversational agents (ECAs), hand motions are particularly important. In conversations, finger motions can convey meaning and emphasis (McNeill 1992) and even personality and mood (Wang et al. 2016). In many approaches generating gestures for conversational characters, finger motions are not considered separately from the hand, and only a small number of noticeable hand shapes are synthesized, such as pointing with the index finger. As the quality of animation for embodied conversational agents is continuing to rise, the need for more accurate hand motions is increasing. The commercialization of new technologies in virtual reality (VR) produced further applications. Hand motions need to be tracked in real time so that a person in VR can see and use their hands. Furthermore, if multiple persons communicate in VR we need methods to create movements that accurately convey the meaning of their conversations. Finally, more realistic hand animations in games could increase immersion and presence. If a virtual character has to grab and manipulate a wide range of different objects, not all necessary hand shapes can be created in advance, and adjustments are needed in real time. After describing in which applications and for which tasks detailed hand and finger motions are most important, we will review what the challenges are when creating them.

Challenges

What are the challenges when capturing or synthesizing hand and finger motions? The main difficulties come from the complex structure of the hand allowing for intricate motions, the smaller size relative to the body, and people’s impressive abilities to recognize and interpret subtleties in hand and finger motions.

Structure of the Hand

In their reference work on anatomy and human movement, Palastanga and Soames (2012) characterize “the development of the hand as a sensitive instrument of precision, power and delicacy” as “the acme of human evolution.” The intricate structure of the hand that allows us to perform a wide range of actions is often taken for granted. The hand consists of 27 bones, not counting sesamoid bones. The arrangement of muscles, tendons, and ligaments enables a large variety of possible poses. The phalanges of the five digits – index, middle, ring, and little or pinky finger and the thumb – with their 14 bones in total are connected to the palm. The five metacarpal bones form the palm, and the eight carpal bones connect the metacarpals to the wrist (Jörg 2011; Napier 1980; Palastanga and Soames 2012).


For animation, in many cases, this skeleton is simplified. The eight carpal bones at the wrist are summarized into one wrist joint, and the metacarpals that form the palm might be represented by a simple, rigid structure. A further approximation concerns the joints. The complex spinning, rolling, and sliding motions of the joints are typically approximated with rotations around a point. The number of degrees of freedom of those joints can also be reduced. For example, the metacarpophalangeal joints are represented with Hardy Spicer joints with two degrees of freedom. In reality, a slight medial or lateral rotation of the fingers, so a rotation around the axis that goes through the length of the finger, is possible with the largest angle achievable passively for the little finger. A resulting simple hand skeleton has around 24 degrees of freedom (Jörg 2011; Liu 2008; Parent 2012). However, there is no single standard skeleton when it comes to character animation, and the exact characteristics vary widely based on the application and the desired level of realism. As a consequence of the anatomy of the hand and the arrangement of the tendons, different joints tend to be moved together such as the interphalangeal joints of each finger or the ring finger and the little finger (Häger-Ross and Schieber 2000; Jörg and O’Sullivan 2009). When animating or synthesizing hand motions, we can take advantage of this property by representing hand motions in a reduced set of dimensions (Braido and Zhang 2004; Ciocarlie et al. 2007; Santello et al. 1998). The joints do not move in perfect synchrony, so the dimensionality reduction is not lossless. How much a motion can be simplified and the dimensionality reduced depends on the required quality and detail of the motions. In summary, the structure of the hand is very complex, but can be simplified for animation depending on how accurate and natural the resulting motion should look. How much simplification is possible also depends on our ability to perceive and interpret hand motions.
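As a concrete illustration of this kind of dimensionality reduction, the following Python sketch projects hand poses onto a few principal components and reconstructs them. It is a minimal, hypothetical example (the array shapes, the 24 degrees of freedom, and the choice of eight components are illustrative assumptions, not values prescribed by the cited studies):

```python
import numpy as np

# poses: one row per captured frame, one column per joint angle
# (e.g., roughly 24 degrees of freedom for a simplified hand skeleton).
def fit_pca(poses: np.ndarray, n_components: int = 8):
    mean = poses.mean(axis=0)
    centered = poses - mean
    # Principal directions are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]              # shape: (n_components, n_dofs)
    return mean, basis

def reduce_and_reconstruct(pose: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    coeffs = basis @ (pose - mean)         # low-dimensional representation
    return mean + basis.T @ coeffs         # approximate full-DOF pose

# Synthetic example: 500 frames of a hypothetical 24-DOF hand.
rng = np.random.default_rng(0)
poses = rng.random((500, 24))
mean, basis = fit_pca(poses, n_components=8)
approx = reduce_and_reconstruct(poses[0], mean, basis)
print(float(np.abs(approx - poses[0]).max()))  # reconstruction error of one frame
```

How many components are retained in practice depends, as discussed above, on the required quality and detail of the motions.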

Perception of Hand Motions

People are extremely skilled when it comes to recognizing and interpreting human full body motions. We can recognize friends from far away by their walks and posture before we can see their faces, and we can make a reasonable guess of characteristics such as the sex of a person just based on their motions (Cutting and Kozlowski 1977; Kozlowski and Cutting 1977). This process is effortless; it happens automatically without actively analyzing the motion. In a similar way, our interpretation of gestures during communication is mostly automatic – as is our usage of those gestures. A wide range of insights has been gained on the meaning and interpretation of hand and arm motions, mostly by observation of how we gesture when we communicate (Kendon 2004; McNeill 1992). The subtle motions of the fingers are an inherent part of gestures, but their exact use and perception are rarely examined separately. But as the detailed finger motions are difficult to capture and might be created separately from the body motions, it becomes important to find out when these details affect our impression of a character or a scenario. A perceptual experiment investigating the effect of small delays in finger motion compared to body motion found that viewers could even detect small
synchronization errors of 0.1 s in short motion clips of a few seconds. A 0.5 s delay in finger motion altered the interpretation of a 30 s long scenario (Jörg et al. 2010). It has been shown that animated hands and handlike structures can convey emotions (Samadani et al. 2011). More interestingly, hand poses and motions influence the perceived personality of a virtual character with and without the presence of body motions. For example, spreading motions are seen as more extraverted and open than flexion, and small hand motions are regarded as more emotionally stable and agreeable than large motions (Wang et al. 2016). Hand animation is thus essential when conveying meaning and creating convincing virtual characters. Our perception of virtual characters also depends on their appearance. The same body motions on a more realistic humanlike character are rated to be biological (in contrast to artificial) less often than when they are shown on a less detailed and more abstract character (Chaminade et al. 2009). There are fewer studies on this subject when it comes to hand motions, but it has been shown for hands as well that different brain areas are activated when we watch real and when we watch virtual hand actions (Perani et al. 2001).

The appearance of virtual hands also has an impact on our perception in virtual reality applications, notably on the virtual hand illusion. The virtual hand illusion is a body ownership illusion: When one sees a virtual hand in virtual reality that is controlled by and moves in synchrony with one’s own hand, after a short conditioning phase, a threat to the virtual hand can trigger an affective response as if the virtual hand was seen as a part of one’s own body. That means that if a virtual knife hits the virtual hand, a user can get startled and quickly pull away their hand even if the virtual knife cannot do any real damage. Users feel to a large degree as if the virtual hand was their own. This illusion can be induced for a surprisingly large variety of models. It has been shown to occur for realistic hands, cartoony hands, a zombie and a robot hand, an abstract hand made of a torus and ellipsoids, for a cat claw, and for objects or shapes such as a square, a sphere, a balloon, and a wooden block. However, the illusion is much weaker for objects and strongest for realistic hands (Argelaguet et al. 2016; Lin and Jörg 2016; Ma and Hommel 2015a, b; Yuan and Steed 2010; Zhang and Hommel 2015). While the motion of the hand plays a crucial role in inducing the virtual hand illusion, it is not known as of yet how much offset, latency, or errors are possible without destroying the illusion of ownership. While many questions remain to be solved when it comes to our perception of hand motions, evidence suggests that viewers are able to notice small details that can be crucial for our interpretation of a situation. These perceptual skills contribute to the challenges when creating detailed, realistic finger motions.

State of the Art

Motion capturing has become a standard technique when it comes to creating highly realistic body motions for movies, games, or similar applications. While more effective and less expensive systems are still being developed, it has been possible for about two decades to capture body motions with sufficient accuracy for typical
applications with virtual characters. For finger motions, however, it is still not possible to capture accurate data in real time without restrictions. Many technologies and algorithms have been suggested to create detailed finger motions, such as automatically synthesizing them based on the body motions or using sensored gloves to measure the fingers’ joint angles. With the rise of virtual reality applications in recent years, new demand for capturing hand motions has emerged. Devices that capture only the hands have been developed for the consumer market, but these devices still lack high reliability and accuracy. A recent, detailed survey of the research literature on finger and hand modeling and animation has been compiled by Wheatland et al. (2015).

Capturing Hand Motions

Motion capture has become a widely adopted technology to animate realistic virtual characters for movies and games (Menache 1999). The detailed movements of the fingers are, however, challenging to capture. Several methods have been developed to accomplish this task, each with its advantages and drawbacks. Optical marker-based motion capture, sensored gloves, markerless optical motion capture, and depth sensors are the most popular solutions and are described in this section.

Optical Marker-Based Motion Capture

Optical marker-based motion capture records the positions of retroreflective or LED markers and computes a skeleton based on that data. It provides a high accuracy compared to other capturing techniques. The optical motion capturing of finger movements requires careful planning. The denser the markers are placed, the smaller the markers need to be to avoid mislabelings and occurrences where multiple markers are mistaken as one by the cameras. For a rather comprehensive marker set with 20 markers (three on each digit and five on the palm), 6.5 mm spherical or 3/4 spherical retroreflective markers are recommended. The optical cameras used in such a system need to be placed closer to the performer than if only body motions were captured as a higher resolution is needed. Furthermore, cameras on the ground level should be added as the back of the hand where the markers are placed is directed downward during many gestures (Jörg 2011). Even with a careful setup, occlusions, where individual markers are hidden from the cameras, cannot be avoided. For example, when the hand forms a fist, the ends of the fingertips are hidden. Therefore, adequate post-processing is required, which typically involves arduous manual labor. These hurdles are the reason that in applications that do not require real-time animations, such as for animated movies, finger animations are typically created manually. A small number of markers or sensored gloves can be used for previsualization (Kitagawa and Windsor 2008). Several approaches have been suggested to compute an optimal sparse marker set to ensure better marker separation and identification. The detailed finger motions are
reconstructed based on that subset of markers, thus reducing the time required for post-processing. It is possible to use dimensionality reduction techniques such as principal component analysis to take advantage of the approximate redundancy in hand motions (Wheatland et al. 2013). Another approach tests which markers can be left out by reconstructing the hand pose from similar poses found in a database and verifying how different the resulting pose is (Mousas et al. 2014). The computed optimal marker sets vary with the approach and the example databases used. Common to most of them is a marker on the thumb and one marker at most on each finger except for the index finger where there might be two markers (Kang et al. 2012; Mousas et al. 2014; Schröder et al. 2015; Wheatland et al. 2013). Once the motions with the reduced marker set are recorded, the full-resolution motions are reconstructed using a database and searching for correspondences. Hoyet et al. (2012) evaluate the perception of a diverse set of finger motions including grasping, opening a bottle, and playing the flute, recorded with reduced marker sets. They use simple methods such as inverse kinematics and interpolation techniques to reconstruct the hand motions. They show that the required number of markers depends on the type of motion. For motions where the fingers only display secondary motions, a static hand pose might be good enough. For a majority of cases, a simple eight-marker hand model with six markers to capture the four fingers (four on the fingertips and two on the finger base of the index and pinky) and two markers to capture the thumb produces motions with sufficiently high quality so that viewers do not notice a difference from a full marker set. Still, for some motions a full marker set using forward kinematics is needed.
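To make the database-lookup idea concrete, the sketch below matches the reduced-marker positions of a query frame against a database of full-resolution captures and reuses the joint angles of the closest frame. It is a generic, hypothetical illustration (the feature layout, the plain Euclidean distance, and the synthetic data are assumptions, not the specific methods of the papers cited above):

```python
import numpy as np

def build_database(full_marker_frames, joint_angle_frames, sparse_idx):
    """Store the sparse-marker features and the full joint angles per frame."""
    features = full_marker_frames[:, sparse_idx, :].reshape(len(full_marker_frames), -1)
    return features, joint_angle_frames

def reconstruct_pose(sparse_markers, features, joint_angle_frames):
    """Return the joint angles of the database frame whose reduced markers
    are closest (Euclidean distance) to the observed sparse markers."""
    query = sparse_markers.reshape(1, -1)
    d = np.linalg.norm(features - query, axis=1)
    return joint_angle_frames[np.argmin(d)]

# Synthetic example: 1000 database frames, 20 markers, 24 joint angles,
# and a hypothetical reduced set of 8 marker indices.
rng = np.random.default_rng(0)
markers = rng.random((1000, 20, 3))
angles = rng.random((1000, 24))
sparse_idx = [0, 3, 6, 9, 12, 15, 18, 19]
feats, db_angles = build_database(markers, angles, sparse_idx)
pose = reconstruct_pose(markers[42, sparse_idx, :], feats, db_angles)
print(np.allclose(pose, angles[42]))  # the query frame retrieves itself
```

In practice the lookup would be combined with temporal smoothing or interpolation between several neighbors rather than a single nearest frame.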

Sensored Gloves

As recording finger motions with optical, marker-based motion capture systems is challenging, alternative methods have been developed. Sensored gloves directly measure the bending angle of different joints (Sturman and Zeltzer 1994). The number and configuration of bend sensors vary between 5 and 24. Commercially available technologies include fiber optic sensors, piezoresistive sensors, and inertial sensors. The main advantage of sensored gloves is that they create a continuous signal without any occlusions that can be used in real time. However, the accuracy is lower than for marker-based optical motion capture systems. The number of sensors of the gloves is typically small compared to the number of degrees of freedom of the hand. Furthermore, the sensors do not measure the global position of the hand. The gloves might move relative to the skin during the capture; therefore, regular recalibrations are necessary if a high accuracy is required, which might interrupt capture sessions. Finally, cross-coupling between sensors adds further challenges, and more complex calibration methods have been developed (Kahlesz et al. 2004; Wang and Neff 2013; Wheatland et al. 2015). Sensored gloves are therefore most useful in applications where a continuous signal is important but accuracy is not crucial. These properties explain why gloves were used, for example, for virtual reality applications or as a baseline for movies where the motions can be adjusted in
post-processing. A survey of glove-based systems has been compiled by Dipietro et al. (2008).

Markerless Optical Systems and Depth Sensors

Markerless optical systems and depth sensors have become very popular in the past years. These systems, such as the Microsoft Kinect or the Leap Motion sensor, cover only a small capture volume, but they are small, light, and inexpensive. Their accuracy depends largely on the captured hand poses and the hand orientations in relation to the sensor. Detailed silhouettes can be recognized with a much higher accuracy than poses where fingers are touching each other and might be hidden from the sensor by the palm or by other fingers or the thumb. Fast motions might also not be recognized accurately. However, algorithms for these systems are currently being developed at a fast pace, so that further progress is likely in the near future. The Microsoft Kinect is an RGB-D camera, which records color like a standard camera and depth information in addition to it. The Leap Motion sensor, on the other hand, uses only depth information. Other approaches only use information from a regular camera. Wang and Popović presented a method where the user wears a cloth glove with a colored pattern (Wang and Popović 2009). With the pattern, the pose of the hand can be estimated in single frames by looking for the nearest neighbor hand pose in a database. As a result of several improvements such as approximating the nearest neighbor lookup, the search can be conducted in real time. Many suggestions for algorithms using only monocular videos exist (de La Gorce et al. 2008). They are typically not very accurate and computationally too intensive for real-time tasks as of now. When the recognition problem with or without depth information is reduced to recognizing a specified subset of poses and gestures in a controlled environment, more reliable approaches exist.

Synthesizing Data-Driven Hand Motions

An alternative approach to capturing finger motions is to synthesize the complete hand movements. Once a database of motions has been created, data-driven methods can learn from this data and reuse and adapt it as required. An example of how to synthesize hand and finger motions based on the method developed by Jörg et al. (2012) is elaborated in the next paragraph. The goal of the approach is to automatically create motions for all fingers and the thumb, not including the wrist, for conversational situations with a full body motion clip without hand movements as input motion and an available database of motions that includes both hand and body movements. To this aim, the algorithm finds the best hand motion clip from the database taking into account features such as the similarity of the arm motions and the smoothness of consecutive finger motions. The synthesis process consists of the following steps: First, the input motion and the database are segmented based on the wrist velocity. Second, the algorithm searches
the database for segments with wrist motions similar to those of the input motion, applying dynamic time warping to adapt the length of each segment within certain limits. Third, a weighted motion graph is computed. The start node of the graph is connected to the k segments from the database that are most similar to the first input motion segment. Each of these k segments is then connected to the k segments that are most similar to the second input motion segment, and so on. For each transition, a cost is calculated by comparing the orientations and angular velocities of the fingers at the last frame of a segment and at the first frame of the next segment. A weighted sum of the corresponding transition and segment costs is applied to each connection, and the shortest path is computed with Dijkstra’s algorithm, resulting in a choice of motion segments. Finally, when combining these segments, transitions are created where necessary. This algorithm creates plausible finger motions for conversational situations but excludes any interactions with objects or self-collisions with the virtual character itself and does not take into account any partial information of finger positions. Enhancements to this approach have been developed by Mousas et al. (2015). Ye and Liu’s approach (Ye and Liu 2012), in contrast, creates detailed and physically plausible hand manipulations when presented with a full body motion and the movements of objects that are being manipulated. They determine feasible hand-object contact points using a database and create the hand movements according to these contact positions. They select visually diverse solutions that result in intricate motion strategies such as the relocation of contact points. Many further methods have been suggested to create hand and finger motions when starting with body motions, such as capturing them separately from the body and synchronizing them in a post-processing step (Majkowska et al. 2006), procedural algorithms using databases (Aydin and Nakajima 1999), determining key hand poses based on the body motion with a support vector machine (Oshita and Senju 2014), or approaches taking advantage of data-driven and physics-based methods (Kry and Pai 2006; Neff and Seidel 2006), to cite just a few examples. Each method has its advantages and drawbacks. A more exhaustive review of the research literature on hand and finger motion synthesis can be found in Wheatland et al.’s state-of-the-art report (Wheatland et al. 2015).
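The graph search described above can be sketched compactly as a shortest path over a layered graph. The Python sketch below is our own simplified illustration, not the implementation from the cited paper: the cost functions, weights, and data layout are assumptions, and details such as dynamic time warping and transition blending are omitted.

```python
import heapq
from itertools import count

def select_segments(candidates, segment_cost, transition_cost,
                    w_seg=1.0, w_trans=1.0):
    """Choose one database segment per input segment via a shortest path.

    candidates[i]         : ids of the k database segments most similar to
                            the i-th input segment
    segment_cost(i, s)    : dissimilarity of database segment s to input segment i
    transition_cost(a, b) : cost of playing segment b directly after segment a
    """
    start = (-1, None)                       # virtual start node
    dist, prev = {start: 0.0}, {}
    tie = count()                            # tiebreaker so the heap never compares nodes
    heap = [(0.0, next(tie), start)]
    n = len(candidates)
    best_end, best_cost = None, float('inf')
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue
        layer = node[0] + 1
        if layer == n:                       # a node of the last layer was reached
            if d < best_cost:
                best_end, best_cost = node, d
            continue
        for seg in candidates[layer]:
            step = w_seg * segment_cost(layer, seg)
            if node != start:
                step += w_trans * transition_cost(node[1], seg)
            nxt, nd = (layer, seg), d + step
            if nd < dist.get(nxt, float('inf')):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, next(tie), nxt))
    path, node = [], best_end
    while node is not None and node != start:
        path.append(node[1])
        node = prev.get(node)
    return list(reversed(path))

# Toy usage with two input segments and three candidates each (hypothetical costs).
cands = [[0, 1, 2], [3, 4, 5]]
seg_c = lambda i, s: abs(s - 2 * i)
trans_c = lambda a, b: 0.1 * abs(b - a - 3)
print(select_segments(cands, seg_c, trans_c))
```

Because the graph is layered, a dynamic-programming pass would work equally well; the priority-queue formulation simply mirrors the Dijkstra search mentioned in the text.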

Conclusion and Future Directions

Detailed hand motions are highly important, especially as our expectations toward realistic virtual characters increase. But high-quality, accurate hand motions are still time-consuming to capture or synthesize. While methods and techniques in this field are improving at a fast pace, there are still many open questions and processes that need improvement. Future directions include the following topics:

• Approaches for a variety of applications: Many approaches that have been suggested specialize in a specific task. One next step would be to develop approaches that are effective for multiple tasks or to combine approaches and use the optimal approach based on an automatic assessment of the situation.
• Interpretation of subtle hand motions: Many questions are unsolved when it comes to our interpretation of subtle hand motions. How do details in finger motions influence our understanding and our interpretation when communicating? How much error or latency in hand motions is tolerable?
• Efficient methods for real-time capturing or synthesis: Recording or synthesizing movement in real time has its own challenges; for example, optimizations over longer segments of motion are not possible. The accuracy and reliability of current devices need to be improved.
• Improve our understanding of the virtual hand illusion: How much error is allowed? Why are some people more prone to the illusion than others? Our understanding of the reasons for the illusion and the conditions in which it occurs is still limited. Insights could allow for more appropriate feedback to improve communication and manipulation in virtual reality applications (Ebrahimi et al. 2016; Prachyabrued and Borst 2014).
• Details of hand motions: For animations that aim to look realistic, details such as skin deformations and wrinkles based on the anatomy of the hand or on contacts need to be synthesized. While progress has been made in this area (Andrews et al. 2013; Li and Kry 2014), automatic photo-realism has not been reached yet.
• Control tools for animators: Finally, methods need to be made accessible and usable for animators, which includes the development of intuitive controls to allow for a more efficient workflow.

Cross-References

▶ 3D Dynamic Pose Estimation Using Cameras and No Markers
▶ 3D Dynamic Pose Estimation Using Reflective Markers or Electromagnetic Sensors
▶ 3D Dynamic Probabilistic Pose Estimation Using Cameras and Reflective Markers
▶ Body Movements in Music Performances: On the Example of Clarinet Players
▶ Data-Driven Character Animation Synthesis
▶ Hand Gesture Synthesis for Conversational Characters
▶ Movement Efficiency in Piano Performances
▶ Perceptual Evaluation of Human Animation
▶ Postural Movements of Violin Players

References

Andrews S, Jarvis M, Kry PG (2013) Data-driven fingertip appearance for interactive hand simulation. In: Proceedings of motion on games, MIG ‘13, Dublin, pp 155:177–155:186


Argelaguet F, Hoyet L, Trico M, Lecuyer A (2016) The role of interaction in virtual embodiment: effects of the virtual hand representation. In: IEEE virtual reality (VR), Greenville, pp 3–10 Aydin Y, Nakajima M (1999) Database guided computer animation of human grasping using forward and inverse kinematics. Comput Graph 23(1):145–154. doi:10.1016/S0097-8493(98) 00122-8 Braido P, Zhang X (2004) Quantitative analysis of finger motion coordination in hand manipulative and gestic acts. Hum Mov Sci 22(6):661–678. doi:10.1016/j.humov.2003.10.001 Chaminade T, Hodgins J, Kawato M (2009) Anthropomorphism influences perception of computeranimated characters’ actions. Soc Cogn Affect Neurosci 2(3):206–216 Ciocarlie M, Goldfeder C, Goldfeder C (2007) Dimensionality reduction for hand-independent dexterous robotic grasping. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2007, San Diego, pp 3270–3275 Cutting J, Kozlowski L (1977) Recognizing friends by their walk: gait perception without familiarity cues. Bull Psychon Soc 9(5):353–356 de La Gorce M, Paragios N, Fleet DJ (2008) Model-based hand tracking with texture, shading and self-occlusions. In: IEEE conference on computer vision and pattern recognition, Anchorage, pp 1–8 Dipietro L, Sabatini A, Dario P (2008) A survey of glove-based systems and their applications. IEEE Trans Syst Man Cybern Part C Appl Rev 38(4):461–482 Ebrahimi E, Babu SV, Pagano CC, Jörg S (2016) An empirical evaluation of visuo-haptic feedback on physical reaching behaviors during 3D interaction in real and immersive virtual environments. ACM Trans Appl Percept 13(4):19:1–19:21 ElKoura G, Singh K (2003) Handrix: animating the human hand. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation, San Diego, pp 110–119 Häger-Ross C, Schieber MH (2000) Quantifying the independence of human finger movements: comparisons of digits, hands, and movement frequencies. J Neurosci 20(22):8542–8550 Hoyet L, Ryall K, McDonnell R, O’Sullivan C (2012) Sleight of hand: perception of finger motion from reduced marker sets. In: Proceedings of the ACM SIGGRAPH symposium on interactive 3D graphics and games, I3D ‘12, Costa Mesa, pp 79–86 Huenerfauth M, Lu P (2010) Accurate and accessible motion-capture glove calibration for sign language data collection. ACM Trans Access Comput 3(1):2:1–2:32 Jörg S (2011) Perception of body and hand animations for realistic virtual characters. Ph thesis, University of Dublin, Trinity College, Dublin Jörg S, O’Sullivan C (2009) Exploring the dimensionality of finger motion. In: Proceedings of the 9th Eurographics Ireland workshop (EGIE 2009), Dublin, pp 1–11 Jörg S, Hodgins J, O’Sullivan C (2010) The perception of finger motions. In: Proceedings of the 7th symposium on applied perception in graphics and visualization (APGV 2010), Los Angeles, pp 129–133 Jörg S, Hodgins JK, Safonova A (2012) Data-driven finger motion synthesis for gesturing characters. ACM Trans Graph 31(6):189:1–189:7 Kahlesz F, Zachmann G, Klein R (2004) Visual-fidelity dataglove calibration. In: Computer graphics international. IEEE Computer Society, Crete, pp 403–410 Kang C, Wheatland N, Neff M, Zordan V (2012) Automatic hand-over animation for free-hand motions from low resolution input. In: Motion in games. Lecture notes in computer science, vol 7660. Springer, Berlin/Heidelberg, pp 244–253 Kendon A (2004) Gesture – visible action as utterance. 
Cambridge University Press, Cambridge Kitagawa M, Windsor B (2008) MoCap for artists: workflow and techniques for motion capture. Focal Press, Amsterdam/Boston Kozlowski LT, Cutting JE (1977) Recognizing the sex of a walker from a dynamic point-light display. Percept Psychophys 21(6):575–580 Kry PG, Pai DK (2006) Interaction capture and synthesis. ACM Trans Graph 25(3):872–880 Li P, Kry PG (2014) Multi-layer skin simulation with adaptive constraints. In: Proceedings of the 7th international conference on motion in games, MIG ‘14, Playa Vista, pp 171–176

12

S. Jörg

Lin L, Jörg S (2016) Need a hand?: how appearance affects the virtual hand illusion. In: Proceedings of the ACM symposium on applied perception, SAP ‘16, Anaheim, pp 69–76 Liu CK (2008) Synthesis of interactive hand manipulation. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation, Dublin, pp 163–171 Liu CK (2009) Dextrous manipulation from a grasping pose. ACM Trans Graph 28(3):3:1–3:6 Lu P, Huenerfauth M (2009) Accessible motion-capture glove calibration protocol for recording sign language data from deaf subjects. In: Proceedings of the 11th international ACM SIGACCESS conference on computers and accessibility, pp 83–90 Ma K, Hommel B (2015a) Body-ownership for actively operated non-corporeal objects. Conscious Cogn 36:75–86 Ma K, Hommel B (2015b) The role of agency for perceived ownership in the virtual hand illusion. Conscious Cogn 36:277–288 Majkowska A, Zordan VB, Faloutsos P (2006) Automatic splicing for hand and body animations. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation. Boston, MA, USA, pp 309–316 McNeill D (1992) Hand and mind: what gestures reveal about thought. The University of Chicago Press, Chicago Menache A (1999) Understanding motion capture for computer animation and video games. Morgan Kaufmann Publishers Inc., San Francisco Mousas C, Newbury P, Anagnostopoulos CN (2014) Efficient hand-over motion reconstruction. In: Proceedings of the 22nd international conference in Central Europe on computer graphics, visualization and computer vision, WSCG ‘14. Plzen, Czech Republic, pp 111–120 Mousas C, Anagnostopoulos CN, Newbury P (2015) Finger motion estimation and synthesis for gesturing characters. In: Proceedings of the 31st spring conference on computer graphics, SCCG ‘15. Smolenice, Slovakia, pp 97–104 Napier J (1980) Hands. Pantheon Books, New York Neff M, Seidel HP (2006) Modeling relaxed hand shape for character animation. In: Articulated Motion and deformable objects. Lecture notes in computer science, vol 4069. Springer, Berlin/ Heidelberg, pp 262–270 Oshita M, Senju Y (2014) Generating hand motion from body motion using key hand poses. In: Proceedings of the 7th international conference on motion in games, MIG ‘14. Playa Vista, CA, USA, pp 147–151 Palastanga N, Soames R (2012) Anatomy and human movement – structure and function, 6th edn. Butterworth Heinemann/Elsevier, Edinburgh/New York Parent R (2012) Computer animation: algorithms and techniques, 3rd edn. Morgan Kaufmann, Burlington Perani D, Fazio F, Borghese NA, Tettamanti M, Ferrari S, Decety J, Gilardi MC (2001) Different brain correlates for watching real and virtual hand actions. Neuroimage 14:749–758 Pollard NS, Zordan VB (2005) Physically based grasping control from example. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation. Los Angeles, CA, USA, pp 311–318 Prachyabrued M, Borst CW (2014) Visual feedback for virtual grasping. In: IEEE symposium on 3D User Interfaces, 3DUI, 2014. Minneapolis, MN, USA, pp 19–26 Samadani AA, DeHart BJ, Robinson K, Kulic D, Kubica E, Gorbet R (2011) A study of human performance in recognizing expressive hand movements. In: IEEE international symposium on robot and human interaction communication. Atlanta, GA, USA Santello M, Flanders M, Soechting JF (1998) Postural hand synergies for tool use. J Neurosci 18 (23):10,105–10,115 Schröder M, Maycock J, Botsch M (2015) Reduced marker layouts for optical motion capture of hands. 
In: Proceedings of the 8th ACM SIGGRAPH conference on motion in games, MIG ‘15. Paris, France, pp 7–16 Sturman DJ, Zeltzer D (1994) A survey of glove-based input. IEEE Comput Graph Appl 14 (1):30–39


Wang Y, Neff M (2013) Data-driven glove calibration for hand motion capture. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics symposium on computer animation, SCA ‘13. Anaheim, CA, USA, pp 15–24 Wang RY, Popović J (2009) Real-time hand-tracking with a color glove. ACM Trans Graph 28 (3):63 Wang Y, Tree JEF, Walker M, Neff M (2016) Assessing the impact of hand motion on virtual character personality. ACM Trans Appl Percept 13(2):9:1–9:23 Wheatland N, Jörg S, Zordan V (2013): Automatic hand-over animation using principle component analysis. In: Proceedings of motion on games, MIG ‘13. Zürich, Switzerland, pp 175:197–175:202. ACM Wheatland N, Wang Y, Song H, Neff M, Zordan V, Jörg S (2015) State of the art in hand and finger modeling and animation. Comput Graph Forum 34(2):735–760 Ye Y, Liu CK (2012) Synthesis of detailed hand manipulations using contact sampling. ACM Trans Graph 31(4):245–254 Yuan Y, Steed A (2010) Is the rubber hand illusion induced by immersive virtual reality? Virtual Reality Conference (VR). IEEE Computer Soc. Waltham, MA, USA, pp 95–102 Zhang J, Hommel B (2016) Body ownership and response to threat. Psychol Res 80(6):1020–1029 Zhu Y, Ramakrishnan AS, Hamann B, Neff M (2013) A system for automatic animation of piano performances. Comput Anim Virtual Worlds 24(5):445–457

Example-Based Skinning Animation: State of the Art

Tomohiko Mukai

Contents

Introduction
State-of-the-Art Skinning Techniques
Example-Based Skinning
  Linear Blend Skinning
  Skinning Weight Optimization
  Skinning Decomposition
  Applications
Example-Based Helper Bone Rigging
  Per-Example Optimization of Helper Bone Transformations
  Helper Bone Controller Construction
  Experimental Results
  Limitations
Future Directions
References

Abstract

The skinning technique has been widely used for synthesizing the natural skin deformation of human-like characters in a broad range of computer graphics applications. Many skinning methods have been proposed to improve the deformation quality while achieving real-time computational performance. The design of skinned character models, however, requires heavy manual labor even for experienced digital artists with professional software and tools. This chapter presents an introduction to an example-based skinning method, which builds a
skinned character model using an example sequence of handcrafted or physically simulated skin deformations. Various types of machine learning techniques and statistical analysis methods have been proposed for example-based skinning. In this chapter, we first review state-of-the-art skinning techniques, especially for a standard skinning model called linear blend skinning that uses a virtual skeleton hierarchy to drive the skin deformation. Next, we describe several automated methods for building a skeleton-based skinned character model using example skin shapes. We introduce skinning decomposition methods that convert a shape animation sequence into a skinned character and its skeleton motion. We also explain a practical application of skinning decomposition, which builds a so-called helper bone rig from an example animation sequence. We finally discuss the future directions of example-based skinning techniques.

Keywords

Animation • Rigging • Linear blend skinning • Helper bone

Introduction

In computer animation, natural skin deformation is vital for producing lifelike characters, and there are many techniques and professional software packages for creating expressive skin animations. For example, a physics-based volumetric simulation is a typical approach for generating physically valid skin deformations. A physics simulation, however, requires large computational costs and careful design of a character’s musculoskeletal model. Moreover, a numerical simulation is inferior to kinematic or analytical models in terms of controllability and stability. The animation of a human character in interactive graphics applications is often created using a skeleton-based skinning method that embeds a virtual skeletal structure into the solid geometry of the character’s surface model. This technique deforms the skin surface according to the movement of the virtual skeleton on the basis of simple geometric operations and has become a de facto standard skinning method because of its simplicity and efficiency. The typical production procedure of skinning animations is composed of the following three processes:

Modeling: A surface geometry of a character model is created as a polygonal mesh or parametric surface, and its material appearance is designed via shading models and texture images. The character model is often created in a rest pose, such as the so-called T-stance shown in Fig. 1a. In this chapter, we assume that the skin geometry is constructed as a polygonal mesh.

Rigging: A virtual skeleton hierarchy, typically composed of rigid bones and rotational joints, is bound to the character geometry. This process requires specifying the skinning weights that describe the relative influence of each bone over each vertex of the skin mesh (Fig. 1b). This process is called rigging, character setup, or simply setup.

[Fig. 1 Typical process for creating skeleton-driven skin animation: (a) modeling in the T-stance, (b) rigging (binding the skeleton to the skin), (c) animation]

Animation: An animation of the skinned character is created as a time series of rotations of the skeleton joints. The deformation of the skin surface is driven by the joint movements (Fig. 1c).

In this chapter, we especially focus on the rigging of the skeleton-driven character. Building a good character rig is a key requirement for synthesizing natural and fine skin deformation while providing a full range of motion control for animators, i.e., an ill-designed skeleton and rig cause unnatural deformation even if the artists carefully design the character’s geometry and skeleton movement. Moreover, the skeleton rig should be as simple and intuitive to manipulate as possible so that the animators can easily create the character motion. Owing to this trade-off between quality and manipulability, the rigging of a complex human-like character is still a challenging task even for skilled riggers and animators using professional tools, which requires trial and error and artistic experience and intuition. Many researchers have tried to develop automatic and semiautomatic methods of building a character rig. In particular, recent studies have primarily focused on data-driven approaches for constructing optimal skeleton-based character rigs by using example data. Example-based skinning methods optimize the structure of the skeleton hierarchy and skinning weights so that the skinned animation approximates the example shapes well. Various types of machine learning techniques and statistical analysis methods have been proposed for stably and efficiently obtaining an accurate approximation.

The rest of this chapter is structured as follows. In the following section, we review state-of-the-art skinning techniques. Next, we explain skinning decomposition methods that build skinned character models from example skin shapes. We also introduce a practical skinning model called a helper bone rig and its example-based construction algorithm based on the skinning decomposition method. Finally, we will discuss the future prospects of example-based skinning techniques.


State-of-the-Art Skinning Techniques

Accurate skin deformation is often generated using physics-based musculoskeletal (Li et al. 2013) or volumetric (Fan et al. 2014) simulations. These simulation-based methods are unsuitable for intuitively or intentionally changing the style of deformations and for real-time applications because of their high computational cost. Moreover, the manual rigging of a character’s musculoskeletal model is a very challenging task. A data-driven approach learns the dynamical properties of soft-tissue materials from the example data (Shi et al. 2008) but still suffers from the computational complexity.

The kinematic skinning method efficiently computes the vertex positions of the skin mesh on the basis of the pose of the internal skeleton structure. Linear blend skinning (LBS) is a standard technique for synthesizing skin deformation in real-time applications, which computes a deformed vertex position by transforming each vertex through a weighted combination of bone transformation matrices (Magnenat-Thalmann et al. 1988). Multiweight enveloping (Merry et al. 2006; Wang and Phillips 2002) extends the LBS model by adding a weight to each matrix element. The nonlinear skinning technique uses a dual quaternion instead of a transformation matrix to overcome LBS artifacts, but a side effect called a bulging artifact is still caused while bending (Kavan et al. 2007). Several hybrid approaches have been proposed to blend bone transformations with fewer artifacts. Kavan and Sorkine (2012) proposed a blending scheme that decomposes a bone rotation into swing and twist components and separately blends each component using different algorithms. The stretchable and twistable bone model (Jacobson and Sorkine 2011) uses different weighting functions for scaling and bone twisting, respectively. These methods successfully synthesize artifact-free skin deformation but do not address stylized skin deformation such as muscle–skin deformation. EigenSkin constructs an efficient model of the additive vertex displacement for LBS using a principal component analysis (Kry et al. 2002). Naive LBS, however, is still a de facto standard skinning model in interactive graphics applications because of its efficiency and simplicity.

One practical solution for minimizing LBS artifacts is to add extra bones called helper bones. The helper bone rig has become a practical real-time technology for synthesizing stylized skin deformation based on LBS. The helper bone is a secondary rig that influences skin deformation, and its pose is procedurally controlled according to the pose of the primary skeleton. Mohr and Gleicher (2003) first introduced the basic concept of a helper bone system. In their work, helper bones are generated by subdividing primary bones, and a scaling parameter is procedurally controlled according to the twist angle of the primary bone, thus minimizing the candy-wrapper artifact. This technique has been widely used in many products because of its efficiency, flexibility, and compatibility with the standard graphics pipeline (Kim and Kim 2011; Parks 2005). Although this technique provides a flexible yet efficient synthesis of a variety of expressive skin deformations, rigging with helper bones is still a labor-intensive process. We have developed an example-
based technique to build helper bone rigs, as explained in Section “Example-Based Helper Bone Rigging.” Scattered data interpolation such as pose-space deformation (PSD) is another approach for synthesizing skin deformation from example shapes (Kurihara and Miyata 2004; Lewis et al. 2000; Sloan et al. 2001). PSD uses radial basis function interpolation for blending example shapes according to the skeleton pose. This technique produces high-quality skin animation via intuitive design operations. However, the PSD model requires a runtime engine to store all example data in memory. Furthermore, the computational cost of PSD increases in proportion to the number of examples. Consequently, many example shapes cannot be used in real-time systems with a limited memory capacity, such as mobile devices. The machine learning-based approach for skin deformation analyzes the relationship between a skeleton pose and its corresponding skin shape using a large set of samples. A regression technique was proposed to estimate a linear mapping from the skeletal pose to the deformation gradient of the skin surface polygons (Pulli and Popović 2007). The seminal work of Park and Hodgins (2008) predicts an optimal mapping from the skeletal motion to the dynamic motion of several markers placed on a skin surface. Neumann et al. (2013) proposed a statistical model of skin deformation that is learned from human skin shapes captured with range scan devices. These methods construct a regression model from a set of example skeleton poses and skin shapes.

Example-Based Skinning

Linear Blend Skinning

LBS (Magnenat-Thalmann et al. 1988) is a standard method that has been widely used for a broad range of interactive applications such as games. Most real-time graphics engines support the LBS-based rig because of its simplicity and efficiency. The LBS model computes a deformed vertex position by transforming each vertex through a weighted combination of bone transformation matrices. Given a skeleton with $P$ bones, the global transformation matrix of the $p$-th bone is denoted as a $4 \times 4$ homogeneous matrix $G_p$, $p \in \{1, \ldots, P\}$. Let $\bar{G}_p$ and $\bar{v}_j$, $j \in \{1, \ldots, J\}$, denote the bone transformation matrix and the position of the $j$-th skin vertex in the initial T-stance pose, and let the skinning transformation matrix be represented by $M_p = G_p \bar{G}_p^{-1}$. The global transformation matrix $G_p$ can be decomposed into a product of the local transformation matrix $L_p$ and the parent's global transformation matrix as $G_p = G_{\phi(p)} L_p$, where $\phi(p) \in \{1, \ldots, P\}$ is the parent of the $p$-th bone. The deformed vertex position $v'_j$ is computed with nonnegative skinning weights $w_{j,p}$ as

$$v'_j = \sum_p w_{j,p} M_p \bar{v}_j, \qquad (1)$$

where the affinity constraint $\sum_p w_{j,p} = 1, \forall j$ is satisfied. The number of nonzero skinning weights at each vertex is assumed to be less than a constant number $k$, which is expressed as $\sum_p |w_{j,p}|_0 \le k, \forall j$, where $|\cdot|_\alpha$ denotes the $L_\alpha$ norm. This sparsity assumption can be interpreted as meaning that each vertex moves according to the transformations of a few spatially neighboring bones. Moreover, this constraint ensures the efficient computation of a skin animation regardless of the total number of skeleton bones, since the uninfluenced bones can be eliminated in the computation of the vertex deformation.
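Eq. 1 maps directly to a few lines of array code. The following NumPy sketch is a minimal illustration of the blend (the function and variable names, array shapes, and toy data are our own assumptions, not part of the chapter):

```python
import numpy as np

def lbs_deform(rest_vertices, skin_weights, bone_matrices):
    """Linear blend skinning (Eq. 1): v'_j = sum_p w_jp * M_p * v_bar_j.

    rest_vertices : (J, 3) vertex positions in the T-stance pose
    skin_weights  : (J, P) nonnegative weights, each row summing to 1
    bone_matrices : (P, 4, 4) skinning matrices M_p = G_p @ inv(G_bar_p)
    """
    J = rest_vertices.shape[0]
    # Homogeneous coordinates of the rest-pose vertices: (J, 4).
    v_bar = np.hstack([rest_vertices, np.ones((J, 1))])
    # Transform every vertex by every bone: (P, J, 4).
    per_bone = np.einsum('pab,jb->pja', bone_matrices, v_bar)
    # Blend the per-bone results with the skinning weights: (J, 4).
    blended = np.einsum('jp,pja->ja', skin_weights, per_bone)
    return blended[:, :3]

# Tiny example: two bones, three vertices, identity transformations.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
bones = np.stack([np.eye(4), np.eye(4)])
print(lbs_deform(rest, weights, bones))  # identical to the rest pose
```

In a real engine the same blend is usually evaluated per vertex in a shader, with only the k influencing bones stored for each vertex.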

Skinning Weight Optimization

An LBS rig is built by designing a skeleton hierarchy, including the initial bone transformation $\bar{G}_p$ and the parent–child relation $\phi(p)$ of each bone, and the corresponding skinning weights $w_{j,p}$ of each vertex. Since the skinning weights are more difficult to design than the skeleton structure owing to the larger number of free parameters, several methods have been proposed to optimize the skinning weights for an arbitrary skeleton structure and skin geometry. The Pinocchio system (Baran and Popović 2007) uses an analogy to heat diffusion over the skin mesh for estimating shape-aware skinning weights. The bounded biharmonic weight model (Jacobson et al. 2011) produces a smooth distribution of the skinning weights that minimizes the Laplacian energy over the character's volumetric structure. The deformation-aware method (Kavan and Sorkine 2012) optimizes the skinning weights to minimize an elastic deformation energy over a certain range of skeletal poses. These methods make an assumption regarding the material properties of the skin surface, e.g., they assume that the physical properties of the character skin, such as the elasticity, stiffness, and friction, do not change over the entire body surface. However, this assumption is somewhat optimistic because many characters should have a heterogeneous distribution of material properties that includes those of bones, muscles, and fat. Further investigation is therefore needed to improve the quality of the weight optimization.

An alternative approach uses a set of example data of the skin shape and skeleton poses (Miller et al. 2011; Mohr and Gleicher 2003; Wang and Phillips 2002). Given the prior information of the skeleton hierarchy and $N$ examples of a pair of skeleton pose $\hat{M}_{p,n}$ and skin shape $\hat{v}_{j,n}$, $n \in \{1, \ldots, N\}$, the optimal skinning weights $w^*_{j,p}$ are obtained by solving a constrained problem to minimize the squared error of the vertex positions between the example shape $\hat{v}_{j,n}$ and the deformed skin $v'_j$ as

$$\{w^*\} = \underset{w}{\operatorname{argmin}} \sum_n \sum_j \Big\| \hat{v}_{j,n} - \sum_p w_{j,p} \hat{M}_{p,n} \bar{v}_j \Big\|_2^2, \qquad (2)$$

subject to

$$\sum_p w_{j,p} = 1, \quad \forall j; \qquad (3)$$

$$w_{j,p} \ge 0, \quad \forall j, p; \qquad (4)$$

$$\sum_p |w_{j,p}|_0 \le k, \quad \forall j; \qquad (5)$$

where the three constraints are (Eq. 3) the affinity constraint, (Eq. 4) nonnegativity constraint, and (Eq. 5) sparsity constraint. This constrained least-squares problem can be approximately solved using a quadratic programming (QP) solver by relaxing the sparsity constraint, as detailed in Section “Weight Update Step.” We can alternatively use the simpler nonnegative least-squares method (James and Twigg 2005) if no sparsity constraint is imposed. These weight optimization techniques produce a good initial guess for the skinning weights for an arbitrary shape of the skin mesh and skeleton hierarchy. Manual refinement, however, is still required in practice to eliminate undesirable artifacts and to add artist-directed stylized skin behavior.
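As a rough illustration of the example-based fit, the sketch below solves the per-vertex problem of Eq. 2 with the nonnegativity constraint using SciPy's nonnegative least-squares routine, in the spirit of James and Twigg (2005); enforcing the affinity constraint by renormalizing afterward is a simplification of the fully constrained problem, and all names and array shapes are our own assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def fit_vertex_weights(v_rest, M_hat, v_examples):
    """Nonnegative least-squares fit of one vertex's skinning weights.
    v_rest: (3,) rest position; M_hat: (N, P, 4, 4) example skinning
    matrices; v_examples: (N, 3) example positions of this vertex."""
    N, P = M_hat.shape[0], M_hat.shape[1]
    v_h = np.append(v_rest, 1.0)                       # homogeneous rest position
    A = np.einsum('npab,b->npa', M_hat, v_h)[..., :3]  # (N, P, 3) per-bone predictions
    A = A.transpose(0, 2, 1).reshape(N * 3, P)         # stack all examples
    b = v_examples.reshape(N * 3)
    w, _ = nnls(A, b)                                  # w >= 0
    return w / max(w.sum(), 1e-12)                     # approximate affinity constraint
```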

Skinning Decomposition

Several algorithms have been proposed to extract both the skinning weights and the optimal bone transformations from a set of example shapes; this is called the skinning decomposition problem. In other words, the goal of skinning decomposition is to convert a shape animation into a bone- or skeleton-based skinned animation. Given a set of N example shapes v̂_{j,n}, skinning decomposition finds the optimal skinning weights w*_{j,p} and skinning matrices M*_{p,n} that best approximate the example shapes in a least-squares sense:

\{w^*, M^*\} = \arg\min_{w,M} \sum_n \sum_j \left\| \hat{v}_{j,n} - \sum_p w_{j,p} M_{p,n} v_j \right\|_2^2 ,   (6)

subject to the affinity, nonnegativity, and sparsity constraints on the skinning weights. The skinning mesh animation algorithm (James and Twigg 2005) uses a mean-shift clustering algorithm to identify rigid or near-rigid bone transformations and applies the nonnegative least-squares method to estimate the skinning weights. Kavan et al. (2010) proposed a least-squares method with dimensionality reduction to efficiently extract nonrigid bone movements and skinning weights from an example shape animation. The smooth skinning decomposition with rigid bones (SSDR) algorithm (Le and Deng 2012) introduced a rigidity constraint on the bone transformation M, which requires that M be a product of a rotation matrix R and a translation matrix T as M = T R, where R^T R = I and det(R) = 1 are satisfied. This algorithm was later extended to identify hierarchically structured bone transformations from a shape animation sequence (Le and Deng 2014). The SSDR algorithm is designed to meet the requirements of interactive graphics production: the three constraints on the skinning weights are assumed by many graphics software packages, and the rigidity constraint on the bone transformations is also necessary for most game engines.

The SSDR algorithm uses a block coordinate descent scheme that optimizes the transformation of each bone or the skinning weights at each subiteration while fixing the other variables. For instance, the weight update step optimizes the skinning weights while fixing all bone transformations, and the transformation update step optimizes the transformation of each bone while fixing the transformations of the remaining bones and the skinning weights. These alternating steps are repeated until the objective function converges. The details of each subiteration are described below.

Initialization

In the first step, each vertex of the skin mesh is bound to one bone with a skinning weight of one. The initialization problem then becomes the clustering of vertices into the specified number of clusters, where vertices in the same cluster undergo similar rigid transformations. For each cluster, a rigid bone transformation is fitted to relate the vertex positions in the rest pose to the vertex positions in each example. Since the quality of this motion-driven clustering has a great effect on the remaining skinning decomposition steps, several clustering algorithms have been explored for stably obtaining an accurate result, such as mean-shift clustering (James and Twigg 2005), K-means clustering (Le and Deng 2012), and the Linde–Buzo–Gray algorithm (Le and Deng 2014). Whether a more sophisticated algorithm can further enhance the stability and efficiency of the clustering remains an important open question.

Weight Update Step

The optimal skinning weights w*_{j,p} are updated while fixing all bone transformations M_{p,n} in Eq. 6. The resulting optimization problem is rewritten as the following per-vertex constrained least-squares problem:

w_j^* = \arg\min_{w_j} \left\| A w_j - b \right\|_2^2 ,   (7)

subject to w_j \ge 0, \| w_j \|_1 = 1, and \| w_j \|_0 \le k, ∀j, where

w_j = [w_{j,1}, \ldots, w_{j,P}]^T, \quad A = \begin{bmatrix} M_{1,1} v_j & \cdots & M_{P,1} v_j \\ \vdots & \ddots & \vdots \\ M_{1,N} v_j & \cdots & M_{P,N} v_j \end{bmatrix}, \quad b = [\hat{v}_{j,1}^T, \ldots, \hat{v}_{j,N}^T]^T .

This problem is difficult to solve directly using a standard numerical solver owing to the L0-norm constraint \| w_j \|_0 ≤ k. Hence, an approximate solution is used that relaxes the sparsity constraint (Le and Deng 2012). Specifically, the L0-norm constraint is first excluded from Eq. 7, and the resulting QP problem is solved using a stock numerical solver. When the solution does not satisfy the L0-norm constraint, the k bones with the most significant weights are selected and the weights of the other bones are set to zero. The final solution is obtained by solving the QP problem again with only the selected k bones.
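A sketch of this relax-and-reselect scheme might look as follows (Python with SciPy; SLSQP stands in for the dedicated QP solver mentioned in the text, and the function names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def solve_sparse_weights(A, b, k):
    """Approximate Eq. 7: drop the L0 constraint, solve the relaxed problem,
    then, if needed, keep the k largest weights and re-solve on that support
    (the relax-and-reselect scheme of Le and Deng 2012)."""
    def solve_on(cols):
        As = A[:, cols]
        res = minimize(lambda w: np.sum((As @ w - b) ** 2),
                       x0=np.full(len(cols), 1.0 / len(cols)),
                       bounds=[(0.0, 1.0)] * len(cols),
                       constraints=[{'type': 'eq',
                                     'fun': lambda w: w.sum() - 1.0}],
                       method='SLSQP')
        return res.x

    P = A.shape[1]
    w = solve_on(np.arange(P))
    if np.count_nonzero(w > 1e-8) <= k:
        return w                              # already sparse enough
    support = np.argsort(w)[-k:]              # k most significant bones
    w_sparse = np.zeros(P)
    w_sparse[support] = solve_on(support)     # re-solve on the reduced support
    return w_sparse
```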

Bone Transformation Update Step

The transformation of the p-th bone for each example shape is optimized while fixing the skinning weights and the transformations of the remaining P − 1 bones at each subiteration. Each subproblem becomes a per-example weighted absolute orientation problem given by

\{R_{p,n}^*, T_{p,n}^*\} = \arg\min_{R_{p,n}, T_{p,n}} \sum_j \left\| \hat{v}_{j,n} - \sum_p w_{j,p} T_{p,n} R_{p,n} v_j \right\|_2^2 ,   (8)

subject to R_{p,n}^T R_{p,n} = I and \det(R_{p,n}) = 1, ∀p, n, where the optimal R*_{p,n} and T*_{p,n} are obtained by a closed-form method. Please refer to Le and Deng (2012) for further details.
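The core of each subproblem is a closed-form rigid fit. The sketch below shows a weighted absolute orientation solve via the SVD-based (Kabsch) method, which produces the same rotation as Horn's quaternion formulation (Horn 1987); in the actual SSDR update, the contributions of the other, fixed bones are folded into the residual first, which this simplified sketch omits:

```python
import numpy as np

def weighted_absolute_orientation(src, dst, weights):
    """Closed-form best-fit R, t minimizing sum_j c_j ||dst_j - (R src_j + t)||^2.
    src, dst: (J, 3) point sets; weights: (J,) nonnegative weights."""
    c = weights / weights.sum()
    mu_s = c @ src                                # weighted centroids
    mu_d = c @ dst
    H = (src - mu_s).T @ np.diag(c) @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```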

Applications

The skinning decomposition algorithm allows the conversion of any type of shape animation sequence into a skeleton- or bone-based skinned animation. For example, the animation of soft-body objects is often created using numerical simulations such as finite element methods or mass–spring network models, and facial expressions are often created using blendshape animation. These advanced techniques, however, are not supported by some graphics engines, especially those for mobile devices. Therefore, complex skin behavior created using professional content creation tools can be converted into an LBS-based skinning animation that is widely supported by most engines. This procedure is fully compatible with a standard production workflow. The main drawback is the loss of deformation detail caused by wrinkles, self-collisions, etc., because the sparse set of rigid bones only linearly approximates the example deformations.

Example-Based Helper Bone Rigging

The helper bone rig has become a practical real-time technology for synthesizing stylized skin deformation based on LBS (Kim and Kim 2011; Mohr and Gleicher 2003; Parks 2005). The helper bone is a secondary rig that influences the skin deformation, and its pose is procedurally controlled according to the pose of the primary skeleton, as illustrated in Fig. 2. Although the helper bone rig is manually designed in common practice, developing the procedural bone controller and the skinning weights is a labor-intensive process. We have proposed an example-based rigging method (Mukai 2015). Our method uses a two-step algorithm to add helper bones to a predesigned primary skeleton rig using example pairs of the primary skeletal pose and the desired skin shape. In the first step, our system estimates the optimal skinning weights and helper bone transformation for each example, using a modified version of the SSDR algorithm to incrementally insert rigid helper bones into the character rig. In the second step, the helper bone controller is constructed as a polynomial function of the primary skeleton movement. Here, we first formulate LBS with P primary bones and H helper bones as

v'_j = \left( \sum_p w_{j,p} M_p + \sum_h w_{j,h} S_h \right) v_j ,   (9)

where Sh and wj,h denote the skinning matrix of the h-th helper bone and the corresponding skinning weight, respectively. The first term represents the skin deformations driven by the primary skeleton, and the second term contributes additional control of the deformations using helper bones. The number of helper bones H is manually set to balance the deformation quality and computational cost. Helper bones are procedurally controlled with simple expressions according to the pose of the primary skeleton in common practice. We use a polynomial function fh that maps the primary skeleton pose Lp to the helper bone transformation Sh as

S_h \approx f_h(L_1, L_2, \ldots, L_P) .   (10)

Fig. 2 Linear blend skinning with procedural control of helper bones (plain linear blend skinning exhibits a candy-wrap artifact under rotation of the primary bone; adding procedurally controlled helper bones corrects the skin deformation)

Our helper bone rigging technique builds the regression function f_h and the skinning weights from example shapes and skeleton poses. Given a set of N pairs of an example shape v̂_{j,n} and a primary skeleton pose M̂_{p,n}, our problem is formulated as a constrained least-squares problem that minimizes the squared reconstruction error between the example shape and the skin mesh with respect to the skinning weights w_{j,p}, w_{j,h} and the skinning matrices S_{h,n}:

\{w^*, S^*\} = \arg\min_{w,S} \sum_n \sum_j \left\| \hat{v}_{j,n} - \sum_p w_{j,p} \hat{M}_{p,n} v_j - \sum_h w_{j,h} S_{h,n} v_j \right\|_2^2 ,   (11)

subject to the affinity, nonnegativity, and sparsity constraints on the skinning weights and the rigidity constraint on the skinning matrices S_{h,n}.

Per-Example Optimization of Helper Bone Transformations

The optimal rigid transformations of the helper bones are first estimated for each example shape using the optimization procedure summarized in Algorithm 1. Our system inserts the specified number of helper bones into the character rig in an incremental manner. Then, the helper bone transformations for each example and the skinning weights are optimized using an iterative method. The overall procedure is similar to the SSDR algorithm: the skinning weights and bone transformations are alternately optimized by subdividing the optimization problem (Eq. 11) into subproblems of bone insertion, skinning weight optimization, and bone transformation optimization, and we use the optimization techniques of the SSDR algorithm to solve these three subproblems. The main difference is that the SSDR algorithm has no prior information about the transformable bones other than their number, and therefore applies a clustering technique to estimate an initial bone configuration. In our method, the primary skeleton and its example poses are given as part of the problem, so helper bones are inserted by incremental optimization with a hard constraint on the primary bone transformations.

Algorithm 1 Optimization of helper bone transformations and skinning weights
Input: {v_j}, {Ḡ_p}, {v̂_{j,n}}, {Ĝ_{p,n}}, H
Output: {S_{h,n}}, {w_{j,p}}, {w_{j,h}}
1: {S_{h,n}} = I, ∀h, n; {w_{j,h}} = 0, ∀j, h
2: Initialize {w_{j,p}}
3: repeat
4:   Insert a new helper bone
5:   Update helper bone transformations {S_{h,n}}
6:   Update skinning weights {w_{j,p}} and {w_{j,h}}
7:   Remove insignificant helper bones
8: until the specified number of helper bones is reached
9: repeat
10:  Update helper bone transformations {S_{h,n}}
11:  Update skinning weights {w_{j,p}} and {w_{j,h}}
12: until the error threshold is reached

Incremental bone insertion

Our technique uses an incremental method that inserts a new helper bone into the region where the largest reconstruction errors occur. For example, if plain LBS causes an elbow-collapse artifact, a helper bone is generated around the elbow to minimize this artifact. First, our system searches for the vertex with the largest reconstruction error, which is computed as

j^* = \arg\max_j \sum_n \left\| \hat{v}_{j,n} - \left( \sum_p w_{j,p} \hat{M}_{p,n} + \sum_h w_{j,h} S_{h,n} \right) v_j \right\|_2^2 .   (12)

Second, we compute a rigid transformation that closely approximates the displacement of the identified vertex and its one-ring neighbors from their initial positions by solving an absolute orientation problem (Horn 1987). Then, a new helper bone is generated with the estimated transformation as its own transformation. Next, the skinning weights w_{j,p} and w_{j,h} and the transformation matrices of all helper bones are updated by solving constrained least-squares problems. Finally, the system removes insignificant helper bones that have little influence on the skin deformation; our current implementation removes helper bones that influence fewer than four vertices. This process is repeated until the specified number of helper bones is reached.

Weight and bone transformation update

After inserting the specified number of helper bones, the skinning weight update step (Section "Weight Update Step") and the bone transformation update step (Section "Bone Transformation Update Step"), excluding the primary bones, are alternately iterated until the approximation error converges.
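As an illustration, the residual search of Eq. 12 can be written compactly with einsum (Python with NumPy; the names and array shapes are our own assumptions):

```python
import numpy as np

def worst_vertex(rest_verts, primary_mats, helper_mats, w_p, w_h, examples):
    """Return the index of the vertex with the largest summed reconstruction
    error (Eq. 12), where the next helper bone will be inserted.
    rest_verts: (J, 3); primary_mats: (N, P, 4, 4); helper_mats: (N, H, 4, 4);
    w_p: (J, P); w_h: (J, H); examples: (N, J, 3)."""
    J = rest_verts.shape[0]
    v_h = np.concatenate([rest_verts, np.ones((J, 1))], axis=1)
    pred = np.einsum('jp,npab,jb->nja', w_p, primary_mats, v_h)
    if helper_mats.shape[1] > 0:
        pred += np.einsum('jh,nhab,jb->nja', w_h, helper_mats, v_h)
    resid = examples - pred[..., :3]          # (N, J, 3) residuals
    err = np.sum(resid ** 2, axis=(0, 2))     # per-vertex error over all examples
    return int(np.argmax(err))
```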

Helper Bone Controller Construction The helper bone controller is constructed by learning a mapping from the primary bone transformations to the helper bone transformations. We use a linear regression model to represent the mapping from the local transformation of primary bones Lp to that of helper bones Lh. Transformation parameterization The local transformation matrix Lh is extracted from the estimated skinning matrix Sh. By definition, Sh is decomposed into a product of the transformation matrices as

S_h = G_{\phi(h)} L_h \bar{G}_h^{-1} ,   (13)

where ϕ(h) ∈ {1, …, P} is the parent primary bone of the h-th helper bone, which is selected to minimize the approximation error, as detailed later, and Ḡ_h is an unknown rigid initial transformation matrix. Assuming that the local transformation matrix of each helper bone is the identity in the initial stance pose, Ḡ_h is equal to that of its parent primary bone, Ḡ_{ϕ(h)}, by the definition of forward kinematics. Therefore, we can uniquely extract the local transformation matrix by

L_h = G_{\phi(h)}^{-1} S_h \bar{G}_{\phi(h)} .   (14)

The extracted local matrix is parameterized with fewer variables to reduce the dimensionality of the regression problem. Since L_h is a rigid transformation, it can be parameterized by a translation vector t_h ∈ ℝ³ and bone rotation variables r_h, for which we use exponential maps (Grassia 1998). This transforms L_h into a six-dimensional vector [t_h^T r_h^T]^T ∈ ℝ⁶. In addition, the local transformation of each primary bone L_p is parameterized by its animating variables. For simplicity, we assume that each primary bone has no translation or scale keys and that a bone rotation is always represented by exponential maps.

Regression model construction

We use a χ-th-order polynomial function as the regression model. The transformation parameters of each helper bone are approximated by

[t_h^T \; r_h^T]^T \approx f_h(L_1, L_2, \ldots, L_P) = F_h [x_1^T \cdots x_P^T \; 1]^T ,   (15)

where x_p is an independent variable vector composed of all the terms of the χ-th-order polynomial of r_p. For example, if we take χ = 2, the independent variable vector from r = [r_1, r_2, r_3] is x = [r_1, r_2, r_3, r_1², r_2², r_3², r_1 r_2, r_1 r_3, r_2 r_3]. The regression matrix of the h-th helper bone, F_h ∈ ℝ^{6×(1+\sum_p \dim(x_p))}, is estimated from the examples using the least-squares technique. In addition, we add a sparsity constraint that minimizes the number of nonzero regression coefficients so as to generate a simpler model. The least-squares problem with the sparsity constraint can be formulated as a Lasso problem (Tibshirani 2011) given by

F_h^* = \arg\min_{F_h} \left\| Y_h - F_h X \right\|_2^2 + \lambda \left\| F_h \right\|_1 ,   (16)

where

X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & \ddots & \vdots \\ x_{P,1} & \cdots & x_{P,N} \\ 1 & \cdots & 1 \end{bmatrix}, \quad Y_h = \begin{bmatrix} t_{h,1} & \cdots & t_{h,N} \\ r_{h,1} & \cdots & r_{h,N} \end{bmatrix},

and λ is a positive shrinkage parameter that controls the trade-off between model accuracy and the number of nonzero coefficients. This problem can be solved efficiently with a stock Lasso solver.

Parent bone selection

One problem remains: the selection of an appropriate parent bone ϕ(h) for each helper bone. This is a discrete optimization problem, but since the number of primary bones is generally small, an exhaustive search is feasible: each primary bone is tried as ϕ(h) in turn, evaluated through Eqs. 14 and 15, and the one that minimizes the objective function (Eq. 16) is selected.
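A minimal sketch of the controller fit for a single helper bone, using scikit-learn's Lasso as a stand-in for the "stock Lasso solver" mentioned above (feature construction shown for χ ≤ 2; all names and shapes are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_controller(r_primary, y_helper, chi=2, lam=10.0):
    """Fit the sparse polynomial controller of Eqs. 15-16 for one helper bone.
    r_primary: (N, P, 3) exponential-map rotations of the primary bones per
    example; y_helper: (N, 6) stacked [t_h, r_h] regression targets."""
    N, P, _ = r_primary.shape
    feats = []
    for n in range(N):
        x = []
        for p in range(P):
            r = r_primary[n, p]
            x.extend(r)                                    # linear terms
            if chi >= 2:                                   # quadratic terms
                x.extend([r[0]*r[0], r[1]*r[1], r[2]*r[2],
                          r[0]*r[1], r[0]*r[2], r[1]*r[2]])
        x.append(1.0)                                      # constant term
        feats.append(x)
    X = np.asarray(feats)
    model = Lasso(alpha=lam, fit_intercept=False).fit(X, y_helper)
    return model.coef_        # (6, dim) regression matrix, i.e., F_h
```

At run-time, evaluating the controller is just one matrix–vector product per helper bone, which matches the microsecond-level cost reported in the experiments below.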

Experimental Results

We evaluated the approximation capability and computational performance of our helper bone rigging system. For all experiments, the parameter k, the maximum number of transformations blended per vertex, was fixed at 4. The reconstruction error was evaluated using the root mean square (RMS) error of the vertex positions. The optimization procedure for the helper bone transformations is parallelized over vertices, helper bones, or examples using Intel Threading Building Blocks. Computational timings were measured on a 3.4 GHz Core i7-4770 CPU (eight logical processors) with 16 GB of RAM.

Test Dataset

We used the muscle function of Autodesk Maya to synthesize an example skin shape from a skeleton pose. The muscle system emulates static muscle–skin deformation for a skeletal pose; it can also produce dynamic deformation caused by bone acceleration and the inertia of the muscles. For our experiment, we used only static deformation because our method supports only a static mapping from a skeleton pose to a skin shape. The test character model is a sample asset from a Maya tutorial, as shown in Fig. 3. The height of the leg model is 200 cm in the initial pose. The skeleton has P = 3 animating bones with five degrees of freedom (DOFs): hip swing and twist (3 DOFs), knee bend (1 DOF), and ankle bend (1 DOF). Eleven muscles expand and contract according to the movement of the primary skeleton; they drive the deformation of 663 vertices using a proprietary algorithm. A test dataset was created by uniformly sampling the bone rotations of the primary skeleton every 20° within each range of joint motion. Consequently, we created 6750 example pairs consisting of a skeleton pose and a skin shape by discretizing the DOFs of the hip swing, hip twist, knee bend, and ankle bend into 6 × 6, 9, 5, and 5 levels, respectively.

Fig. 3 Character model used to create an example pose and skin shape. The skin deformation is driven by a primary skeleton and virtual muscles, a built-in function of Autodesk Maya

Evaluating the Optimized Bone Transformations

In the first experiment, different numbers of helper bones were inserted into the character rig while fixing the polynomial order χ = 2 and the shrinkage parameter λ = 0. Figure 4 shows the convergence of the reconstruction error with the number of helper bones and the number of iterations. The reconstruction error decreased with the number of helper bones, and there was no significant difference between the errors with four and five helper bones, indicating that the approximation had almost converged at four helper bones. The reconstruction error also decreased monotonically with the number of iterations, which demonstrates the stability of our SSDR-based rigging system.

Figure 5 shows optimized models using different numbers of helper bones. The center image of each screenshot shows the initial pose, and the left and right images show a leg-stretching pose and a bending pose, respectively. Helper bones a, b, and c are located near the hip, knee, and ankle to minimize LBS artifacts, and helper bone d is located in the thigh to emulate the muscle bulge. The skinning weight map for each helper bone is visualized in Fig. 6. Helper bone a has a significant influence over a large area of the thigh, whereas the other helper bones have lesser influence. This is an inevitable result of our incremental bone insertion algorithm, in which the first helper bone is inserted to offset the largest reconstruction error.

Fig. 4 Convergence of the reconstruction error according to the numbers of helper bones and iterations



Fig. 5 Optimized character rigs using different numbers of helper bones. Each model shows a different helper bone behavior


Fig. 6 Skinning weight map for each helper bone. The larger weight is indicated by a darker area

To build the rig with one helper bone, our system consumed 0.17, 0.51, and 0.17 s per iteration for the bone insertion step, weight update step, and transformation update step, respectively. The total optimization time was about 15 s for 20 iterations. For the rig with four helper bones, the time recorded was 0.17, 0.82, and 0.72 s per iteration. The total time was about 32 s.

Evaluating the Accuracy of the Bone Controller

In the second experiment, we examined the approximation capability of the helper bone controller. We evaluated the increase in the RMS error caused by approximating the bone transformations with the regression model, and we counted the number of nonzero polynomial coefficients for different polynomial orders χ and shrinkage parameters λ while fixing H = 4. The experimental results are summarized in Table 1. The baseline RMS reconstruction error, measured after the per-example transformation optimization, was 1.36 cm; the error after controller approximation ranged from roughly 150 % to 190 % of this baseline. The reconstruction error decreased with the polynomial order, with no significant difference between the quadratic and cubic polynomials. On the other hand, redundant polynomial terms were removed through the shrinkage parameter λ while minimizing the increase in the approximation error.

In this experiment, our prototype system consumed about 5 μs per frame to compute all skinning matrices Ŝ_h from the primary skeleton pose L_p. In detail, 1 μs was spent composing the independent variables x_p from the local transformation matrices L_p, and evaluating the regression model of Eq. 15 consumed 1 μs per helper bone. The former cost grows in proportion to the number of primary bones and the latter with the number of helper bones. The computational speed is sufficiently fast, although performance could be improved further by parallelizing the execution of the bone controllers.

Table 1 Statistics of the reconstruction error and the number of nonzero polynomial coefficients with respect to the polynomial order χ and the shrinkage parameter λ

            Average # of nonzeros      RMS error [cm]
λ           0      10     20           0      10     20
Linear      6      5.3    5.1          2.57   2.58   2.59
Quadratic   14     10.9   9.1          2.11   2.12   2.17
Cubic       26     18.4   15.1         2.03   2.07   2.11

Limitations

Currently, our system does not provide any guidelines for creating an example dataset. Although we used uniform sampling of the joint DOFs to create example poses in the experiments, this simple method may generate many redundant examples and can even fail to sample important poses and shapes. We plan further studies to identify a more artist-friendly workflow that creates a minimal example dataset; an active learning method (Cooper et al. 2007) could be a possible solution that allows artists to design example shapes in a step-by-step manner. In addition, our method does not ensure global optimality of the skinning weights and the helper bone controller. We have found that increasing the number of helper bones sometimes degrades the reconstruction accuracy, because numerical errors accumulate separately in the skinning decomposition and the bone controller construction.

Future Directions

In this chapter, we have described an example-based rigging technique for building an LBS rig from example data of skin deformation. Although the example-based method requires a large amount of example data to construct a character rig, several simulation methods have been proposed that generate physically valid skin deformation at a heavy computational cost. Moreover, the recent development of 3D scanning devices allows the acquisition of skin deformations of actual human beings. These state-of-the-art shape acquisition techniques will enable the mass production of example skin shapes within a short period of time and significantly reduce the tedious manual labor of rig construction.

Although most skinning decomposition techniques and helper bone rigs have been built on LBS, extending them to popular nonlinear blend skinning techniques, such as dual quaternion skinning (Kavan et al. 2007), stretchable and twistable bones (Jacobson and Sorkine 2011), and elasticity-inspired joint deformers (Kavan and Sorkine 2012), remains an interesting open question.

Dynamic skinning is another promising future direction. The kinodynamic skinning technique (Angelidis and Singh 2007) provides volume-preserving deformation based on proxy muscles. A rig-space physics technique optimizes the free parameters of a handcrafted kinematic rig to approximate physically simulated skin deformation (Hahn et al. 2012, 2013). A position-based dynamics (PBD) method was used to synthesize the skin deformation caused by self-collisions and the secondary effects of soft tissues (Rumman and Fratarcangeli 2015); it provides plausible and stable soft-body motion at interactive rates but requires the elaborate construction of a PBD-based rig. The MoSh model (Loper et al. 2014) estimates dynamic skin deformation from a sparse set of motion capture markers using a statistical model of human skin shapes, generating the skin shape in a low-dimensional subspace that matches the movement of the markers. The Dyna model (Pons-Moll et al. 2015) also constructs a dynamic skin deformation model using a subspace analysis of 4D scans of human subjects; the deformation is generated with a second-order autoregressive model with exogenous input in the low-dimensional subspace. The SMPL model (Loper et al. 2015) learns corrective blendshape models from shape samples and has been extended to synthesize dynamic deformation by incorporating the autoregressive model. These methods successfully produce realistic deformation of human skin. We have proposed an example-based method for controlling helper bones to mimic the secondary dynamics of soft tissues (Mukai and Kuriyama 2016), while neglecting the effect of gravity and interactions with other objects.

References

Angelidis A, Singh K (2007) Kinodynamic skinning using volume-preserving deformations. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation 2007, pp 129–140
Baran I, Popović J (2007) Automatic rigging and animation of 3D characters. ACM Trans Graph 26(3):72:1–72:8
Cooper S, Hertzmann A, Popović Z (2007) Active learning for real-time motion controllers. ACM Trans Graph 26(3):5
Fan Y, Litven J, Pai DK (2014) Active volumetric musculoskeletal systems. ACM Trans Graph 33(4):152
Grassia FS (1998) Practical parameterization of rotations using the exponential map. J Graph Tools 3(3):29–48
Hahn F, Martin S, Thomaszewski B, Sumner R, Coros S, Gross M (2012) Rig-space physics. ACM Trans Graph 31(4):72:1–72:8
Hahn F, Thomaszewski B, Coros S, Sumner R, Gross M (2013) Efficient simulation of secondary motion in rig-space. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation 2013, pp 165–171
Horn BKP (1987) Closed-form solution of absolute orientation using unit quaternions. J Opt Soc Am A 4(4):629–642
Jacobson A, Sorkine O (2011) Stretchable and twistable bones for skeletal shape deformation. ACM Trans Graph 30(6):Article 165
Jacobson A, Baran I, Popović J, Sorkine O (2011) Bounded biharmonic weights for real-time deformation. ACM Trans Graph 30(4):78:1–78:8
James DL, Twigg CD (2005) Skinning mesh animations. ACM Trans Graph 24(3):399–407
Kavan L, Sorkine O (2012) Elasticity-inspired deformers for character articulation. ACM Trans Graph 31(6):Article 196


Kavan L, Collins S, Žára J, O'Sullivan C (2007) Skinning with dual quaternions. In: Proceedings of the ACM SIGGRAPH symposium on interactive 3D graphics 2007, pp 39–46
Kavan L, Sloan PP, O'Sullivan C (2010) Fast and efficient skinning of animated meshes. Comput Graph Forum 29(2):327–336
Kim J, Kim CH (2011) Implementation and application of the real-time helper-joint system. In: Game developers conference 2011
Kry PG, James DL, Pai DK (2002) EigenSkin: real time large deformation character skinning in hardware. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation 2002, pp 153–159
Kurihara T, Miyata N (2004) Modeling deformable human hands from medical images. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation 2004, pp 355–363
Le BH, Deng Z (2012) Smooth skinning decomposition with rigid bones. ACM Trans Graph 31(6):Article 199
Le BH, Deng Z (2014) Robust and accurate skeletal rigging from mesh sequences. ACM Trans Graph 33(4):1–10
Lewis JP, Cordner M, Fong N (2000) Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In: Proceedings of SIGGRAPH 2000, pp 165–172
Li D, Sueda S, Neog DR, Pai DK (2013) Thin skin elastodynamics. ACM Trans Graph 32(4):49
Loper M, Mahmood N, Black MJ (2014) MoSh: motion and shape capture from sparse markers. ACM Trans Graph 33(6):220:1–220:13
Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) SMPL: a skinned multi-person linear model. ACM Trans Graph 34(6):248:1–248:16
Magnenat-Thalmann N, Laperrière R, Thalmann D (1988) Joint-dependent local deformations for hand animation and object grasping. In: Proceedings of Graphics Interface '88, pp 26–33
Merry B, Marais P, Gain J (2006) Animation space: a truly linear framework for character animation. ACM Trans Graph 25(6):1400–1423
Miller C, Arikan O, Fussell DS (2011) Frankenrigs: building character rigs from multiple sources. IEEE Trans Vis Comput Graph 17(8):1060–1070
Mohr A, Gleicher M (2003) Building efficient, accurate character skins from examples. ACM Trans Graph 22(3):562–568
Mukai T (2015) Building helper bone rigs from examples. In: Proceedings of the ACM SIGGRAPH symposium on interactive 3D graphics and games 2015, pp 77–84
Mukai T, Kuriyama S (2016) Efficient dynamic skinning with low-rank helper bone controllers. ACM Trans Graph 35(4):1
Neumann T, Varanasi K, Hasler N, Wacker M, Magnor M, Theobalt C (2013) Capture and statistical modeling of arm-muscle deformations. Comput Graph Forum 32(2):285–294
Park SI, Hodgins JK (2008) Data-driven modeling of skin and muscle deformation. ACM Trans Graph 27(3):Article 96
Parks J (2005) Helper joints: advanced deformations on runtime characters. In: Game developers conference 2005
Pons-Moll G, Romero J, Mahmood N, Black MJ (2015) Dyna: a model of dynamic human shape in motion. ACM Trans Graph 34(4):120:1–120:14
Rumman NA, Fratarcangeli M (2015) Position-based skinning for soft articulated characters. Comput Graph Forum 34(6):240–250
Shi X, Zhou K, Tong Y, Desbrun M, Bao H, Guo B (2008) Example-based dynamic skinning in real time. ACM Trans Graph 27(3):29:1–29:8
Sloan PPJ, Rose CF, Cohen MF (2001) Shape by example. In: Proceedings of the ACM SIGGRAPH symposium on interactive 3D graphics 2001, pp 135–143


Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282
Wang RY, Pulli K, Popović J (2007) Real-time enveloping with rotational regression. ACM Trans Graph 26(3):73
Wang XC, Phillips C (2002) Multi-weight enveloping: least-squares approximation techniques for skin animation. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on computer animation 2002, pp 129–138

Crowd Formation Generation and Control Jiaping Ren, Xiaogang Jin, and Zhigang Deng

Abstract

Crowd formation transformation simulates crowd behaviors from one formation to another. This kind of transformation is often used in animation films, group calisthenics performances, video games, and other special-effect applications. Given a source formation and a target formation, one intuitive approach to achieving the transformation is to establish a source point and a destination point for each agent and to plan each agent's trajectory while maintaining collision-free maneuvers. Crowd formation generation and control usually consists of five parts: formation sampling, pair assignment, trajectory generation, motion control, and evaluation. In this chapter, we describe the techniques involved in going from abstract user input to collective crowd formation transformations.

Keywords

Crowd simulation • Motion control • Motion transition • Evaluation

Contents
Introduction ............................................................................. 2
State of the Art ......................................................................... 3
Formation Generation ..................................................................... 4
Formation Sampling ..................................................................... 4
Pair Assignment ........................................................................ 4
Trajectory Generation .................................................................. 6
Motion Control ........................................................................... 7

J. Ren (*) • X. Jin State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China e-mail: [email protected]; [email protected] Z. Deng Department of Computer Science, University of Houston, Houston, TX, USA e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_15-1


Evaluation ............................................................................... 8
Summary .................................................................................. 10
Cross-References ......................................................................... 11
References ............................................................................... 11

Introduction

In recent years, simulation of group formation transformation has been increasingly used in feature animation films, video games, mass performance rehearsal, tactical arrangement of players in sports team training, and so on. Furthermore, group formation generation and control also finds wide application in many other scientific and engineering fields, including but not limited to robot control, multiagent systems, and behavioral biology.

In most crowd simulation systems, each agent intelligently moves toward its destination through navigational pathfinding algorithms and avoids collisions with other agents and obstacles through local behavior control models. In collective crowd formation transformation, we should also consider the behavior of the entire group. When simulating a crowd, one intuitive way to form a target formation is to provide each agent's desired position at a particular moment and generate transitions between that position and the destination. However, users must manually specify many spatial, temporal, and correspondence constraints, which is time-consuming and nontrivial, particularly when the crowd includes many agents that change their locations frequently. Similarly, when a group of people perform a collective action simultaneously in the real world, it is generally impossible for their commander or team leaders to convey detailed movement information such as every group member's position at every time instant. There is therefore a need for automatic methods that solve these problems. How can abstract user inputs be transformed into a representation that is easy to work with? How should the destination of each agent be chosen? When finding a path for each agent, which path is best? And how can the formation be controlled efficiently?

To answer these questions, we decompose collective crowd formation transformation into five steps. Suppose that the user specifies the source and target formation shapes with sketches or contours. We first sample the agents in the formations. Then, we assign the source position and the target position for each agent. We use crowd simulation methods to generate realistic collision-avoiding trajectories for the agents. We also introduce methods to control the movements of agents and high-level transformations of crowds, and we consider approaches for evaluating crowd simulation and transformation results.

Organization

The rest of the chapter is organized as follows. We give an overview of prior work in section "State of the Art." In section "Formation Sampling," we introduce how to sample the source and target formations given by users. In section "Pair Assignment," we assign corresponding positions in both source and target formations to each agent. In section "Trajectory Generation," we generate trajectories for each agent from the source formation to the target formation. In section "Motion Control," five motion control methods are introduced. In section "Evaluation," we give six approaches to evaluate the results of crowd formation transformation. We conclude and discuss future work in section "Summary."

State of the Art

Numerous crowd simulation and modeling approaches have been developed during the past several decades. Here we briefly review recent efforts in crowd simulation, formation transformation, and evaluation.

For crowd simulation, there are two major kinds of models: rule-based models and force-based models. Rule-based crowd models are flexible enough to simulate various crowd agents through a set of delicately designed rules. The seminal work by Reynolds (1987) presented the concept of Boids, which simulates flocks of birds and schools of fish via several simple yet effective steering behavioral rules that maintain group cohesion, alignment, and separation, as well as avoid collisions between group members. Recently, Klotsman and Tal (2012) provided a biologically motivated rule-based artificial bird model, which produces plausible and realistic line formations of birds. A distinct research line of crowd simulation is force-based models, originally developed from the human social force study by Helbing and Molnár (1995). This model was later applied and generalized to other simulation scenarios such as densely populated crowds (Pelechano et al. 2007), simulation of pedestrian evolution (Lakoba et al. 2005), and escape panic (Helbing et al. 2000).

Group formation control is a vital collective characteristic of many crowds. Existing approaches typically combine heuristic rules with explicit hard constraints to produce and control sophisticated group formations. For example, Kwon et al. (2008) proposed a framework to generate aesthetic transitions between key crowd formation configurations, and a spectral-based group formation control scheme (Takahashi et al. 2009) was also proposed; however, in these approaches, exact agent group distributions at a number of key frames need to be specified by users. Gu and Deng (Gu and Deng 2011, 2013; Xu et al. 2015) proposed an interactive and scalable framework to generate arbitrary group formations with controllable transitions in a crowd. Henry and colleagues (Henry et al. 2012, 2014) proposed a single-pass algorithm to control crowds using a deformable mesh, which can be used to control crowd–environment interaction and obstacle avoidance; they also proposed an alternative metric for use in a pair assignment approach to formation control that incorporates environment information. These approaches either need nontrivial manual involvement (Kwon et al. 2008; Takahashi et al. 2009) or are focused on intuitive user interfaces for formation control and interaction (Gu and Deng 2011, 2013; Henry et al. 2012, 2014; Xu et al. 2015).

Many approaches have been proposed to evaluate the results or improve the accuracy of multiagent and crowd simulation algorithms. Most of them perform evaluation by comparing the algorithms' output with real-world sensor data. Pettré et al. (2009) compute appropriate parameters based on maximum likelihood estimation. Lerner et al. (2009) annotate pedestrian agent trajectories with action tags to enhance their natural appearance or realism. Guy et al. (2012) propose an entropy-based evaluation approach to quantify the similarity between real-world and simulated trajectories.

Formation Generation

Formation Sampling

In this section, we describe how to generate the source and target formations of agents from user inputs. Different assumptions about the user inputs correspond to different sampling strategies; here we mainly introduce two.

One assumption is that the user gives contour shapes, such as squares or circles, or brush paintings (Gu and Deng 2013). This work proposes a unified formation shape representation called a formation template, i.e., an oversampled point space with a roughly even distribution, and it offers an interactive way for users to draw the input formations. In this case, to generate a visually balanced target formation template, it first evenly samples points on the boundaries and then fills the area between the inclusive and exclusive boundaries through an extended scanline flood-fill algorithm, using a filling algorithm to discretize the grid space. To avoid sampling points too close to the boundaries, it checks four points with constant offsets (top, bottom, left, and right) from the current checkpoint.

Another assumption is that the user specifies the source and target formation shapes with sketches together with the number of agents in the formation (Xu et al. 2015). To automatically place a user-specified number of agents in a specified formation shape with a visually natural distribution, the sampling process is divided into two stages. First, approximately the specified number of formation points is tentatively sampled in the formation shape with a roughly even distribution. This result is rigid and lacks aesthetic quality, and, more importantly, it is difficult to sample exactly the user-specified number of formation points this way. Therefore, the number of sampled points is tuned to match the user-specified number by randomly deleting some sampled points or adding formation points according to the Roulette Wheel Selection strategy. A corresponding agent is then located at each formation point.
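As a simplified illustration of the template-sampling idea (not the exact scanline flood-fill or Roulette Wheel procedure of the cited works), the following Python sketch fills a sketched contour with a roughly even grid of agent positions and then prunes or pads the set to the requested count:

```python
import numpy as np
from matplotlib.path import Path

def sample_formation(boundary, n_agents, spacing):
    """Evenly sample n_agents positions inside a closed contour.
    boundary: (K, 2) polygon vertices; spacing: grid step controlling density."""
    boundary = np.asarray(boundary)
    poly = Path(boundary)
    xmin, ymin = boundary.min(axis=0)
    xmax, ymax = boundary.max(axis=0)
    xs = np.arange(xmin, xmax, spacing)
    ys = np.arange(ymin, ymax, spacing)
    grid = np.array([(x, y) for x in xs for y in ys])
    inside = grid[poly.contains_points(grid)]        # keep interior grid points
    if len(inside) >= n_agents:
        idx = np.random.permutation(len(inside))[:n_agents]
        return inside[idx]                           # randomly prune extras
    # pad with jittered duplicates when too few points were sampled
    extra = inside[np.random.randint(0, len(inside), n_agents - len(inside))]
    jitter = np.random.uniform(-spacing / 4, spacing / 4, extra.shape)
    return np.vstack([inside, extra + jitter])
```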

Pair Assignment After the sampling process, to generate the formation transformation, we need to pair the agents in the source formation with those in the target formation. We introduce two methods to solve this pair assignment problem in this section.


Fig. 1 Generating arbitrary formations from random agent distributions (left-most) through inclusion sketch (white boundaries) and exclusion sketch (red boundaries) in Gu and Deng (2011)

The method presented in Gu and Deng (2011, 2013) estimates the agent distribution in the target formation to find the correspondence between each agent in the initial formation and its appropriate candidate position in the formation template (see Fig. 1). Using formation coordinates, it designs a pair assignment algorithm based on two key heuristics. First, in the target formation, boundary agents should closely fit the boundary curves to clearly exhibit the user-specified formation shape. Second, each nonboundary agent should preserve its adjacency relations as much as possible. The algorithm first finds correspondences for the boundary agents and then for the nonboundary agents. For the boundary agents, it converts the positions of all agents in the initial distribution into formation coordinates and subtracts the formation orientation from each agent's relative direction to yield the relative agent direction. It stores this direction along with the relative agent distance in a KD-tree, and performs the same operations for each point on the target formation template boundaries; the agent corresponding to each boundary point can thus be computed efficiently by a nearest-neighbor query in the KD-tree. For the nonboundary agents, it identifies the corresponding template point for each nonboundary agent that was not selected in the previous step, again using each agent's formation coordinate to find the closest inner template point. The approach then transforms that point to its world coordinate representation, i.e., the agent's target position.

In the method presented in Xu et al. (2015), Delaunay triangulation is employed to represent the relationship among adjacent agents. Pair assignment can be formulated as the problem of building a one-to-one correspondence between the source point set and the target point set (see Fig. 2), and further as finding an optimal assignment in a weighted bipartite graph. In the matching process, they apply a novel match measure that effectively minimizes the overall disorder, including variations of both time synchronization and local structure: the distance from source to target is minimized for each agent, and the average distances to neighbors are kept similar between the source and target formations. Finally, this method applies the classical Kuhn–Munkres algorithm (Kuhn 1955; Munkres 1957) to solve the pair assignment problem.
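The optimal assignment step is readily prototyped with an off-the-shelf Hungarian-method solver. The sketch below uses squared travel distance as the sole matching cost; Xu et al. (2015) additionally penalize disorder in the local neighbor structure, which is omitted here for brevity:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_pairs(src, dst):
    """One-to-one assignment between source and target formation points.
    src, dst: (n, 2) position arrays. Returns the target index per agent."""
    cost = np.sum((src[:, None, :] - dst[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian (Kuhn-Munkres) method
    return cols                                # rows come back in sorted order
```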


Fig. 2 Delaunay triangulation and pair assignment in Xu et al. (2015)

Trajectory Generation

In this section, we address how each agent reaches its destination at every time step after its target position has been determined. Six different simulation approaches are described below.

Reciprocal Velocity Obstacles (RVO) (van den Berg et al. 2008), which extends the velocity obstacle concept (Fiorini and Shiller 1998), is a widely used velocity-based crowd simulation model. It introduces a new concept for local reactive collision avoidance: the only information each agent needs about the other agents is their current positions, velocities, and exact shapes. The basic idea is that, instead of choosing a new velocity for each agent that lies outside the other agent's velocity obstacle, this method chooses a new velocity that is the average of the agent's current velocity and a velocity that lies outside the other agent's velocity obstacle.

Optimal reciprocal collision avoidance (ORCA) (van den Berg et al. 2011) is a revised model derived from RVO. It presents a rigorous approach to reciprocal n-body collision avoidance that provides a sufficient condition for each agent to be collision-free for at least a fixed amount of time into the future, assuming only that the other agents use the same collision-avoidance protocol. There are infinitely many pairs of velocity sets that keep two agents collision-free, but among those, ORCA selects the pair that maximizes the amount of permitted velocities close to the two agents' optimization velocities. For n-body collision avoidance, each agent performs a continuous cycle of sensing and acting within each time step. In each iteration, the agent acquires the radii, current positions, and current optimization velocities of the other agents. Based on this information, the agent infers, for each other agent, the permitted half-plane of velocities that avoid collision with that agent. The set of velocities permitted with respect to all agents is the intersection of these half-planes, and the agent then chooses the velocity closest to its preferred velocity from within this permitted region.

The method in Gu and Deng (2011) can automatically compute the desired position of each agent in the target formation and generate the agent correspondences between key frames. The force that drives an agent from its original position to its estimated target position can simply be the direction vector between the two positions. However, this force only considers the group formation factor; in a dense group, such a pure formation-driven strategy cannot fully avoid agent collisions, so a local collision model is needed to refine within-group collision avoidance. To this end, the authors employ a force-based model (Pelechano et al. 2007) for the collision avoidance task, owing to its capability of handling very high-density crowds; this model accounts for collision-avoidance forces and repulsion forces from neighboring group members and obstacles.

In Gu and Deng (2013), a local formation transition is a transition from one formation to another that does not consider the whole group's general locomotion. In addition to computing a linear interpolation from an agent's initial position to its estimated target position, this method also considers an extra repulsion force to avoid collisions. Without user interactions, each agent goes straight to its target position with minor transition adjustments along the way to avoid local collisions with other agents.

The method of Xu et al. (2015) locally adjusts each agent's trajectory by applying social forces, such as driving and repulsion forces, to navigate and avoid collisions. A mutual information-based approach is introduced; mutual information is a well-known concept in information theory designed to quantify the mutual dependence between two random variables, and it correlates with the fluency and stability of an agent subgroup's localized movements in a crowd. In their method, the mutual information between direction and position and the mutual information between velocity and position are used to adjust the ideal heading and desired velocity in the basic social force model.

The online real-time motion synthesis method (Han et al. 2016) transforms the initial motion automatically using the following parameters: the target turning angle of the agent, the target scaling factor for the moving speed with respect to that of the source formation, and the time required to achieve the target formation. This method builds interpolation functions of the turning angle and scaling factor for each agent, and thus determines the velocity and position of each agent in every frame.
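To fix ideas, a minimal Helbing-style integration step, combining a driving force toward each agent's assigned target with an exponential repulsion from neighbors, might look like the following sketch; the parameter values are illustrative only, not tuned values from any of the cited systems:

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.04, v0=1.3, tau=0.5,
                      A=2.0, B=0.3, radius=0.25):
    """One step of a minimal social force model. pos, vel, goals: (n, 2)."""
    to_goal = goals - pos
    dist_goal = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    desired_vel = v0 * to_goal / dist_goal
    force = (desired_vel - vel) / tau                  # driving force
    n = len(pos)
    for i in range(n):                                 # pairwise repulsion
        d = pos[i] - np.delete(pos, i, axis=0)
        dist = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        force[i] += np.sum(A * np.exp((2 * radius - dist) / B) * d / dist,
                           axis=0)
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel
```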

Motion Control

Having generated a trajectory for each agent from one formation to another, in this section we address the following problem: how can the transformation be made more reliable and more controllable? We describe five different methods.

The method presented in Gu and Deng (2011) is a two-level hierarchical group control that breaks the full group dynamics into within-group dynamics and intergroup dynamics. Xu et al. (2015) extend the adapted social force method with a subgroup formation constraint (see Fig. 3). This method clusters individual agents in a crowd into subgroups to maximally maintain the formation of the collective subgroups during the formation transition. An Affinity Propagation (AP) clustering algorithm (Frey and Dueck 2007) is used, which identifies a set of exemplars that best represent the agents' positions in the formation. They choose the AP algorithm because an exemplar can be conceptually considered to represent the overall movement of its corresponding agent subgroup, and the cluster number is determined adaptively and automatically. The similarity measure is the local relative distance variance for the collective subgroup clustering.

Fig. 3 The adapted social force method with a subgroup formation constraint of Xu et al. (2015), combining mutual information, movement control, and subgroup clustering

In the method presented in Gu and Deng (2013), the authors construct a second virtual local grid field to evaluate a flow vector that guides transitions, which is needed to implement splitting and merging transitions. To form a user-defined formation (see Fig. 4) while moving as a whole to another location, this method introduces three factors at the group level: the global navigation vector heading to the target formation's location, the velocity driven by global collision avoidance between different groups, and the user-guided velocity computed from the sketching interface.

A vector field can also be introduced (Jin et al. 2008) to guide agents' movements (see Fig. 5); a vector field can be considered a position-to-vector mapping over the problem domain.

A physics-based predictive motion control is described in Han et al. (2014). It first generates a reference motion automatically at run-time based on existing data-driven motion synthesis methods. Given a reference motion, it repeatedly generates an optimal control policy for a small time window that spans a couple of footsteps in the reference motion through model predictive control while shifting the window along the time axis, which supports on-line performance.
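For reference, subgroup clustering with Affinity Propagation is available off the shelf; the minimal sketch below (Python with scikit-learn; the wrapper function is our own) returns a subgroup label per agent together with the exemplar agents, with the cluster count chosen automatically by the algorithm:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def cluster_subgroups(positions):
    """Cluster agents into subgroups via Affinity Propagation
    (Frey and Dueck 2007). positions: (n, 2) agent positions."""
    ap = AffinityPropagation(random_state=0).fit(positions)
    return ap.labels_, ap.cluster_centers_indices_  # subgroup ids, exemplar agents
```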

Evaluation

Given the many methods for crowd formation transformation, which one is better in a given situation? Here we introduce six different approaches to evaluating transformation results.

The visual method is the most direct and basic way to judge whether an animation is aesthetically pleasing. However, results obtained by visual inspection are subjective by nature, and quantitative approaches are more reliable and objective for researchers.


Fig. 4 Formation transitions with trajectory controls of Gu and Deng (2013)

Fig. 5 Agents’ movements guided by a vector field of Jin et al. (2008)

Simulation time consumption is a commonly used measure for quantifying the performance of simulation methods: among methods providing the same functionality, the one with lower time consumption is preferred. In interactive applications especially, run-time performance is required for a good user experience.

The mutual information metric introduced by Xu et al. (2015) can be adapted to measure the aesthetic aspect of a crowd formation transformation; the same work describes how to compute the mutual information of a dynamic crowd.

The stability of local structure (Xu et al. 2015) is, as its name implies, a measure of local stability during the transformation. It uses the standard deviation and the average value of the minimum neighbor distance over all agents to quantify the stability of local structure during a crowd formation transformation. A lower standard deviation indicates that the agents keep more similar distances from their neighbors, and vice versa.

Effort balancing (Xu et al. 2015) is a metric that measures the synchronization of the transformation; it employs the standard deviation and the average value of the distance between each agent's source position and its current position. When a formation transformation is smooth and visually pleasing, we anticipate that each agent's effort from its source position to its current position is not only small but also balanced across agents.
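As an illustration, the stability-of-local-structure statistic can be computed with a nearest-neighbor query (Python with SciPy; this is our reading of the metric, not reference code from Xu et al.):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_structure_stability(positions):
    """Standard deviation (and mean) of each agent's nearest-neighbor
    distance; a lower deviation means a more uniform local structure."""
    tree = cKDTree(positions)
    d, _ = tree.query(positions, k=2)   # k=2: the first hit is the agent itself
    nn = d[:, 1]
    return nn.std(), nn.mean()
```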


A data-driven quantitative approach (Ren et al. 2016; Wang et al. 2015) evaluates collective dynamics models using real-world trajectory datasets. Two different groups with noisy trajectories may exhibit similar behaviors even when their trajectory positions are quite different. This approach therefore uses discrete probability density functions (PDFs) generated from time-varying metrics, which reflect the global characteristics of groups; the influence of a small amount of abnormal or noisy data can be ignored. It introduces seven time-varying metrics: the velocity, the acceleration, the angular velocity, the angular acceleration, the Cartesian jerk, the shortest distance, and the velocity difference. The evaluation model is based on the sum of the differences in discrete PDFs between the real-world data and the simulation data over the seven metrics. To compare different simulation methods, the overall evaluation iterates between two components: optimizing the dynamics model parameters and optimizing the weights of the seven energy terms. This method uses a genetic algorithm to compute the optimal parameters by maximizing the evaluation model and introduces entropy to compute the weights of the seven time-varying metrics.
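The following Python sketch illustrates the PDF-comparison step under simple assumptions (histogram binning and an L1 difference; the published work defines its own binning and entropy-based weighting, and the metric samples are assumed to be NumPy arrays):

```python
import numpy as np

def metric_pdf(samples, bins=32, rng=None):
    """Discrete PDF of one time-varying metric (e.g., speed) via a histogram."""
    hist, _ = np.histogram(samples, bins=bins, range=rng, density=True)
    return hist / max(hist.sum(), 1e-12)

def pdf_difference(real_samples, sim_samples, bins=32):
    """L1 difference between real and simulated PDFs of one metric."""
    lo = min(real_samples.min(), sim_samples.min())
    hi = max(real_samples.max(), sim_samples.max())
    p = metric_pdf(real_samples, bins, (lo, hi))
    q = metric_pdf(sim_samples, bins, (lo, hi))
    return float(np.abs(p - q).sum())

def evaluation_score(real_metrics, sim_metrics, weights):
    """Weighted sum of PDF differences over the seven metrics (lower = better fit)."""
    return sum(w * pdf_difference(r, s)
               for w, r, s in zip(weights, real_metrics, sim_metrics))
```

A genetic algorithm would then search the dynamics model parameters that minimize such a score against the real-world trajectories.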

Summary

We have discussed the processes involved in collective crowd formation transformation in this chapter. At the formation sampling level, we described two formation sampling strategies. The first strategy samples the input sketches with a roughly even distribution and then adopts Roulette Wheel Selection to assign agents to their positions. The second strategy fills the boundary first and then considers the inner area. At the pair assignment level, we introduced two strategies for finding the correspondence between an agent's original position and its new position. One method finds correspondences for the boundary agents followed by the nonboundary agents. The other defines a measure that minimizes the overall disorder, including the variations of both time synchronization and local structure, and transfers the pair assignment problem into an optimization problem. At the trajectory generation level, we introduced six methods for guiding the movement of each agent in the formation; the main concern is collision avoidance. At the motion control level, we described five methods for making the results more reliable and controllable. At the evaluation level, we introduced six different approaches for evaluating the simulation methods, the transformation results, and the execution performance, most of them quantitative.

Limitation and Future Work

Although various methods can be applied to the crowd formation transformation problem, these approaches still cannot handle complex scenarios with obstacles or the mixture and separation of multiple formations. Moreover, current approaches assume that the collective crowds move in two dimensions. In the future, the existing methods can be improved by dealing with complex obstacles, considering interactions among different groups, extending the 2D methods to 3D, and changing the target formation dynamically.


Cross-References

▶ Crowd Evacuation Simulation
▶ Functional Crowds
▶ Optimal Control Modeling of Human Movement
▶ Segmental Movements in Cycling

References

van den Berg J, Lin M, Manocha D (2008) Reciprocal velocity obstacles for real-time multi-agent navigation. In: Proceedings of the IEEE international conference on robotics and automation (ICRA 2008), IEEE, pp 1928–1935
van den Berg J, Guy SJ, Lin M, Manocha D (2011) Reciprocal n-body collision avoidance. In: Robotics research, Springer, pp 3–19
Fiorini P, Shiller Z (1998) Motion planning in dynamic environments using velocity obstacles. Int J Rob Res 17(7):760–772
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Gu Q, Deng Z (2011) Formation sketching: an approach to stylize groups in crowd simulation. In: Proceedings of graphics interface 2011, Canadian Human-Computer Communications Society, pp 1–8
Gu Q, Deng Z (2013) Generating freestyle group formations in agent-based crowd simulations. IEEE Comput Graph Appl 33(1):20–31
Guy SJ, van den Berg J, Liu W, Lau R, Lin MC, Manocha D (2012) A statistical similarity measure for aggregate crowd dynamics. ACM Trans Graph 31(6):190:1–190:11
Han D, Noh J, Jin X, Shin JS, Shin SY (2014) On-line real-time physics-based predictive motion control with balance recovery. Comput Graphics Forum 33:245–254. Wiley Online Library
Han D, Hong S, Noh J, Jin X, Shin JS (2016) Online real-time locomotive motion transformation based on biomechanical observations. Comput Anim Virtual Worlds 27(3–4):378–384
Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282
Helbing D, Farkas I, Vicsek T (2000) Simulating dynamical features of escape panic. Nature 407(6803):487–490
Henry J, Shum HP, Komura T (2012) Environment-aware real-time crowd control. In: Proceedings of the 11th ACM SIGGRAPH/Eurographics conference on computer animation, Eurographics Association, pp 193–200
Henry J, Shum HP, Komura T (2014) Interactive formation control in complex environments. IEEE Trans Vis Comput Graph 20(2):211–222
Jin X, Xu J, Wang CC, Huang S, Zhang J (2008) Interactive control of large-crowd navigation in virtual environments using vector fields. IEEE Comput Graph Appl 28(6):37–46
Klotsman M, Tal A (2012) Animation of flocks flying in line formations. Artif Life 18(1):91–105
Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97
Kwon T, Lee KH, Lee J, Takahashi S (2008) Group motion editing. ACM Trans Graph 27:80
Lakoba TI, Kaup DJ, Finkelstein NM (2005) Modifications of the Helbing–Molnar–Farkas–Vicsek social force model for pedestrian evolution. Simulation 81(5):339–352
Lerner A, Fitusi E, Chrysanthou Y, Cohen-Or D (2009) Fitting behaviors to pedestrian simulations. In: Proceedings of the 2009 ACM SIGGRAPH/Eurographics symposium on computer animation, ACM, pp 199–208
Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38


Pelechano N, Allbeck JM, Badler NI (2007) Controlling individual agents in high-density crowd simulation. In: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on computer animation, Eurographics Association, pp 99–108
Pettré J, Ondřej J, Olivier AH, Cretual A, Donikian S (2009) Experiment-based modeling, simulation and validation of interactions between virtual walkers. In: Proceedings of the 2009 ACM SIGGRAPH/Eurographics symposium on computer animation, ACM, pp 189–198
Ren J, Wang X, Jin X, Manocha D (2016) Simulating flying insects using dynamics and data-driven noise modeling to generate diverse collective behaviors. PLoS One 11(5):e0155698
Reynolds CW (1987) Flocks, herds and schools: a distributed behavioral model. ACM SIGGRAPH Comput Graph 21(4):25–34
Takahashi S, Yoshida K, Kwon T, Lee KH, Lee J, Shin SY (2009) Spectral-based group formation control. Comput Graphics Forum 28:639–648. Wiley Online Library
Wang X, Ren J, Jin X, Manocha D (2015) BSwarm: biologically-plausible dynamics model of insect swarms. In: Proceedings of the 14th ACM SIGGRAPH/Eurographics symposium on computer animation, ACM, pp 111–118
Xu M, Wu Y, Ye Y, Farkas I, Jiang H, Deng Z (2015) Collective crowd formation transform with mutual information-based runtime feedback. Comput Graphics Forum 34:60–73. Wiley Online Library

Functional Crowds

Jan M. Allbeck

Contents
Introduction
State of the Art
Animation to AI
Heterogeneity
HCI
Conclusions
Cross-References
References


Abstract

Most crowd simulation research either focuses on navigating characters through an environment while avoiding collisions or on simulating very large crowds. Functional crowds research focuses on creating populations that inhabit a space as opposed to passing through it. Characters exhibit behaviors that are typical for their setting, including interactions with objects in the environment and each other. A key element of this work is ensuring that these large-scale simulations are easy to create and modify. Automating the inclusion of action and object semantics can increase the level at which instructions are given. To scale to large populations, behavior selection mechanisms must be kept relatively simple and, to demonstrate typical human behavior, must be based on sound psychological models. The creation of roles, groups, and demographics can also facilitate behavior selection. The simulation of functional crowds necessitates research in animation, artificial intelligence, psychology, and human-computer interaction (HCI). This chapter provides a brief introduction to each of these elements and their application to functional crowds.

J.M. Allbeck (*) George Mason University, Fairfax, VA, USA e-mail: [email protected]

# Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_16-1


Keywords

Crowd simulation • Virtual humans • Patterns of life • Computer animation • AI

Introduction

Virtual humans can be used as stand-ins when using real humans would be too dangerous or cost-prohibitive, or when precise control is required. Virtual humans are often used as extras or background characters in movies and games (see Fig. 1). They are similarly used in virtual training scenarios for military personnel and first responders. They can also be used to analyze urban and architectural design as well as various policies and procedures. For many of these applications and others, the virtual humans must both reflect typical or normal human behavior and be controllable or directable. Furthermore, in order to create sizeable crowds of virtual humans functioning in rich virtual environments, they must have relatively simple behavior selection mechanisms.

Functional crowds, in contrast to more typical crowd simulations, depict animated characters interacting with the environment in meaningful ways. They do not simply walk from one location to another avoiding obstacles. They perform the same behaviors we see from real humans every day, as well as less typical behaviors that might be required for the application.

The first element needed to achieve functional crowds is animation. Traditional crowd simulations focus on walking animation clips, perhaps with a few idle behaviors or, depending on the application, some battle moves. There is little or no interaction with objects in the environment. Animating virtual humans manipulating objects can be quite challenging. It involves detection of collisions and fine motor movements. We will give an overview of some of these challenges and approaches for solving them in this chapter.

Fig. 1 Virtual characters in a scene in the Unreal game engine


Another required element relates to providing the virtual humans with the knowledge of what actions can be performed and what objects are required to perform them. If we are going to eat, we need an object corresponding to food. We may optionally need instruments such as utensils. A lot of this needed information could be considered commonsense, but unless it is explicitly supplied to the virtual humans, they lack it. This information is also needed as input to higher-level artificial intelligence mechanisms such as behavior selectors, planners, and narrative generators.

Functional crowds should also depict a heterogeneous population. In real life not everyone does the same thing; they do not have the same priorities, and they do not all perform tasks in exactly the same way. Some of these variations stem from prior observations and experiences; they are learned. Others stem from psychological states and traits, such as personalities and emotions.

Finally, many, if not all, applications of functional crowds require some human-computer interaction (HCI). This interaction may come during the authoring of the crowd behavior. The application or scenario may require some of the behaviors to be more tightly controlled or even scripted. The application may also require users (e.g., players, trainees, evaluators, etc.) to interact with the crowd during the simulation. These interactions may simply require the virtual humans to avoid collisions with the real human's avatar, or they may require communication and perhaps even cooperation between the real and virtual humans.

This chapter will present these various elements of functional crowds and discuss challenges and approaches to address them. We will start by providing a snapshot of the current state of the art in related research fields. Then we will in turn discuss issues related to AI and animation, psychological models, and HCI. Finally, we will conclude with a brief summary and potential future directions.

State of the Art

In the past decade or so, crowd simulations have made enormous progress in the number of characters that can be simulated and in creating more natural behaviors. More detailed analysis of crowd simulation research can be found in a number of published volumes (Kapadia et al. 2015; Pelechano et al. 2008, 2016; Thalmann et al. 2007). It is now possible to simulate over one million characters in real time in high-density crowds. Crowd simulations can also be more heterogeneous: not every character looks or behaves exactly the same. Certainly some variations stem from differences in appearance and motion clips (Feng et al. 2015; McDonnell et al. 2008). Others come from psychological models such as emotion and personality (Balint and Allbeck 2014; Durupinar et al. 2016; Li et al. 2012). Most crowd simulations assign, fairly randomly, starting positions and ending destinations for the characters in the simulation. While this appears fine for short durations at a distance, if a player follows a character for a period of time, it quickly appears false. Sunshine-Hill and Badler have created a framework for generating plausible destinations for characters on the fly to provide reasonable "alibis" for them (Sunshine-Hill and Badler 2010).

Simulating functional crowds also requires other advanced computer graphics techniques. Commercial game engines, such as Unity® and Unreal®, provide much of the technology necessary. In the past couple of years, they have both changed their licensing structures in ways that enable researchers to take advantage of and add to their capabilities. Other needed advancements come from the animation research community. A key feature of functional crowds is the ability of characters to interact with objects in their environment in meaningful ways. We require animations of characters sitting and eating food, getting in and out of vehicles, conversing with one another, displaying emotions, getting dressed, etc. (Bai et al. 2012; Clegg et al. 2015; Hyde et al. 2016; Shapiro 2011).

Animation to AI

To simulate a functional crowd, we need the characters to interact with their object-rich environments and with each other. While great work has been done in pathfinding, navigation, and path following, additional advancements are still needed (Kapadia et al. 2015; Pelechano et al. 2016). Characters still struggle to get through cluttered environments with narrow walkways. We need to give characters enhanced abilities to turn sideways, sidestep, and even back up in seamless natural motions. Furthermore, characters need to be able to grab, carry, place, and use objects of different shapes and sizes and do so when the objects are placed at various locations in the world and approached from any direction. The core motions for characters are generally generated in one of three ways: key framing, motion capture, or procedural synthesis. Artist-created key-framed and motion-captured motions tend to look natural and expressive but lack the flexibility needed for most object interactions. Procedurally generated motions use algorithms such as inverse kinematics that work well to target object locations (e.g., for a reach and grab) but often lack a natural look and feel and require objects to be labeled with the sites, regions, and directions referenced in the code. While progress continues in virtual human animation research, natural-looking functional crowds will require even more advancement to make authoring and animating large populations of characters more feasible.

Once the characters can be animated interacting with objects in the environment, they need to possess an understanding of what can be done with objects and what objects are needed in order to perform various actions. In other words, they need to understand object and action semantics. This includes knowing what world state must exist prior to the start of an action (i.e., applicability conditions and preparatory specifications), what state must hold during the action, what the execution steps of the action are, and finally what the new world state is after the successful execution of the action. As indicated previously, there also needs to be information about the parts and various locations of the objects (e.g., grasp locations, regions to sit on (see Fig. 2), etc.) so that animation components can be effective. Representations, such as the Parameterized Action Representation (PAR), are designed to hold this information, but authoring them is time-consuming and error prone (Bindiganavale et al. 2000). In order to scale to the level needed to simulate functional crowds in large, complex environments, the creation of action and object semantics needs to be automated. Automating action and object semantics would also help to ensure some consistency within and between scenarios, whereas ad hoc hand authoring tends to be sloppy and error prone. Online lexical databases, such as WordNet, VerbNet, and FrameNet, have been shown to provide a viable foundation for action and object semantics for virtual worlds (Balint and Allbeck 2015; Pelkey and Allbeck 2014). Additional work is needed to ensure the information represented is what is needed for applications in virtual worlds and to ensure that mechanisms for searching and retrieval are fast enough.

Fig. 2 Regions indicating places where characters could sit

Given that characters have some basic understanding of the virtual world they are inhabiting, the next question is, at any given time, how should characters select their behaviors? Planning and other sophisticated AI techniques can be computationally intensive and difficult to control. For functional crowds, it is better to start with techniques that are simple both in authoring and execution (Allbeck 2009, 2010). Human behaviors stem from a variety of different impetuses. Some behaviors, such as going to work or school or attending a meeting, are scheduled. These actions provide some structure to our lives and the lives of our virtual counterparts. They are selected based on the simulated time of day. Reactive actions are responses to the world around us. They add life and variation to the behaviors of virtual characters. They are selected based on the objects, people, and events around the character. Aleatoric or stochastic actions include sub-actions with different distributions. For example, we may want a character to appear to be working in her office but are not very concerned with the details. Our WorkInOffice action would include sub-actions like talking on the phone, filing papers, and using the computer. The character would switch between these actions for the specified period of time at the specified distribution, but which exact sub-action is being performed at any point in time would not need to be specified. Need-based actions add humanity to the virtual characters. Needs grow over time and are satisfied by performing certain actions with the necessary object participants (e.g., eat food). As a need grows, the priority of selecting a behavior that would fulfill it also grows. These needs could correspond to established psychological models, such as Maslow's hierarchy of needs, or they could be specific to the scenario (e.g., drug addiction). These are just a few examples of simple behavior selection mechanisms (a small sketch follows below); certainly, others are possible and may be more applicable to some scenarios. Practically speaking, it may be best to completely script the behaviors of some key characters in a scenario. Background characters could then have variations in their schedules, reactions, needs, and distributions. More sophisticated AI techniques could be included when and where needed, as long as the overall framework remains fast enough for human interaction.
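To ground the action types above, here is a minimal Python sketch of a combined need-based and aleatoric selector; the class and attribute names are hypothetical illustrations rather than the CAROSA implementation:

```python
import random

class NeedBasedSelector:
    """Pick the behavior whose driving need is currently most urgent."""

    def __init__(self, needs, growth, satisfy_actions):
        self.needs = dict(needs)              # e.g., {"hunger": 0.2, "rest": 0.1}
        self.growth = growth                  # per-tick growth rate of each need
        self.satisfy = satisfy_actions        # need -> action that reduces it

    def tick(self, dt=1.0):
        for n in self.needs:                  # needs grow over time
            self.needs[n] += self.growth[n] * dt

    def select(self, scheduled=None, aleatoric=None):
        if scheduled:                         # scheduled actions take precedence
            return scheduled
        urgent, level = max(self.needs.items(), key=lambda kv: kv[1])
        if level > 1.0:                       # need crossed its threshold
            self.needs[urgent] = 0.0          # performing the action satisfies it
            return self.satisfy[urgent]
        if aleatoric:                         # otherwise sample a sub-action
            subactions, weights = zip(*aleatoric.items())
            return random.choices(subactions, weights=weights)[0]
        return "idle"

# Example: an office worker with a WorkInOffice aleatoric distribution.
agent = NeedBasedSelector({"hunger": 0.4}, {"hunger": 0.05}, {"hunger": "EatFood"})
agent.tick(5.0)
print(agent.select(aleatoric={"UsePhone": 0.2, "FilePapers": 0.3, "UseComputer": 0.5}))
```

The appeal of such a selector for large populations is that it is constant time per agent and trivially authorable: only the need rates and sub-action distributions vary between characters.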

Heterogeneity

In real human populations, not everyone is doing the same thing at the same time. There are variations in behaviors that stem from different factors. The psychological research community has spent decades positing numerous models of personality, emotions, roles, status, culture, and more. The virtual human research community has taken these models as inspiration for computational models of virtual human behaviors (Allbeck and Badler 2001; IVA 1998; Li and Allbeck 2011). Variations in behavior and behavior selection can also evolve as the characters learn about and from their environment and each other (Li and Allbeck 2012).

All of this research needs additional attention and revision. In particular, how these different traits are manifested in expressive animation needs continuing work, as does the interplay of psychological models. How does personality affect emotion and the display of emotion? How do a character's roles, and changes in role, affect emotional displays? Certainly culture and its impacts are not well modeled in virtual humans. How do all of these psychological models influence a character's priorities? At any point in time, a character's behavior selection should reflect what is most important for them to achieve at that time. Their priorities can be influenced by any number of factors. For functional crowds, it is important that priorities be weighed quickly and that behavior selection is not delayed by an overly complex psychological framework. An open question for most scenarios is which parts of human behavior are really important to model and which can be left out. It is possible that a fair amount of simple random choice would suffice for the majority of the characters a lot of the time, but this depends on the duration of the simulation and how often the same character or characters are encountered by the viewer.

HCI

Most applications of functional crowds require them to have some interaction with real humans, either during the authoring process or while the simulation unfolds. Authoring the behavior of an entire population of characters from the ground up would be infeasible. Providing a layer of automatically generated common understanding (i.e., action and object semantics) does help. Simple, yet robust, behavior selection mechanisms are also helpful. Furthermore, the action types described earlier can be linked to even higher-level constructs, such as groups and roles (Li and Allbeck 2011).

When authoring behaviors, it is important to balance autonomy and control. To accomplish the objectives of the scenario, authors need to have control over the characters and their behaviors. However, authoring every element of every behavior of every character would be overwhelming even for short-duration simulations of forty or fifty characters. The characters need to have some level of autonomy. They need to be able to decide what to do and how to do it on their own. Then, when and if they receive instruction from the simulation author, they need to suspend or preempt their current behaviors to follow those instructions. There may also be times when authors have an overall narrative in mind for the simulations but are less concerned about some of the details of the characters' behaviors. This is one place where more sophisticated AI methods like partial planners may play a role (Kapadia et al. 2015).

HCI also comes into play as one or more humans interact with the functional crowds during the simulation. They may be using a standard keyboard, mouse, and monitor; a mobile device; a gaming console; or more advanced virtual reality (VR) devices. VR devices can provide higher fidelity and therefore enable the subjects to see the virtual world in more detail. Using head-mounted displays (HMDs) or CAVE systems allows the subject to view the virtual characters in a life-size format. The movements of subjects can also be motion captured in real time and displayed on their avatars, providing more realistic interaction with the virtual characters. Hardware interfaces can impact the level of a subject's immersion into a virtual world and potentially their level of presence in the virtual world.

Another aspect of HCI with virtual characters and functional crowds is a kind of history. If a subject spends longer durations in the virtual world and/or has repeated exposure to it, he or she may become familiar with some of the characters and form expectations about them. Subjects may learn their personalities and behavioral quirks. Subjects will expect some consistency in these behaviors. They may also expect the virtual characters to have some level of memory of past interactions. While these expectations can be met, it is still a challenge to provide the virtual characters with techniques that make these memories compact, efficient, and plausibly imperfect (Li et al. 2013). More research is needed.

Conclusions

Functional crowds can increase the number of applications of crowd simulations and increase their accuracy, but as this chapter has discussed, additional research is needed, from character animation to AI to psychological models to HCI. Increased computing power will help but is not an overall solution. Attempting to simulate realistic human behaviors is difficult. It is even more challenging at a large scale. When attempting to simulate realistic human behavior, we can end up losing focus. One model or technique leads us to another and another until we have lost sight of our original goal. Too often researchers also design and implement a method and then go in search of a problem it might address. We might be better served to keep focused on an application or scenario and then determine what is and is not most critical to achieving its goals. Does the application really require a sophisticated planner or emotion model? How closely and for how long is the subject going to be observing the characters' behaviors? Also, do we really need to simulate 500,000 characters at a time? At ground level in the center of a village or even a large city, how many people can be seen at one time? Are there existing tools, open source or commercial, that can be used or modified? Too often researchers feel they have to construct their own models from scratch, ignoring years of effort done by others. In terms of both human effort and computation, use available resources wisely and do not put a large amount of effort into areas that will have little impact on the application.

In this area of research, another question that is often asked is how to validate the model. How can one validate human behavior? We could show videos of functional crowds to hundreds of people and ask them a variety of questions to try to determine whether they think the character behaviors are realistic, reasonable, or even plausible, but we all have rather different ideas of what is reasonable behavior. Instead we choose to frame work in this area as the construction of a toolset to be used by subject matter experts to achieve their own goals. For example, an urban planner may wish to use functional crowds to analyze a proposed transportation system. Evaluation then becomes a question of whether or not the urban planner can use the functional crowds toolset to do the desired analysis. Does it have the parameters required? Is it usable by nonprogrammers? Can they increase or decrease fidelity relative to the input effort?

As a research area, functional crowds is a young but promising direction. It sits at the overlap of several other research communities, namely, computer graphics and animation, artificial intelligence, human-computer interaction, and psychology. As advances are made in each one of these disciplines, functional crowds can benefit.

Cross-References

▶ Biped Controller for Character Animation
▶ Blendshape Facial Animation
▶ Comparative Evaluation of Crowd Animation
▶ Crowd Evacuation Simulation
▶ Crowd Formation Generation and Control
▶ Data-Driven Character Animation Synthesis
▶ Data-Driven Hand Animation Synthesis
▶ Depth Sensor Based Facial and Body Animation Control
▶ Example-Based Skinning Animation
▶ Eye Animation
▶ Hand Gesture Synthesis for Conversational Characters
▶ Head Motion Generation
▶ Laughter Animation Generation
▶ Perceptual Evaluation of Human Animation
▶ Perceptual Study on Facial Expressions
▶ Perceptual Understanding of Virtual Patient Appearance and Motion
▶ Physically-Based Character Animation Synthesis
▶ Real-Time Full Body Motion Control
▶ Real-Time Full Body Pose Synthesis and Editing
▶ Video-Based Performance Driven Facial Animation
▶ Visual Speech Animation

References

Allbeck JM (2009) Creating 3D animated human behaviors for virtual worlds. University of Pennsylvania, Philadelphia, PA
Allbeck JM (2010) CAROSA: a tool for authoring NPCs. In: Proceedings of the international conference on motion in games. Springer, pp 182–193
Allbeck J, Badler NI (2001) Consistent communication with control. In: Workshop on non-verbal and verbal communicative acts to achieve contextual embodied agents at autonomous agents
Bai Y, Siu K, Liu CK (2012) Synthesis of concurrent object manipulation tasks. ACM Trans Graph 31(6):156
Balint T, Allbeck JM (2014) Is that how everyone really feels? Emotional contagion with masking for virtual crowds. In: Proceedings of the international conference on intelligent virtual agents. Springer, pp 26–35
Balint T, Allbeck JM (2015) Automated generation of plausible agent object interactions. In: Proceedings of the international conference on intelligent virtual agents. Springer, pp 295–309
Bindiganavale R, Schuler W, Allbeck JM, Badler NI, Joshi AK, Palmer M (2000) Dynamically altering agent behaviors using natural language instructions. In: Proceedings of the fourth international conference on autonomous agents. ACM, New York, pp 293–300. doi:10.1145/336595.337503
Clegg A, Tan J, Turk G, Liu CK (2015) Animating human dressing. ACM Trans Graph 34(4):116
Durupinar F, Gudukbay U, Aman A, Badler N (2016) Psychological parameters for crowd simulation: from audiences to mobs. IEEE Trans Vis Comput Graph 22(9):2145–2159
Feng A, Casas D, Shapiro A (2015) Avatar reshaping and automatic rigging using a deformable model. In: Proceedings of the 8th ACM SIGGRAPH conference on motion in games. ACM, pp 57–64
Hyde J, Carter E, Kiesler S, Hodgins J (2016) Evaluating animated characters: facial motion magnitude influences personality perceptions. ACM Trans Appl Percept 13(2):8:1–8:17
IVA (1998) International conference on intelligent virtual agents. Springer, Berlin
Kapadia M, Pelechano N, Allbeck J, Badler N (2015) Virtual crowds: steps toward behavioral realism. Morgan & Claypool Publishers, San Rafael, CA
Li W, Allbeck JM (2011) Populations with purpose. In: Motion in games. Springer, Berlin/Heidelberg, pp 132–143
Li W, Allbeck JM (2012) Virtual humans: evolving with common sense. In: Proceedings of the international conference on motion in games. Springer, pp 182–193
Li W, Di Z, Allbeck JM (2012) Crowd distribution and location preference. Comput Anim Virtual Worlds 23(3–4):343–351
Li W, Balint T, Allbeck JM (2013) Using a parameterized memory model to modulate NPC AI. In: Intelligent virtual agents: 13th international conference, IVA 2013, Edinburgh, August 29–31, 2013, proceedings, vol 8108. Springer, p 1
McDonnell R, Larkin M, Dobbyn S, Collins S, O'Sullivan C (2008) Clone attack! Perception of crowd variety. ACM Trans Graph 27(3):26
Pelechano N, Allbeck JM, Badler NI (2008) Virtual crowds: methods, simulation, and control. Synth Lect Comput Graph Animation 3(1):1–176
Pelechano N, Allbeck JM, Kapadia M, Badler NI (eds) (2016) Simulating heterogeneous crowds with interactive behaviors. CRC Press, Boca Raton, FL
Pelkey CD, Allbeck JM (2014) Populating semantic virtual environments. Comput Anim Virtual Worlds 25(3–4):403–410
Shapiro A (2011) Building a character animation system. In: Proceedings of the international conference on motion in games. Springer, pp 98–109
Sunshine-Hill B, Badler NI (2010) Perceptually realistic behavior through alibi generation. In: Proceedings of the sixth AAAI conference on artificial intelligence and interactive digital entertainment (AIIDE'10). AAAI Press, pp 83–88
Thalmann D, Musse SR, Braun A (2007) Crowd simulation, vol 1. Springer, Berlin

Crowd Evacuation Simulation

Tomoichi Takahashi

Contents
Introduction
State of the Art
Lessons from the Past and Requirements for Simulation Systems
Agent-Based Approach to Evacuation Simulations
Crowd Evacuation Using Agent-Based Simulations
Evacuation Scenarios and Environment
Agent Mental States and their Action Selection
Pedestrian Dynamics Model and the Mentality of Individuals
Guidance to Agents and Communication During Evacuation
Future Directions
References

Abstract

Evacuation simulation systems simulate the evacuation behaviors of people during emergencies. In an emergency, people are upset and hence do not behave as they do during evacuation drills. Reports on past disasters reveal various unusual human behaviors. An agent-based system enables an evacuation simulation to consider these human behaviors, including their mental and social status. Simulation results that take the human factor into consideration seem to be a good tool for creating and improving prevention plans. However, it is important to verify and validate the simulation results for evacuations in unusual scenarios that have not yet occurred. This chapter shows that the combination of an agent's physical and mental status and pedestrian dynamics is the key to replicating various human behaviors in crowd evacuation simulation. This realistic crowd evacuation simulation has the potential for practical application in the field.

T. Takahashi (*) Department of Information Engineering, Meijo University, Nagoya, Japan e-mail: [email protected]

# Springer International Publishing Switzerland 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_17-1

Keywords

Evacuation behavior • Emergency scenario • Agent-based simulation • Cognitive map • Psychological factor • Belief-desire-intention • Information transfer and sharing model • Verification and validation

Introduction

Emergencies such as fires, earthquakes, or terrorist attacks can occur at any time in any location. Human lives are at risk from both man-made and natural disasters. The importance of emergency management has been reaffirmed by a number of reports related to various disasters. The September 11, 2001 World Trade Center (WTC) attacks and the Great East Japan Earthquake (GEJE) and ensuing tsunami that occurred on March 11, 2011 took many lives and caused serious injuries. Detailed reports that focus on occupant behavior during the WTC disaster and evacuation behavior after the tsunami alarm indicate that safety measures implemented beforehand and evacuation announcements on site can exert significant influence on individual evacuation behaviors (de Walle and Murray 2007; Averill et al. 2005; Cabinet Office Government of Japan).

Many organizations engage in emergency preparation and provide training to save human lives during emergencies and reduce damage during future disasters (Cabinet Office of UK 2011; Turoff 2002). The disaster-prevention departments of governments, buildings, and other organizations develop these training programs. This training, executed beforehand, is useful to check whether people are well prepared for unseen emergencies, can operate according to prevention plans, and can evacuate quickly to safer locations.

It is difficult to replicate emergent situations in the real world and drill for them with real humans. It is well known that humans behave differently in training and during emergencies. Sometimes, a drill can itself cause accidents. In December 2015, a university in Nairobi executed an antiterror exercise. The drill included the use of gunshots, and this caused students and staff to panic. A number of people jumped from windows of the university buildings and were injured (News). Even statutory training in real situations can create risks for disabled people and some vulnerable groups.

Simulation of the movement of people has been studied in various fields including computer graphics, movie special effects, and evacuations (Hawe et al. 2012; Dridi 2015). This technology allows a prevention center to simulate crowd evacuation behavior in multiple emergency scenarios that cannot be executed in the real world. Computer simulations help the prevention center to assess their plans for all emergencies that need to be considered. Crowd evacuation simulation is a key technology for making safety plans for future emergencies.


State of the Art

Lessons from the Past and Requirements for Simulation Systems

A number of studies have focused on human behavior during past disasters. The National Institute of Standards and Technology (NIST) examined occupant behavior during the attacks on the WTC buildings (Averill et al. 2005; Galea et al. 2008). The Cabinet Office of Japan also reported on evacuations of individuals during the GEJE (Cabinet Office Government of Japan). Common types of evacuation behaviors have been discovered: some individuals evacuated immediately when the disasters occurred, but others did not evacuate, even though they heard emergency alarms provided by the authorities. These people included individuals who had family members located in remote areas, individuals who attempted to contact their families by phone, and individuals who continued to work because they believed they were safe.

It is interesting to note that the individuals' behaviors during these two disasters were similar to the behaviors of individuals during a flood in Denver, USA, on June 16, 1965, even though communication methods have changed over the past 50 years (Drabek 2013). Approximately 3,700 families were suddenly told to evacuate from their homes. The family behaviors that followed the warnings were categorized as follows: those who evacuated immediately, those who attempted to confirm the threat of disaster, and those who ignored the initial warning and continued with routine activities.

Other features of human behavior have been reported in other disasters. (1) In the 2003 fire in the Station nightclub, Rhode Island, most building occupants attempted to retrace their steps to the entrance rather than follow emergency signs, even though the emergency exit was adequately signposted (Grosshandler et al. 2005). (2) In emergencies, humans tend to fulfill the roles assigned to them beforehand. For instance, trained people promptly led the others in their offices to safe places in the WTC attacks (Ripley 2008). (3) In contrast, a tragedy that occurred at the Okawa Elementary School during the GEJE demonstrates how untrained leaders may lead to tragedies (Saijo 2014). The school was located 5 km from the sea and had never practiced evacuation drills. When the earthquake occurred, an hour elapsed before teachers decided on an evacuation location. While moving to that location, they were informed that the tsunami was imminent and that their evacuation location was unsafe. They tried to evacuate to a higher location, but their efforts were too late. Most of the students and staff of the Okawa Elementary School were engulfed by the tsunami and died.

The human behaviors that typically occur during emergencies vary by individual, and the behaviors may differ from those that are planned. The fluctuations in these behaviors are the key features that must be simulated in evacuation simulations. Evacuation behaviors depend on the individual, who makes decisions and changes his/her actions according to his/her conditions and information. This information includes signs and public announcements (PAs) and is thought to affect human behavior and to be useful for guiding people quickly to safe places during dynamically changing situations.

Agent-Based Approach to Evacuation Simulations

NIST simulated some evacuation scenarios to estimate the evacuation time from the WTC buildings (Kuligowski 2005). The travel times of several cases were simulated using several evacuation simulation systems, all of which assume the following:

• People are equal mentally and functionally. In some simulators, sex and age are taken into consideration as parameters for walking speed in pedestrian dynamics models. To address roles in society, only behaviors such as office leaders guiding people out of the building immediately were modeled.
• All people start their evacuation simultaneously. In fact, some people evacuate only after they finish their jobs; the difference in premovement time among individuals is not considered in these simulations.
• All people have the same knowledge about the building and use one route when they evacuate. In reality, knowledge about the evacuation route differs among people, and the evacuation routes can be different.

An agent-based approach provides a platform that corrects these assumptions. An agent-based simulation system (ABSS) models and computes individuals' behaviors related to evacuation (Musse and Thalmann 2007). Various types of human behavior have been studied using the ABSS platform, for example, a simulation of human behavior in a hypothetical human-initiated crisis in the center of Washington, DC, and a simulation tool incorporating different agent types and three kinds of interaction: emotion, information, and behavior (Tsai et al. 2011; Parikh et al. 2013).

An ABSS consists of three parts: the agents, the interaction methods among agents and environments, and the surrounding environment. Agents perceive data from the environment and determine their actions according to their goals. An agent has properties such as physical features, social roles, and mentality. The actions are interactions with other agents and the environment; exchanging information with other agents and starting to evacuate are examples of actions. The interactions with the environment are simulated by sub-simulators and affect the status of the environment. The ABSS repeats these simulation steps: agent perception, agent decision-making, and environment calculations. The environment involves CAD models of buildings and scenarios of disaster situations.

The following example demonstrates the ABSS process applied to an evacuation from a building during a fire. Agents hear alarms and PAs directing them to evacuate the building. The alarm noise and announcements can increase the anxiety of the agents, which is calculated using a psychological status model. The mental status and individual knowledge of the agent determine its actions. When an agent decides to go to a safe place, it visualizes the route to that place and moves. One sub-simulator calculates the agent locations and the status of pedestrian jams inside the building, and the other sub-simulator calculates the spread of the fire.
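A minimal Python sketch of this perceive–decide–act loop is shown below; the class names and the two sub-simulators are hypothetical placeholders for whatever building, pedestrian, and fire models a concrete ABSS would plug in:

```python
class Agent:
    """One evacuee with physical state, mental state, and local knowledge."""

    def __init__(self, position, anxiety=0.0, known_exits=()):
        self.position = position
        self.anxiety = anxiety
        self.known_exits = list(known_exits)
        self.goal = None

    def perceive(self, environment):
        if environment.alarm_on:                  # alarms and PAs raise anxiety
            self.anxiety = min(1.0, self.anxiety + 0.1)

    def decide(self):
        if self.anxiety > 0.5 and self.known_exits:
            self.goal = self.known_exits[0]       # head for a known exit

def run_abss(agents, environment, pedestrian_sim, fire_sim, steps):
    """Repeat the ABSS cycle: perception, decision-making, environment updates."""
    for _ in range(steps):
        for a in agents:
            a.perceive(environment)
            a.decide()
        pedestrian_sim.step(agents, environment)  # moves agents, resolves jams
        fire_sim.step(environment)                # spreads fire/smoke in the CAD model
```

The separation into agent logic and sub-simulators mirrors the three-part decomposition described above: the same agents can be run against different disaster scenarios by swapping the environment and sub-simulators.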

Crowd Evacuation Using Agent-Based Simulations

Evacuation Scenarios and Environment

The environment corresponds to the tasks that the ABSS is applied to, and the parameters in the environment affect the results of the simulations. Table 1 lists the categories of building evacuation scenarios. Case 1 is a situation in everyday life, and the scenario corresponds to an emergency drill. The other four cases correspond to emergency situations in which some accident happens but people do not have all the information they need. The conditions worsen from Case 2 to Case 5. Providing a real-time evacuation guide for dynamically changing situations is thought to effectively reduce evacuation time.

Case 2 corresponds to a minor emergency such as a small fire inside a building. The layout of the floor inside the building remains the same during the evacuation, as in Case 1, and people keep calm. Cases 3, 4, and 5 correspond to situations where some people become distressed and may have trouble evacuating safely to exits. Case 3 is a situation where an earthquake causes furniture to fall to the floor, hindering or preventing evacuation. Case 4 models a fire that spreads while humans operate fire shutters to prevent it from spreading further; this operation may block the evacuation routes and cause differences between the cognitive map of the evacuees and the real situation. Case 5 is the situation in extreme disasters, where large earthquakes cause so much destruction to parts of the building that the floor layout is completely changed.

In Cases 3, 4, and 5, it is necessary to improve prevention plans in terms of available safe-escape time and required safe-escape time (ISO TR16738 2009). However, it is difficult to execute evacuation drills for such situations, as the case in Nairobi demonstrated. Evacuation simulation systems are instead proposed to simulate the evacuation behaviors of people in such situations.

Table 1 Category of changing situations at evacuations

Case | Situation | Map (3D) | Layout | Agent mental state | Interaction mode | Fitness for drills
1 | Everyday | Static environment | Same | Normal | Normal | Fit
2 | Emergency | Static environment | Same | (getting more anxious) | (getting more confusing) | (getting more unexpected)
3 | Emergency | Dynamic environment | Different | (getting more anxious) | (getting more confusing) | (getting more unexpected)
4 | Emergency | Dynamic environment | Different | (getting more anxious) | (getting more confusing) | (getting more unexpected)
5 | Emergency | Dynamic environment | Unknown | Distressed | Crisis | Beyond the scope of drill


Agent Mental States and their Action Selection

People's state of distress is reflected in the motions of agents during emergencies. As a result, the agents take various actions according to the information that they have. Some people may prefer to trust only information from an authority figure, but others will trust their neighbors or heed messages sent from their acquaintances. These individual behaviors form into crowd behavior in emergencies. During the GEJE, about 34 % of 496 evacuees began their evacuation on the advice of acquaintances who themselves took the evacuation guidance seriously (Cabinet Office Government of Japan). The value of 34 % is the average over three prefectures, Iwate, Miyagi, and Fukushima, whose individual averages are 44 %, 30 %, and 3 %, respectively.

The question then arises as to where and how people evacuate during emergencies. Abe et al. conducted a questionnaire survey with individuals who shopped at a Tokyo department store (Abe 1986). Three hundred subjects were selected from shoppers in the department store. The number of male and female participants was equal, and participants ranged in age from teenagers to adults in their 60s. The questions addressed the following factors that occur during emergencies: the provision of evacuation instructions during emergencies, knowledge of emergency exit locations, an individual's ability to evacuate safely, and other factors. The results in Table 2 reveal that:

• Individuals' intentions during emergencies were diverse. Differences were apparent between the sexes and between age groups.
• Half of all surveyed individuals stated they would follow the authorities' instructions. The other half stated they would select directions by themselves, and individuals who chose the fourth and fifth strategies (in Table 2) tended to choose opposite directions.

Table 2 Responses to "In which direction would you evacuate?" (Abe 1986)

No. | Selected actions | All (%) | Male (%) | Female (%)
1 | Follow instructions from clerks or announcements | 48.7 | 38 | 54.7
2 | Hide from smoke | 26.3 | 30.7 | 22
3 | Go to the nearest staircase or emergency exit | 16.7 | 20.7 | 12.7
4 | Follow other individuals' movements | 3 | 1.3 | 4.7
5 | Go in the direction that has fewer people | 3 | 2.7 | 3.3
6 | Go to bright windows | 2.3 | 2.7 | 2
7 | Retrace his/her path | 1.7 | 2.7 | 0.7
8 | Other | 0.3 | 0.7 | –

Agents act according to their code of conduct or will, and social psychological factors affect human behavior. The implementation of autonomous agents includes modeling the process of an individual's perception, planning, and decision-making. Modeling the mental state of an agent is key to simulating the evacuation behavior of people. The psychological factors affect human actions, including selfish movements, altruistic movements, and others. The following cases demonstrate some properties of human behavior; these actions also change the behavior of crowd evacuations:

• People swerve when they come close to colliding with each other, and when people see responders approaching, they automatically make way for them. The two behaviors are similar; however, they differ at the conscious level of an agent. Agents categorize the agents around them into normal or high-priority groups depending on common beliefs in the agent's community. For example, an agent gives consideration to rescuers and the disabled, both of whom are categorized as high-priority agents.
• Families evacuate together. When parents are separated from their children during emergencies, they become anxious and go to their children at the risk of their own safety. For instance, the child might be in a toy section of a department store and have no ability to ask others about his/her parents.
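As a simple illustration of how such survey data can drive agent diversity, a simulator might sample each agent's evacuation strategy from the distribution in Table 2. This is a minimal sketch under that assumption; real systems would further condition the choice on mental state, role, and knowledge:

```python
import random

# Survey distribution over evacuation strategies, from Table 2 ("All" column).
STRATEGIES = {
    "follow instructions": 48.7, "hide from smoke": 26.3,
    "nearest staircase/exit": 16.7, "follow others": 3.0,
    "fewer-people direction": 3.0, "bright windows": 2.3,
    "retrace path": 1.7, "other": 0.3,
}

def sample_strategy(rng=random):
    """Draw one agent's evacuation strategy with survey-based probabilities."""
    names, weights = zip(*STRATEGIES.items())
    return rng.choices(names, weights=weights)[0]

print(sample_strategy())
```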

Pedestrian Dynamics Model and the Mentality of Individuals

The belief-desire-intention (BDI) model is one method for representing how agents select actions according to the situation during the sense-reason-act cycle (Weiss 2000). Belief represents the information that the agent obtains from the environment and other agents. Desire represents the objectives or situations that the agent would like to accomplish or bring about, and the actions selected after deliberation are represented by intention. In the case of evacuation in emergency situations, the desires are to move quickly to a safe place, to know what happened, or to join one's family. The associated actions are to move to specific places, represented as sequences of target points. The target points are the places where people go to satisfy their desires. Movements, including bidirectional movements in a crowd, can be microsimulated in one step using pedestrian dynamics models (Helbing et al. 2000). The models are composed of geometrical information and a force model that resembles the behaviors of real people. The behaviors of individuals may block others who are hurrying to refuges and hence cause pedestrian jams during evacuation (Pelechano et al. 2008; Okaya and Takahashi 2014).
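The force-model idea can be sketched in a few lines of Python. The sketch below follows the general form of the Helbing social force model (relaxation toward a desired velocity plus exponential repulsion from neighbors); the constants are illustrative, not the calibrated values from the literature, and obstacle forces are omitted:

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.05, v0=1.3, tau=0.5, A=2.0, B=0.3):
    """One explicit integration step of a simplified social force model.
    pos, vel, goals: (N, 2) arrays; v0: desired speed; tau: relaxation time;
    A, B: strength and range of agent-agent repulsion (illustrative constants)."""
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    drive = (v0 * to_goal / dist - vel) / tau          # relax toward desired velocity

    diff = pos[:, None, :] - pos[None, :, :]           # pairwise offsets
    d = np.linalg.norm(diff, axis=-1) + 1e-9
    np.fill_diagonal(d, np.inf)                        # no self-repulsion
    repulse = (A * np.exp(-d / B) / d)[..., None] * diff
    force = drive + repulse.sum(axis=1)

    vel = vel + force * dt
    return pos + vel * dt, vel
```

Coupling the agents' mental state to such a model (e.g., raising the desired speed v0 or shrinking personal space B as anxiety grows) is one way the BDI layer and the pedestrian dynamics layer interact.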

Guidance to Agents and Communication During Evacuation

The NIST report showed differences in evacuation behaviors between the two buildings, WTC1 and WTC2. The buildings were similar in size and layout, and similar numbers of individuals were present in the buildings during the attacks. Individuals in both buildings began to evacuate when WTC1 was attacked, and WTC2 was attacked 17 min later. At that time, about 83 % of survivors from WTC1 remained inside the tower, and about 60 % of survivors remained inside WTC2. The difference in evacuation rates between two buildings under similar conditions indicates that there are other interactive and social issues that should be taken into consideration to simulate crowd evacuation behavior.

A PA gives evacuation guidance to people. According to the GEJE report, only 56 % of evacuees heard the emergency alert warning from a loudspeaker. Of these, 77 % recognized the urgent need for evacuation, and the remaining 23 % did not understand the announcement because of the noisy and confused situation.

Nowadays, people communicate with others in public using cellular phones, and this behavior can be assumed to happen during emergencies as well. Indeed, during the GEJE in 2011, it was reported that people learned and shared information using SNS and personal communications (Okumura 2014). In a family evacuation, the following communication between parents and children often occurs when they are apart:

Where are you?
I am at location X.
All right, I will be there soon, stay there.

Information regarding the situation and personal circumstances plays an important role when determining actions. The information affects both the premovement and travel times of evacuation behaviors. With respect to the information or knowledge of people, whether broadcast or communicated personally, the evacuation process has the following phases. When emergencies occur, people either perceive the occurrence themselves or authorities make announcements. The alarm contains urgent messages conveying that an emergency situation has occurred and gives evacuation instructions. People confirm and share the information that they obtain by communicating with people nearby. After that, people perform actions according to their personal reasons: some evacuate to a safe place, others hurry to their families, and still others join rescue operations. People who are unfamiliar with the building follow guidance from authorities or employees, who act according to prescribed rules or the manuals of the buildings. The information that authorities and employees have may vary with time.

The information transfer and sharing model enables the announcement of proper guidance to people and information sharing during evacuation (Okaya et al. 2013). Differences in agents' information and style of communication cause the diversity of human behavior and affect the behavior of evacuations (Niwa et al. 2015); a sketch of such an information-sharing step follows below.
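As a minimal illustration of an information transfer and sharing step (a simplified stand-in for the model of Okaya et al. 2013, with hypothetical field names), a broadcast seeds the information into part of the crowd, and agents within hearing range then exchange what they know:

```python
import math

def broadcast(agents, message, heard_fraction=0.56):
    """Seed a PA announcement; only part of the crowd hears it (cf. the GEJE report)."""
    for i, a in enumerate(agents):
        if i / max(len(agents), 1) < heard_fraction:
            a["knowledge"].add(message)

def share_information(agents, radius=2.0):
    """Agents within hearing range merge their knowledge sets."""
    for a in agents:
        for b in agents:
            if a is not b and math.dist(a["pos"], b["pos"]) <= radius:
                merged = a["knowledge"] | b["knowledge"]
                a["knowledge"], b["knowledge"] = merged, set(merged)

# Example: two evacuees, one hears the alarm, then word spreads.
crowd = [{"pos": (0.0, 0.0), "knowledge": set()},
         {"pos": (1.0, 0.0), "knowledge": set()}]
broadcast(crowd, "evacuate via east exit", heard_fraction=0.5)
share_information(crowd)
print([sorted(a["knowledge"]) for a in crowd])
```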

Future Directions

An ABSS is expected to simulate the behaviors of agents in unusual scenarios that are difficult to test in the real world. We learn how people behave and evacuate during disasters from media stories and reports published by those in authority.
These reports cover evacuations from airplanes, ships, theaters, sport stadiums, stations, underground transport systems, and other settings (Wanger and Agrawal 2014; Peacock et al. 2011; Weidmann et al. 2014). Behavior models have been formulated to reflect the innate human features described in these reports, and such models are key components of evacuation simulations. Table 3 lists the parameters of evacuation models in which human behaviors are taken into account. The parameters represent the features of the agents, the environment, and the interactions among agents and with others during the scenarios; together, they specify the evacuation scenario. Some of the parameters are related to each other; for example, personal space, walking speed, and the preferred avoidance side are pedestrian dynamics parameters, and such parameters also vary across countries (Natalie Fridman and Kaminka 2013).

Table 3 Evacuation simulation parameters

Subsystem     Parameters
Agent         Physical: Age; Sex; Impaired/unimpaired
              Mental/social: State of mind; Human relationships (family, office member, etc.); Role (teacher, leader, rescue responder, etc.)
              Perception: Visual data; Auditory data
              Action: Evacuate (walk/run); Communicate (hear, talk, share information among agents); Others (altruistic behavior, rescue operation)
              Preference: Culture; Nationality
Environment   Map/buildings: 2D/3D; Elevator
              Subsystem: Pedestrian dynamics; Disaster effects (fire, smoke, etc.)
Interaction   Communication: Announcement (guidance from PA); Information sharing
              Human relationship: Personal; Community

In scientific and engineering fields, the principle of hypothesis → compute consequences → compare results has been used to build models and to increase the fidelity of simulations (Feynman 1967). Fundamentally, the same principle applies to crowd evacuation simulation. The following points are assumed when modeling crowd evacuation behaviors:

Whole-part relations assumption. A crowd evacuation simulation system is composed of subsystems: the agents' action planning, pedestrian dynamics, and the disaster situation. A model for evacuation behavior is implemented in each
agent, and the pedestrian dynamics models calculate the positions of the agents. The movements of the agents are integrated into crowd behaviors.

Subsystem causality assumption. The behavior of each agent is simulated by formulas or rules at every simulation step. In each step, the status of the system changes to a new status according to the parameters, models, and formulas. These may be refined to cover more phenomena or to make the results of the subsystem simulations more consistent with experimental data or empirical rules.

Total system validity assumption. The simulation results of the subsystems and the positions of all agents are integrated into the results of the crowd evacuation simulation. The results of the simulation are checked against empirical rules or previously observed data.

Under the second assumption, each subsystem model is verified with respect to real data, and its parameters are tuned to the conditions of the scenarios (Peacock et al. 2011; Weidmann et al. 2014; Ronchi et al. 2013). The Tokyo Fire Department publishes a guide for building fire-safety certificates based on simulation results (Tokyo Fire Department). The results predict the evacuation time in the case of fire under the department's specified method and can be used to certify the likelihood of a safe evacuation. These simulations fall under Case 1 of Table 1, which is equivalent to evacuation drills under everyday conditions.

Under the third assumption, people evaluate the simulation results from their personal and organizational perspectives. An ABSS with the functions mentioned in section "Introduction" can simulate more realistic conditions, such as those of Cases 2 to 4. Even when the integrated simulation results appear reasonable for unexpected situations, there is no evidence confirming whether or not they can be used in real applications. And when the results do not fit the empirical rules, even though they may represent an important prediction, it is difficult to adopt them in a prevention plan according to scientific and engineering principles, because we do not have enough real data and cannot perform experiments in real situations, as is the case for evacuation simulation. It is therefore essential to verify the results of evacuation simulations for emergency situations that have not yet occurred and to confirm that planning based on those results will work in a possible emergency.

Verification and validation (V&V) of simulation tools and results has thus been one of the most important issues in crowd evacuation simulation. V&V problems can be expressed through the following questions:

1. How do we judge whether a tool is accurate enough?
2. How many and which tests should be performed to assess the accuracy of the model predictions?
3. Who should perform the tests, i.e., the model developers, the model users, or a third party?
4. Does the model accurately represent the source system?
5. Does the model accommodate the experimental frame?
6. Is the simulator correct?


These questions are essential to ABSS. Questions 1 to 3 come from test methods suggested from quantitative and qualitative viewpoints for treating behavioral uncertainty (Ronchi et al. 2013). Questions 4 to 6 come from a validation study of evacuation drills in a ten-story building (Isenhour and Löhner 2014). A method that quantitatively compares simulation results with real scenarios in terms of macroscopic patterns has also been proposed for validation (Banerjee and Kraemer 2010). Interactions among agents and dynamically changing environments likewise affect crowd evacuation behavior, and a verification test has been suggested for checking evacuation plans under the dynamic availability of exits (Ronchi et al. 2013). For simulations of situations for which real-world data, that is, real evacuation data and experimental data, are not available, the following qualitative standards have been proposed (Takahashi 2015):

Consistency with data. The simulation results, and their variations after changing parameters or modifying subsystems, are compatible with past anecdotal reports.

Generation of new findings. The results contain something that was not recognized as important before the simulations but that is reasonable given empirical rules.

Accountability of results. The causes of the changes can be explained systematically from the simulation data.

While we do not yet have answers to these questions, ABSS has been applied to increasingly realistic situations. For example, evacuation from a building with fire shutters is a realistic case (Takahashi et al. 2015). Fire shutters are installed in buildings by law to prevent fire and smoke from spreading inside. Some agents evacuate instantly, while others evacuate only after finishing their jobs. Operators at the prevention center close the fire shutters at time t1 to stop the fire from spreading. If there is no announcement about the shutter closing, the agents do not know that the environment has changed: they evacuate according to their own cognitive maps, which may not be updated until they notice the closed fire shutters at time t2. As a result, the evacuation time between t1 and t2 is wasted even for agents that start evacuating immediately. This simulation demonstrates that evacuation times change across scenarios in dynamically changing environments, corresponding to Cases 3, 4, and 5, and it shows the potential of evacuation simulation for future applications.

In this chapter, we presented some features of crowd evacuation simulations: the role of human mental conditions during emergencies, the representation of agents' mental states, and information flow during evacuation. We also showed that combining agents' physical and mental status with pedestrian dynamics is the key to simulating crowd evacuation and replicating the variety of human behaviors. Simulating crowd evacuation more realistically introduces additional human-related factors, which makes it difficult to analyze the simulation results systematically and compare them with real-world data. At present, the simulation results are not so much objectively measured as subjectively interpreted by humans. Future research and model development will focus on agent interactions, human mental models, and verification and validation problems.


References Abe K (1986) Panic and human science: prevention and safety in disaster management. Buren Shuppan. in Japanese, Japan Averill JD, Mileti DS, Peacock RD, Kuligowski ED, Groner NE (2005) Occupant behavior, egress, and emergency communications (NIST NCSTAR 1–7). Technical report, National Institute of Standards and Technology, Gaitherburg B. news Woman dies after ‘terror drill’ at Kenya’s strathmore university. http://www.bbc.com/news/ world-africa-34969266. Date:16 Mar 2016 Banerjee B, Kraemer L (2010) Validation of agent based crowd egress simulation (extended abstract). In: International conference on autonomous agents and multiAgent systems (AAMAS’10). pp 1551–1552. http://www.aamas-conference.org/proceeding.html Cabinet Office Government of Japan. Prevention Disaster Conference, the Great West Japan Earthquake and Tsunami. Report on evacuation behavior of people (in Japanese). http://www. bousai.go.jp/kaigirep/chousakai/tohokukyokun/7/index.html. Date: 16 Mar 2016. in Japanese. Cabinet Office of UK (2011) Understanding crowd behaviours: documents. https://www.gov.uk/ government/publications/understanding-crowd-behaviours-documents. 20 Mar 2016 de Walle BV, Murray T (2007) Emergency response information systems: emerging trends and technologies. Commun ACM 50(3):28–65 Drabek TE (2013) The human side of disaster, 2nd edn. CRC Press, Boca Raton Dridi M (2015) Simulation of high density pedestrian flow: a microscopic model. Open J Model Simul 3(4):81–95 Feynman RP (1967) Seeking new laws. In: The character of physical law. The MIT Press, Cambridge Galea ER, Hulse L, Day R, Siddiqui A, Sharp G, Boyce K, Summerfield L, Canter D, Marselle M, Greenall PV (2008) The uk wtc9/11 evacuation study: an overview of the methodologies employed and some preliminary analysis. In: Schreckenberg A, Klingsch WWF, Rogsch C, Schreckenberg M (eds) Pedestrian and evacuation dynamics 2008 (pp. 3–24). Springer, Heidelberg Grosshandler WL, Bryner NP, Madrzykowski D, Kuntz K (2005) Report of the technical investigation of the station nightclub fire (NIST NCSTAR 2). Technical report, National Institute of Standards and Technology, Gaitherburg Hawe GI, Coates G, Wilson DT, Crouch RS (2012) Agent-based simulation for large-scale emergency response. ACM Comput Surv 45(1):1–51 Helbing D, Farkas I, Vicsek T (2000) Simulating dynamical features of escape panic. Nature 407:487–490 Isenhour ML, Löhner R (2014) Validation of a pedestrian simulation tool using the {NIST} stairwell evacuation data. Transp Res Procedia 2:739–744, The Conference on Pedestrian and Evacuation Dynamics 2014 (PED 2014), 22–24 October 2014, Delft, The Netherlands ISO:TR16738:2009. Fire-safety engineering – technical information on methods for evaluating behaviour and movement of people Kuligowski ED (2005) Review of 28 egress models. In: NIST SP 1032; Workshop on building occupant movement during fire emergencies. Musse SR, Thalmann D (2007) Crowd simulation. Springer-Verlag, London Natalie Fridman AZ, Kaminka GA (2013) The impact of culture on crowd dynamics: an empirical approach. In: International conference on autonomous agents and multiagent systems, AAMAS’13, p 143–150 Niwa T, Okaya M, Takahash T (2015) TENDENKO: agent-based evacuation drill and emergency planning system. Lecture Notes in Computer Science 9002. Springer, Heidelberg Okaya M, Takahashi T (2014) Effect of guidance information and human relations among agents on crowd evacuation behavior. 
In: Kirsch U, Weidmann U, Schreckenberg M (eds) Pedestrian and evacuation dynamics 2012. Springer, Cham
Okaya M, Southern M, Takahashi T (2013) Dynamic information transfer and sharing model in agent based evacuation simulations. In: International conference on autonomous agents and multiagent systems, AAMAS 13. pp 1295–1296 Okumura H (2014) The 3.11 disaster and data. J Inf Process 22(4):566–573 Parikh N, Swarup S, Stretz PE, Rivers CM, Bryan MVM, Lewis L, Eubank SG, Barrett CL, Lum K, Chungbaek Y (2013) Modeling human behavior in the aftermath of a hypothetical improvised nuclear detonation. In: International conference on autonomous agents and multiagent systems, AAMAS’13, pp 949–956 Peacock RD, Kuligowski ED, Averill JD (2011) Pedestrian and evacuaion dynamics. Springer, Heidelberg Pelechano N, Allbeck J, Badler N (2008) Virtual crowds: methods, simulation, and control. Morgan & Claypool Publishers series, California, New York Ripley A (2008) The unthinkable: who survives when disaster strikes – and why. New York: Three Rivers Press Ronchi E, Kuligowski ED, Reneke PA, Peacock RD, Nilsson D (2013) The process of verification and validation of building fire evacuation models. Technical report. National Institute of Standards and Technology, Gaitherburg. Technical Note 1822 Saijo T (2014) Be a tsunami survivor. http://wallpaper.fumbaro.org/survivor/tsunami_en_sspj.pdf. Date:17 Mar 2016 Takahashi T (2015) Qualitative methods of validating evacuation behaviors. In Takayasu H, Ito N, Noda I, Takayasu M (eds) Proceedings of the international conference on social modeling and simulation, plus econophysics colloquium 2014. Springer proceedings in complexity. Springer International Publishing, pp 231–242 Takahashi T, Niwa T, Isono R (2015) Method for simulating the evacuation behaviours of people in dynamically changing situations. In: Proceedings of TGF2015. Springer. To be published in 2016 Fall Tokyo Fire Department. Excellence mark -certified fire safety building indication system. http:// www.tfd.metro.tokyo.jp/eng/inf/excellence_mark.html. Date: 25 Jan 2016 Tsai J, Fridman N, Bowring E, Brown M, Epstein S, Kaminka G, Marsella S, Ogden A, Rika I, Sheel A, Taylor ME, Wang X, Zilka A, Tambe M (2011) Escapes: evacuation simulation with children, authorities, parents, emotions, and social comparison. In: The 10th international conference on autonomous agents and multiagent systems, vol 2, AAMAS’11. International Foundation for Autonomous Agents and Multiagent Systems, Richland, pp 457–464 Turoff M (2002) Past and future emergency response information systems. Commun ACM 45 (4):29–32 Wanger N, Agrawal V (2014) An agent-based simulation system for concert venue crowd evacuation modeling in the presence of a fire disaster. Expert Syst Appl 41:2807–2815 Weidmann U, Kirch U, Schreckenberg M (eds) (2014) Pedestrain and evacuation dynamics 2012. Springer, Heidelberg Weiss G (2000) Multiagent systems. The MIT Press, Massachusets

Perceptual Study on Facial Expressions Eva G. Krumhuber and Lina Skora

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Early Beginnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Dynamic Advantage in Facial Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Temporal Characteristics: Directionality and Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Effects of Facial Motion on Perception and Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Ratings of Authenticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Person Judgments and Behavioral Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Facial Mimicry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Neuroscientific Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

E.G. Krumhuber (*) • L. Skora University College London, London, UK e-mail: [email protected] © Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_18-1

Abstract

Facial expressions play a paramount role in character animation since they reveal much of a person's emotions and intentions. Although animation techniques have become more sophisticated over time, there is still a need for knowledge about what behavior appears emotionally convincing and believable. The present chapter examines how motion contributes to the perception and interpretation of facial expressions. This includes a description of the early beginnings of research on facial motion and of more recent work pointing toward a dynamic advantage in facial expression recognition. Attention is further drawn to the temporal characteristics (i.e., directionality and speed) that facilitate this dynamic advantage. This is followed by a review of how facial motion affects perception and behavior more generally, together with the neural systems that underlie the processing of dynamic emotions. The chapter concludes by discussing remaining challenges and future directions for the animation of naturally occurring emotional expressions in dynamic faces.

Keywords

Motion • Dynamic • Facial expression • Emotion • Perception

Introduction

When formulating the 12 principles of animation in the early 1930s, animators at the Walt Disney Studios considered motion to be fundamental for creating believable characters. They were convinced that the type and speed of an action help define a character's intentions and personality (Kerlow 2004). Since the early days of character animation, much has changed in animation techniques and styles. From hand-drawn cartoon characters to real-time three-dimensional computer animation, the field has seen a major shift toward near-realistic characters that exhibit humanlike behavior. Whether these characters are used for entertainment, therapy, or education, the original principle of motion continues to be of interest in research and design. This particularly applies to facial animation, as subtle elements of a character's thoughts and emotions are conveyed through the face (Kappas et al. 2013). Facial expressions provide clues and insight into what the character thinks and feels, and they act as a powerful medium for conveying emotions. Although the tools for facial animation have become more sophisticated over time (i.e., techniques for capturing and synthesizing facial expressions), there is still a need for knowledge about how humans respond to emotional displays in moving faces. Only if the character appears emotionally convincing and believable will the user or audience feel comfortable in the interaction. The present chapter aims to help with this task by providing an overview of the existing literature on the perception of dynamic facial expressions. Given the predominant focus on static features of the face in past research, we seek to highlight the beneficial role of facial dynamics in the attribution of emotional states. This includes a description of the early beginnings of research on facial motion and of more recent work pointing toward a dynamic advantage in facial expression recognition. The next section draws attention to the temporal characteristics that facilitate this dynamic advantage. This is followed by a review of how facial motion affects perception and behavior more generally. Neural systems involved in the processing of dynamic emotions and their implications for action representation are also outlined. The final section concludes the chapter by discussing remaining challenges and future directions for the animation of naturally occurring emotional expressions in dynamic faces.


State of the Art

Early Beginnings

In everyday settings, human motion and corresponding properties (e.g., shapes, texture) interact to produce a coherent percept. Yet, motion conveys important cues for recognition even in isolation from the supportive information. The human visual system, having evolved in dynamic conditions, is highly attuned to dynamic signals within the environment (Gibson 1966). It can use this information to identify an agent or infer its actions purely by the motion patterns inherent to living organisms, called biological motion (Johansson 1973). Investigations of biological motion of the face suggest that the perception of faces is aided by the presence of nonrigid facial movements, such as stretching, bulging, or flexing of the muscles and the skin. In an early and now seminal point-light paradigm (Bassili 1978), all static features of actors' faces, such as texture, shape, and configuration, were obscured with the use of black makeup. Subsequently, the darkened faces were covered with approximately 100 luminescent white dots and video recorded in a dark room displaying a range of nonrigid motion, from grimaces to the basic emotional expressions (happiness, sadness, fear, anger, surprise, and disgust). The dark setup resulted in only the bright points being visible to the observer, moving as a result of facial motion. In a recognition experiment, the moving dots were recognized as faces significantly better than when the stimulus was shown as a sequence of static frames or as a static image. Similarly, moving point-light faces enabled above-chance recognition of the six basic emotional expressions in comparison to motionless point-light displays (Bassili 1979; Bruce and Valentine 1988). This suggests that when static information about the face is absent, biological motion alone is distinctive enough to provide important cues for recognition.

Dynamic Advantage in Facial Expression Recognition

Subsequent research has pointed toward a motion advantage especially when static facial features are compromised. This is of particular relevance for computer-generated, synthetic faces (e.g., online avatars, game characters). In comparison to natural human faces, synthetic faces are still inferior in terms of their realistic representation of the finer-grained features, such as textures, skin stretching, or skin wrinkling. Such impairment in the quality of static information can be remedied by motion. Numerous studies have shown that expression recognition in dynamic synthetic faces consistently outperforms recognition in static synthetic faces (Ehrlich et al. 2000; Wallraven et al. 2008; Wehrle et al. 2000; Weyers et al. 2006). This suggests that motion is able to add a relevant layer of information when synthetic features fail to provide sufficient cues for recognition. The effect is found both under uniform viewing quality and when the featural or textural information is degraded (e.g., blurred).


For natural human faces, however, the dynamic advantage is weaker or inexistent when the quality of both static and dynamic displays is comparably good (Fiorentini and Viviani 2011; Kamachi et al. 2001; Kätsyri and Sams 2008). As such, motion is likely to provide additional cues for recognition when key static information is missing (i.e., in degraded and obscured expressions). Its benefits may be redundant when the observer can draw enough information from the static properties of the face. This applies to static stimuli that typically portray expressions at the peak of emotionality. Such stimuli, prominently used in face perception research, are characterized by their stereotypical depiction of a narrow range of basic emotions. They are often also posed upon instructions by the researcher and follow a set of prototypical criteria (e.g., Facial Action Coding System, FACS; Ekman and Friesen 1978). In this light, it is likely that stylized static expressions contain the prototypical markers of specific emotions, thereby facilitating recognition. Yet, everyday emotional expressions are spontaneous and often include non-prototypical emotion blends or patterns. They are normally also of lower intensity, potentially becoming more difficult to identify without supportive cues such as motion. For instance, low-intensity expressions, which tend to be more difficult to identify the less intense they get, are recognized significantly better in a dynamic than static form (Ambadar et al. 2005; Bould and Morris 2008). In this context, motion appears to provide additional perceptual cues, making up for insufficient informative signals.

Temporal Characteristics: Directionality and Speed

How can we explain the motion advantage in expression recognition? Could it simply derive from an increase in the number of cues in a dynamic sequence? Early hypotheses point out that a moving sequence contains a greater amount of static information from which to infer emotion judgments than a single static portrayal (Ekman and Friesen 1978). Arguably, as a dynamic sequence unfolds, it provides multiple samples of the developing expression compared to a single sample in static displays. To test this assumption, Ambadar et al. (2005) compared emotion recognition performance between dynamic, static, and multi-static expressions. In the multi-static condition, the static frames constituting a video were interspersed with visual noise masks disrupting the fluidity of motion. Of these conditions, dynamic expressions were recognized with significantly greater accuracy than both multi-static and static portrayals (see also Bould and Morris 2008). This suggests that the intrinsic temporal quality of the unfolding expression is what helps to disambiguate its content rather than a mere increase in static frames. A likely candidate that facilitates the dynamic advantage is the directionality of change in the expression over time. Research shows that humans are highly sensitive to the direction in which the expression unfolds. For example, they are able to accurately detect the directionality in a set of scrambled images and arrange them into a temporally correct sequence (Edwards 1998). Similarly, disrupting the natural temporal direction of the expression results in worse recognition accuracy than when
the expressions unfold naturally. In a series of experiments, Cunningham and Wallraven (2009b) demonstrated this by applying various manipulation techniques to the direction of unfolding, such as scrambling the frames in a dynamic sequence or playing them backward. Their results indicate that the identification of emotional expressions suffers considerably when natural motion is interrupted. Recognition performance also appears to be better in sequences in which the temporal unfolding is preserved, thereby allowing the directionality of change to be observed as the expression emerges (Bould et al. 2008, but see Ambadar et al. 2005 for a contrasting result). Yet, it is noteworthy that this effect might not apply to all emotions equally. For example, happiness is typically recognized better than other basic emotions regardless of condition. In addition to the movement direction, the velocity of unfolding plays a crucial role in emotion perception. Changes in viewing speed, such as slowing down or speeding up the dynamic sequences, significantly affect expression recognition accuracy. This effect appears to differ between emotions based on the differences in their intrinsic optimum velocities. For example, sadness is naturally slow, so slowed-down viewing conditions do not impact it negatively as much as they impact recognition accuracy for all other tested emotions (Kamachi et al. 2001). Conversely, surprise is naturally fast, and it could be its natural velocity that distinguishes it from the morphologically similar expression of fear, which is slower (Sato and Yoshikawa 2004). Importantly, changing the speed throughout an entire expression has different effects than changing the duration of the peak. This suggests that the beneficial effects of natural movements cannot simply be explained by the mere exposure time to the expression (Kamachi et al. 2001; Recio et al. 2013). Overall, altering the speed of expression unfolding appears to influence perception without affecting the direction of change. As such, the intrinsic velocities of particular emotional expressions are likely to provide stronger cues for recognition than the perception of change alone (Bould et al. 2008). Finally, the perception of dynamic faces is also linked to the quality of motion. While expressions in real faces unfold in a biologically natural manner (i.e., nonlinearly), facial animations have often been characterized by linear techniques. Such linearly unfolding facial expressions (e.g., dynamic displays morphed from individual static displays) yield slower and poorer recognition accuracy in comparison to natural, nonlinear unfolding, as well as worse naturalness and genuineness ratings (Cosker et al. 2010; Wallraven et al. 2008). As a result, linear morphs might not constitute a good representation of the real-life quality of facial motion, which is particularly relevant to the construction of realistic synthetic faces. However, recent developments within the field of affective computing identify multiple parameters linked to naturalistic expression unfolding that can improve the quality of motion in computer-generated faces and raise their recognition rates, such as appropriate speeds, action unit (AU) activations, intensities, asymmetries, and textures (Krumhuber et al. 2012; Recio et al. 2013; Yang et al. 2013). As such, the benefits provided by motion appear to be more than the perception of motion itself. Instead, it is a comprehensive set of information deriving from the temporal characteristics
including the perception of change, intrinsic velocity of an expression, and the quality of motion.
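To make the contrast between linear and nonlinear unfolding concrete, the sketch below generates two morph-weight trajectories for a neutral-to-apex expression: a linear ramp of the kind produced by simple morphs between static frames, and a smoother eased profile as a stand-in for biologically natural acceleration and deceleration. The easing function, frame counts, and parameter values are illustrative assumptions, not settings taken from the studies cited above.

```python
import numpy as np

def linear_unfolding(n_frames):
    """Morph weight rises at a constant rate from neutral (0) to apex (1)."""
    return np.linspace(0.0, 1.0, n_frames)

def eased_unfolding(n_frames):
    """Nonlinear (ease-in/ease-out) profile: slow start, faster middle, slow end.
    Uses a smoothstep curve as a simple stand-in for natural muscle dynamics."""
    t = np.linspace(0.0, 1.0, n_frames)
    return 3 * t**2 - 2 * t**3

def apply_morph(neutral, apex, weights):
    """Blend two expression parameter vectors (e.g., blendshape or AU intensities)
    frame by frame according to the given weight trajectory."""
    neutral = np.asarray(neutral, dtype=float)
    apex = np.asarray(apex, dtype=float)
    return [neutral + w * (apex - neutral) for w in weights]

# Example: a 30-frame onset (about 1 s at 30 fps) for a single expression parameter.
frames_linear = apply_morph([0.0], [1.0], linear_unfolding(30))
frames_eased = apply_morph([0.0], [1.0], eased_unfolding(30))
```

Played at the same frame rate, the two trajectories reach the same apex but differ in perceived velocity over time, which is the kind of difference the studies above associate with naturalness and recognition accuracy.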

Effects of Facial Motion on Perception and Behavior

In addition to its supportive role in expression recognition, motion also affects a number of perceptual and behavioral factors. These include expression judgments such as intensity and authenticity, as well as behavioral responses and even mimicry. Firstly, emotions expressed in a dynamic form are perceived to be more intense than the same emotions in a static form (Biele and Grabowska 2006; Cunningham and Wallraven 2009a). Motion appears to enhance intensity estimates because of the changes in the expression as it develops from neutral to fully emotive. While static portrayals retain the same intensity level throughout the presentation time, dynamic changes highlight the contrast between the neutral and the fully emotional expression. As such, the contrast makes the expression seem more intense (Biele and Grabowska 2006). Another explanation for this effect was offered in terms of representational momentum (RP). RP is a visual perception phenomenon in which the observer exaggerates the final position of a gradually moving stimulus. It often involves a forward displacement. For example, when a moving object disappears from the visual field, observers tend to report its final position as displaced further down its trajectory than it objectively was. In a study of dynamic facial expressions and RP, Yoshikawa and Sato (2008) found that participants exaggerated the last, fully emotive frame of the dynamic sequence and remembered it as more intense than it was in reality. The effect also became more pronounced with increasing velocity of expression unfolding. As such, it seems that the gradual shift from neutral to emotional in dynamic expressions generates a forward displacement, inducing an exaggerated and intensified perception of the final frame in the sequence.

Ratings of Authenticity

Motion also appears to help observers assess the authenticity of an expression better than static portrayals can. Authenticity refers to more than correct identification of the emotional expression observed. It is a quality telling us whether the emotion is genuinely experienced or not. Smiles have been prominently used to study this dimension. Being universal and widespread in everyday interactions, smiles can indicate a range of feelings, from happiness and amusement to politeness and embarrassment (Ambadar et al. 2009). However, smiles can also be easily used to mask real emotions or to deceive others (e.g., Ekman 1985). As such, they constitute a good stimulus to study the genuineness of the underlying feeling. Traditionally, the so-called Duchenne marker has been considered as an indicator of smile authenticity (Ekman et al. 1990), where its presence signals that a smile is genuine ("felt") as opposed to false ("unfelt"). The Duchenne marker involves, in addition to the lip corner puller (zygomaticus major muscle), the activation of the
orbicularis oculi muscle surrounding the eye. This results in wrinkling on the sides of the eyes, commonly referred to as crow's feet. While the validity of the Duchenne marker in the perception of static expressions is well documented, motion properties are crucial for assessing smile authenticity in dynamic displays (e.g., Korb et al. 2014; Krumhuber and Manstead 2009). For example, genuine smiles differ in lip corner and eyebrow movements from deliberate, false smiles (Schmidt et al. 2006; Schmidt et al. 2009). More specifically, Frank et al. (1993) highlighted three dynamic markers of genuine smiles: expression duration, synchrony in muscle activation (between the zygomaticus major and orbicularis oculi muscles), and smoothness of mouth movements. Overall, genuine smiles last between 500 and 4000 ms, whereas false smiles tend to be shorter or longer (Ekman and Friesen 1982). Furthermore, the smoothness and duration of the expressive components of smiles are meaningful indicators. Bugental (1986) and Weiss et al. (1987) were the first to show that the onset and offset in false smiles tend to be faster in comparison to felt smiles (see also Hess and Kleck 1990). To investigate whether these differences affect expression perception, Krumhuber and Kappas (2005) manipulated onset, apex, and offset timings of computer-generated smiles. Their results confirmed the proposition that each dynamic element of a smiling expression has an intrinsic duration range at which it looks genuine. In particular, expressions are perceived as more authentic the longer their onsets and offsets, while a long apex is linked to lower genuineness ratings.
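The dynamic markers reviewed above can be summarized in a toy scoring function. The sketch below is a heuristic illustration only, not the procedure used in any of the cited studies: the thresholds simply encode the reported tendencies (total duration roughly between 500 and 4000 ms, longer onsets and offsets reading as more genuine, and a long apex reading as less genuine), and the specific cut-off values are assumptions.

```python
def smile_genuineness_score(onset_ms, apex_ms, offset_ms):
    """Heuristic, illustrative score in [0, 1] for how genuine a smile may appear,
    based only on the timing of its onset, apex, and offset phases."""
    total = onset_ms + apex_ms + offset_ms
    score = 0.0

    # Genuine smiles tend to fall within roughly 500-4000 ms overall.
    if 500 <= total <= 4000:
        score += 0.4

    # Longer (slower) onsets and offsets are perceived as more authentic;
    # the 200 ms cut-offs are illustrative assumptions.
    if onset_ms >= 200:
        score += 0.2
    if offset_ms >= 200:
        score += 0.2

    # A long apex is associated with lower genuineness ratings;
    # the 2000 ms cut-off is again an illustrative assumption.
    if apex_ms <= 2000:
        score += 0.2

    return score

# A smile with slow onset/offset and a moderate apex scores high,
# whereas an abrupt, long-held smile scores low.
print(smile_genuineness_score(onset_ms=533, apex_ms=1000, offset_ms=800))   # 1.0
print(smile_genuineness_score(onset_ms=100, apex_ms=4000, offset_ms=100))   # 0.0
```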

Person Judgments and Behavioral Responses

Besides their effects on authenticity ratings, dynamic signals influence trait attributions and behavioral responses to the target expressing an emotion. For instance, people displaying dynamic genuine smiles (long onset and offset) are rated as more trustworthy, more attractive, and less dominant than those who show smile expressions without those characteristics (Krumhuber et al. 2007b). In addition, facial movement helps to regulate interpersonal relations by shaping someone's intention to approach or cooperate with another person. In economic trust games, participants can receive a financial gain if their counterpart cooperates but incur a loss if the counter-player fails to cooperate. As such, their performance depends on accurate assessment of the counterpart's intentions. Krumhuber and colleagues (Krumhuber et al. 2007a) showed that people are more likely to trust and engage in an interaction with a counterpart who displays a dynamic authentic smile than a dynamic false smile or neutral expression. Participants with genuinely smiling counterparts also ascribe more positive emotions to them and are more inclined to meet them again. Furthermore, people showing dynamic genuine smiles are evaluated more favorably and considered more suitable candidates in a job interview than those who do not smile or smile falsely (Krumhuber et al. 2009). Notably, this effect applies to real human faces as well as to computer-generated ones. When comparing static and dynamic facial features, it appears that they contribute to different evaluations and social decisions. Static and morphological features,
such as bone structure or width, have been found to affect judgments of ability and competence. In turn, features that are dynamic and malleable, like muscular patterns in emotional expressions, affect judgments of intentions (Hehman et al. 2015). Given that these facial signals are also linked to evaluations of trustworthiness and likeability, they are likely to drive decision-making in social interactions. In line with this argument, participants were shown to choose a financial advisor, a role requiring trust, based on dynamic rather than static facial properties (Hehman et al. 2015).

Facial Mimicry

Existing evidence suggests that dynamic facial displays elicit involuntary and subtle imitative responses more evidently than do static versions of the same expression (Rymarczyk et al. 2011; Sato et al. 2008; Weyers et al. 2006). Those responses, interpretable as mimicry, are a result of activity in facial muscles corresponding to a given perceived expression (i.e., lowering the eyebrows in anger, pulling the lip corners in happiness). They occur spontaneously and swiftly (about 800–900 ms) after detecting a change in the observed face. While involuntary facial mimicry is a subtle rather than full-blown replication of a witnessed emotion, it is evident enough to be distinguished in terms of its valence (positive or negative quality) by independent observers (Sato and Yoshikawa 2007a). Crucially, the presence of mimicry has a supporting role in emotion perception. For example, being able to mimic helps observers to recognize the emotional valence of expressions (Sato et al. 2013). Happiness and disgust are less well identified when corresponding muscles are engaged by biting on a pen, which effectively blocks mimicry in the lower part of the face (Oberman et al. 2007; Ponari et al. 2012). In a similar vein, blocking mimicry in the upper part of the face by drawing together two stickers placed above the eyebrows impairs the recognition of anger. Mimicry also appears useful in detecting changes in expressions. Having to identify the point at which an expression transforms from one emotion into another (e.g., happiness to sadness) proves more difficult when mimicry is blocked by holding a pen sideways between the teeth. For this task, participants who are free to mimic are quicker in spotting changes in the dynamic trajectory of facial expressions (Niedenthal et al. 2001). Furthermore, mimicry aids emotion judgments, particularly in the context of smile authenticity. Dynamic felt smiles are more easily distinguished from dynamic false ones when expressions can be freely mimicked compared to when mimicry is blocked by a mouth-held pen (Maringer et al. 2011; Rychlowska et al. 2014). Overall, those findings suggest that facial mimicry helps to make inferences about dynamic emotional faces such as emotion recognition and trajectory changes or authenticity judgments. As such, it adds to the evidence that facial motion conveys information that is essential to comprehensive expression perception, while also driving behavioral responses.


Neuroscientific Evidence

Evidence from neuroscience suggests that differences in the processing of dynamic and static facial stimuli begin at a neural level. For example, studies of patients with brain lesions or neurological disorders point toward a dissociation in the neural routes for processing dynamic and static faces. In the most notable cases, patients who are unable to recognize emotions from static displays can easily do so from moving displays (Adolphs et al. 2003; Humphreys et al. 1993). In healthy people, dynamic facial expressions evoke significantly larger and more widespread activation patterns in the brain than static expressions (LaBar et al. 2003; Sato et al. 2004). This enhanced activation is apparent in a range of brain regions, starting with the visual area V5, which subserves motion perception. It has also been observed in the fusiform face area (FFA), a number of frontal and parietal regions, and the superior temporal sulcus (STS), areas implicated in the processing of faces, emotion, and biological motion, respectively (Kessler et al. 2011; Trautmann et al. 2009). The STS has been given particular consideration due to its involvement in interpreting social signals, in addition to biological motion. As such, enhanced activation in the STS in response to dynamic facial stimuli could be related to extracting socially relevant information (i.e., intentions) from the changeable features of the face (Arsalidou et al. 2011; Kilts et al. 2003). Additionally, in an electroencephalography (EEG) study, attention-related brain activity was found to be greater and longer-lasting when participants observed dynamic compared to static stimuli (Recio et al. 2011). This higher activity continued throughout the duration of an expression, contributing to more elaborate processing of dynamic faces. Such enhanced and more widespread brain activation in response to facial motion could be caused by the fact that dynamic expressions are inherently more complex to process. Equally, it could derive from greater evolutionary experience with moving faces and the need to extract social meaning from them for effective communication. In this light, neurological evidence lends support to the behavioral findings. Improved recognition accuracy, sensitivity to the temporal characteristics, and the ability to make inferences about genuineness, trustworthiness, or approachability could be an effect of enhanced processing of dynamic faces. Besides phenomena of neural adaptation, there is work suggesting that brain activity while observing facial movements may encompass regions which are linked to one's own experience of emotional states, as well as areas reported to contain mirror neurons (Dapretto et al. 2006). Initially observed in macaque monkeys, mirror neurons fire both when performing an action and when watching the action in others (Rizzolatti et al. 1996). Emotion perception may therefore be partially subserved by the mirror neuron system (i.e., premotor and parietal regions, superior temporal sulcus; Iacoboni and Dapretto 2006; Rizzolatti and Craighero 2004), which activates an internal representation of the observed state almost as if it was felt by oneself. Supportive evidence comes from research showing that facial mimicry in response to observed expressions activates similar patterns in the brain of the perceiver (Lee et al. 2006). Also, observing an emotional experience of someone elicits
corresponding subjective arousal in oneself (Lundqvist and Dimberg 1995), which is found to be stronger for dynamic than static faces (Sato and Yoshikawa 2007b). Importantly, it has been proposed that this mirror neuron system has evolved to produce an implicit internal understanding of others' mental states and intentions (Dimberg 1988; Gallese 2001). Following from this assumption, mirroring brain activity in response to facial expressions could be the driving force behind higher-order cognitive processes such as empathy or mentalizing (Iacoboni 2009). For example, witnessing a painful expression on someone's face and feeling pain oneself activate largely overlapping neural pathways which are correlated with regions linked to empathy (Botvinick et al. 2005; Singer et al. 2004). The ability to mimic expressions was also shown to cause greater prosocial behavior, arguably mediated by greater empathy derived from mimicry and shared activations (Stel et al. 2008). Overall, this has been taken to suggest that humans understand, empathize with, and make inferences about mental states of others because the action-perception overlap activates internal experiences of the same state (Schulte-Rüther et al. 2007).

Future Directions

From the literature reviewed above, there is conclusive evidence suggesting that humans have remarkable abilities to perceive and understand the actions of others. Driven by the universal need for social connection, the efficient detection and interpretation of social signals appears essential for successful interaction. Given the rapid advances in technology, these uniquely adaptive skills are likely to be translated to a new form of social partners in the near future. With the move of computing into the social domain, nonhuman agents are envisaged to become integral parts of our daily lives, from the workplace to social and private applications (Küster et al. 2014). As a result, many interactions will not occur in their traditional form (i.e., human to human) but instead involve computer-generated avatars and social robots. In order to build animated systems that emit appropriate social cues and behavior, it is imperative to understand the factors that influence perception. Facial expressions prove to play a vital part in this process since they reveal much of a character's emotions and intentions. While animation techniques offer more control than ever over visual elements, subtle imperfections in the timing of facial expressions could evoke negative reactions from the viewer. In 1970, Masahiro Mori described a phenomenon called the "uncanny valley" (UV) in which human-realistic characters are viewed negatively if they are almost but not quite perfectly human. As such, increased human-likeness may result in unease when appearance or behavior fall short of emulating those of real human beings. Classic examples can be found in computer-animated films, such as The Polar Express and Final Fantasy: The Spirits Within, which many viewers find disturbing due to their human-realistic but eerie characters (Geller 2008). According to Mori, this perceived deviation from normal human behavior is further pronounced when movement is added. Particularly, if the appearance is more advanced than the behavior, violated perceptual expectations could make the moving character less
acceptable. In line with this argument, Saygin et al. (2012) showed that androids that look human but don’t move in a humanlike (biological) manner elicit a prediction error that leads to stronger brain activity in the perceiver. Furthermore, virtual characters are more likely to be rated as uncanny when their facial expressions lack movement in the forehead and eyelids (Tinwell et al. 2011). Although the exact role of motion in the UV remains an issue of debate (see Kätsyri et al. 2015), there is increasing evidence suggesting that natural human motion positively influences the acceptability of characters, particularly those that would fall appearance-wise into the UV (i.e., zombies; Piwek et al. 2014). As such, high-quality motion has the potential to improve ratings of familiarity and humanlikeness by eliciting higher affinity (McDonnnell et al. 2012, Thompson et al. 2011). In order for natural motion to become the standard in animation, it is essential to rely on behavior representative of the real world. At the moment, databases depicting dynamic emotional expressions are still limited in the range and type of facial movements being captured. The majority of them contain deliberately posed affective displays recorded under highly constrained conditions (for a review see Krumhuber et al. in press). Such acted portrayals may not provide an optimal basis for the modeling of naturally occurring emotions. For progress to occur in the future, efforts that target the dynamic analysis and synthesis of spontaneous behavior will prove fruitful. This also includes the study of how multiple dynamic cues interact to produce a coherent percept. Only once the dynamic nature of facial expressions is fully understood will it be possible to successfully incorporate this knowledge into animation models. The present chapter underscores the importance of this task by showing that perceivers are highly sensitive to the motion dynamics in the perceptual study of facial expressions.

Cross-References

▶ Blendshape Facial Animation
▶ Real-Time Full Body (or face) Posing
▶ Video-Based Performance Driven Facial Animation

References Adolphs R, Tranel D, Damasio AR (2003) Dissociable neural systems for recognizing emotions. Brain Cogn 52:61–69. doi:10.1016/S0278-2626(03)00009-5 Ambadar Z, Cohn JF, Reed LI (2009) All smiles are not created equal: morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. J Nonverbal Behav 33:17–34. doi:10.1007/s10919-008-0059-5 Ambadar Z, Schooler JW, Cohn JF (2005) Deciphering the enigmatic face. The importance of facial dynamics in interpreting subtle facial expressions. Psychol Sci 16:403–410. doi:10.1111/j.09567976.2005.01548.x Arsalidou M, Morris D, Taylor MJ (2011) Converging evidence for the advantage of dynamic facial expressions. Brain Topogr 24:149–163. doi:10.1007/s10548-011-0171-4
Bassili JN (1978) Facial motion in the perception of faces and of emotional expression. J Exp Psychol Hum Percept Perform 4:373–379. doi:10.1037/0096-1523.4.3.373 Bassili JN (1979) Emotion recognition: the role of facial movement and the relative importance of upper and lower face. J Pers Soc Psychol 37:2049–2058. doi:10.1037//0022-3514.37.11.2049 Biele C, Grabowska A (2006) Sex differences in perception of emotion intensity in dynamic and static facial expressions. Exp Brain Res 171:1–6. doi:10.1007/s00221-005-0254-0 Botvinick M, Jha AP, Bylsma LM, Fabian SA, Solomon PE, Prkachin KM (2005) Viewing facial expressions of pain engages cortical areas involved in the direct experience of pain. Neuroimage 25:312–319. doi:10.1016/j.neuroimage.2004.11.043 Bould E, Morris N (2008) Role of motion signals in recognizing subtle facial expressions of emotion. Br J Psychol 99:167–189. doi:10.1348/000712607X206702 Bould E, Morris N, Wink B (2008) Recognising subtle emotional expressions: the role of facial movements. Cognit Emot 22:1569–1587. doi:10.1080/02699930801921156 Bruce V, Valentine T (1988) When a nod’s as good as a wink: the role of dynamic information in facial recognition. In: Gruneberg MM, Morris PE, Sykes RN (eds) Practical aspects of memory: current research and issues, vol 1. John Wiley and Sons, New York, pp 169–174 Bugental DB (1986) Unmasking the “polite smile”. Situational and personal determinants of managed affect in adult-child interaction. Pers Soc Psychol Bull 12:7–16. doi:10.1177/ 0146167286121001 Cosker D, Krumhuber EG, Hilton A (2010) Perception of linear and nonlinear motion properties using a FACS validated 3D facial model. In: Proceedings of the symposium on applied Perception in graphics and visualization (APGV), Los Angeles Cunningham DW, Wallraven C (2009a) The interaction between motion and form in expression recognition. In: Bodenheimer B, O’Sullivan C (eds) proceedings of the 6th symposium on applied perception in graphics and visualization (APGV2009), New York Cunningham DW, Wallraven C (2009b) Dynamic information for the recognition of conversational expressions. J Vis 9:1–17. doi:10.1167/9.13.7 Dapretto M, Davies MS, Pfeifer JH, Scott AA, Sigman M, Bookheimer SY, Iacoboni M (2006) Understanding emotions in others: mirror neuron dysfunction in children with autism spectrum disorders. Nat Neurosci 9:28–30. doi:10.1038/nn1611 Dimberg U (1988) Facial electromyography and the experience of emotion. J Psychophysiol 2:277–282 Edwards K (1998) The face of time: temporal cues in facial expressions of emotion. Psychol Sci 9:270–276. doi:10.1111/1467-9280.00054 Ehrlich SM, Schiano DJ, Sheridan K (2000) Communicating facial affect: it’s not the realism, it’s the motion. In: Proceedings of ACM CHI 2000 conference on human factors in computing systems, New York Ekman P (1985) Telling lies. Norton, New York Ekman P, Friesen WV (1982) Felt, false, and miserable smiles. J Nonverbal Behav 6:238–252. doi:10.1007/BF00987191 Ekman P, Friesen WV (1978) Facial action coding system: a technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto Ekman P, Davidson RJ, Friesen WV (1990) The Duchenne smile: emotional expression and brain physiology: II. J Pers Soc Psychol 58:342–353. doi:10.1037/0022-3514.58.2.342 Fiorentini C, Viviani P (2011) Is there a dynamic advantage for facial expressions. J Vis 11:1–15. doi:10.1167/11.3.17 Frank MG, Ekman P, Friesen WV (1993) Behavioral markers and recognizability of the smile of enjoyment. J Pers Soc Psychol 64:83–93. 
doi:10.1037/0022-3514.64.1.83 Gallese V (2001) The ‘shared manifold’ hypothesis. From mirror neurons to empathy. J Conscious Stud 8:33–50 Geller T (2008) Overcoming the uncanny valley. IEEE Comput Graph Appl 28:11–17. doi:10.1109/ MCG.2008.79
Gibson JJ (1966) The senses considered as perceptual systems. Houghton Mifflin, Boston doi:10.1080/00043079.1969.10790296 Hehman E, Flake JK, Freeman JB (2015) Static and dynamic facial cues differentially affect the consistency of social evaluations. Pers Soc Psychol Bull 41:1123–1134. doi:10.1177/ 0146167215591495 Hess U, Kleck RE (1990) Differentiating emotion elicited and deliberate emotional facial expressions. Eur J Soc Psychol 20:369–385. doi:10.1002/ejsp.2420200502 Humphreys GW, Donnely N, Riddoch MJ (1993) Expression is computed separately from facial identity, and is computed separately for moving and static faces: neuropsychological evidence. Neuropsychologia 31:173–181. doi:10.1016/0028-3932(93)90045-2 Iacoboni M (2009) Imitation, empathy, and mirror neurons. Annu Rev Psychol 60:653–670. doi:10.1146/annurev.psych.60.110707.163604 Iacoboni M, Dapretto M (2006) The mirror neuron system and the consequences of its dysfunction. Nat Rev Neurosci 7:942–951. doi:10.1038/nrn2024 Johansson G (1973) Visual perception of biological motion and a model for its analysis. Percept Psychophys 14:201–211. doi:10.3758/BF03212378 Kamachi M, Bruce V, Mukaida S, Gyoba J, Yoshikawa S, Akamatsu S (2001) Dynamic properties influence the perception of facial expressions. Perception 30:875–887. doi:10.1068/p3131 Kappas A, Krumhuber EG, Küster D (2013) Facial behavior. In: Hall JA, Knapp ML (eds) Nonverbal communication (Handbooks of Communication Science, HOCS 2). Mouton de Gruyter, Berlin, pp 131–165 Kätsyri J, Sams M (2008) The effect of dynamics on identifying basic emotions from synthetic and natural faces. Int J Hum Comput Stud 66:233–242. doi:10.1016/j.ijhcs.2007.10.001 Kätsyri J, Förger K, Mäkäräinen M, Takala T (2015) A review of empirical evidence on different uncanny valley hypotheses: support for perceptual mismatch as one road to the valley of eeriness. Front Psychol 6:390. doi:10.3389/fpsyg.2015.00390 Kerlow IV (2004) The art of 3D computer animation and effects, 3rd edn. John Wiley and Sons, Hoboken Kessler H, Doyen-Waldecker C, Hofer C, Hoffmann H, Traue HC, Abler B (2011) Neural correlates of the perception of dynamic versus static facial expressions of emotion. GMS Psychosoc Med 8:1–8. doi:10.3205/psm000072 Kilts CD, Egan G, Gideon DA, Ely TD, Hoffmann JM (2003) Dissociable neural pathways are involved in the recognition of emotion in static and dynamic facial expressions. Neuroimage 18:156–168. doi:10.1006/nimg.2002.1323 Korb S, With S, Niedenthal PM, Kaiser S, Grandjean D (2014) The perception and mimicry of facial movements predict judgments of smile authenticity. PLoS One 9:e99194. doi:10.1371/ journal.pone.0099194 Krumhuber EG, Tamarit L, Roesch EB, Scherer KR (2012) FACSGen 2.0 animation software: generating three-dimensional FACS-valid facial expressions for emotion research. Emotion 12:351–363. doi:10.1037/a0026632 Krumhuber EG, Kappas (2005) Moving smiles: the role of dynamic components for the perception of the genuineness of smiles. J Nonverbal Behav 29:3–24. doi:10.1007/s10919-004-0887-x Krumhuber EG, Manstead ASR (2009) Can Duchenne smiles be feigned? New evidence on felt and false smiles. Emotion 9:807–820. doi:10.1037/a0017844 Krumhuber EG, Manstead ASR, Cosker D, Marshall D, Rosin PL (2009) Effects of dynamic attributes of smiles in human and synthetic faces: a simulated job interview setting. J Nonverbal Behav 33:1–15. 
doi:10.1007/s10919-008-0056-8 Krumhuber EG, Manstead ASR, Cosker D, Marshall D, Rosin PL, Kappas A (2007a) Facial dynamics as indicators of trustworthiness and cooperative behavior. Emotion 7:730–735. doi:10.1037/1528-3542.7.4.730 Krumhuber EG, Manstead ASR, Kappas A (2007b) Temporal aspects of facial displays in person and expression perception: the effects of smile dynamics, head-tilt and gender. J Nonverbal Behav 31:39–56. doi:10.1007/s10919-006-0019-x

14

E.G. Krumhuber and L. Skora

Krumhuber EG, Skora P, Küster D, Fou L (in press) A review of dynamic datasets for facial expression research. Emotion Rev Küster D, Krumhuber EG, Kappas A (2014) Nonverbal behavior online: a focus on interactions with and via artificial agents and avatars. In: Kostic A, Chadee D (eds) Social psychology of nonverbal communications. Palgrave MacMillan, New York, pp 272–302 LaBar KS, Crupain MJ, Vovodic JT, McCarthy G (2003) Dynamic perception of facial affect and identity in the human brain. Cereb Cortex 13:1023–1033. doi:10.1093/cercor/13.10.1023 Lee TW, Josephs O, Dolan RJ, Critchley HD (2006) Imitating expressions: emotion specific neural substrates in facial mimicry. Soc Cogn Affect Neurosci 1:122–135. doi:10.1093/scan/nsl012 Lundqvist L, Dimberg U (1995) Facial expressions are contagious. J Psychophysiol 9:203–211 Maringer M, Krumhuber EG, Fischer AH, Niedenthal P (2011) Beyond smile dynamics: mimicry and beliefs in judgments of smiles. Emotion 11:181–187. doi:10.1037/a0022596 McDonnnell R, Breidt M, Buelthoff HH (2012) Render me real? Investigating the effect of render style on the perception of animated virtual humans. ACM Trans Graph 31:1–11. doi:10.1145/ 2185520.2185587 Mori M (1970) Bukimi No Tani. The Uncanny Valley (MacDorman KF and Minato T, Trans). Energy 7:33–35 Niedenthal P, Brauer M, Halberstadt JB, Innes-Ker AH (2001) When did her smile drop? Facial mimicry and the influences of emotional state on the detection of change in emotional expression. Cognit Emot 15:853–864. doi:10.1080/02699930143000194 Oberman LM, Winkielman P, Ramachandran VS (2007) Face to face: blocking facial mimicry can selectively impair recognition of emotional expressions. Soc Neurosci 2:167–178. doi:10.1080/ 17470910701391943 Piwek L, McKay LS, Pollick FE (2014) Empirical evaluation of the uncanny valley hypothesis fails to confirm the predicted effect of motion. Cognition 130:271–277. doi:10.1016/j. cognition.2013.11.001 Ponari M, Conson M, D’Amico NP, Grossi D, Trojano L (2012) Mapping correspondence between facial mimicry and emotion recognition in healthy subjects. Emotion 12:1398–1403. doi:10.1037/a0028588 Recio G, Schacht A, Sommer W (2013) Classification of dynamic facial expressions of emotion presented briefly. Cognit Emot 27:1486–1494. doi:10.1080/02699931.2013.794128 Recio G, Sommer W, Schacht A (2011) Electrophysiological correlates of perceiving and evaluating static and dynamic facial emotional expressions. Brain Res 1376:66–75. doi:10.1016/j. brainres.2010.12.041 Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192. doi:10.1146/annurev.neuro.27.070203.144230 Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cogn Brain Res 3:131–141. doi:10.1016/0926-6410(95)00038-0 Rychlowska M, Canadas E, Wood A, Krumhuber EG, Niedenthal P (2014) Blocking mimicry makes true and false smiles look the same. PLoS One 9:e90876. doi:10.1371/journal. pone.0090876 Rymarczyk K, Biele C, Grabowska A, Majczynski H (2011) EMG activity in response to static and dynamic facial expressions. Int J Psychophysiol 79:330–333. doi:10.1016/j. ijpsycho.2010.11.001 Sato W, Yoshikawa S (2004) Brief report. The dynamic aspects of emotional facial expressions. Cognit Emot 18:701–710. doi:10.1080/02699930341000176 Sato W, Yoshikawa S (2007a) Spontaneous facial mimicry in response to dynamic facial expressions. Cognition 104:1–18. 
doi:10.1109/DEVLRN.2005.1490936v Sato W, Yoshikawa S (2007b) Enhanced experience of emotional arousal in response to dynamic facial expressions. J Nonverbal Behav 31:119–135. doi:10.1007/s10919-007-0025-7 Sato W, Fujimura T, Suzuki N (2008) Enhanced facial EMG activity in response to dynamic facial expressions. Int J Psychophysiol 70:70–74. doi:10.1016/j.ijpsycho.2008.06.001

Perceptual Study on Facial Expressions

15

Sato W, Fujimura T, Kochiyama T, Suzuki N (2013) Relationships among facial mimicry, emotional experience, and emotion recognition. PLoS One 8:e57889. doi:10.1371/journal.pone.0057889 Sato W, Kochiyama T, Yoshikawa S, Naito E, Matsumura M (2004) Enhanced neural activity in response to dynamic facial expressions of emotion: an fMRI study. Cogn Brain Res 20:81–91. doi:10.1016/S0926-6410(04)00039-4 Saygin AP, Chaminade T, Ishiguro H, Driver J, Frith C (2012) The thing that should not be: predictive coding and the uncanny valley in perceiving human and humanoid robot actions. Soc Cogn Affect Neurosci 7:413–422. doi:10.1093/scan/nsr025 Schmidt KL, Ambadar Z, Cohn J, Reed LI (2006) Movement differences between deliberate and spontaneous facial expressions: zygomaticus major action in smiling. J Nonverbal Behav 301:37–52. doi:10.1007/s10919-005-0003-x Schmidt KL, Bhattacharya S, Delinger R (2009) Comparison of deliberate and spontaneous facial movement in smiles and eyebrow raises. J Nonverbal Behav 33:35–45. doi:10.1007/s10919008-0058-6 Schulte-Rüther M, Markowitsch HJ, Fink GR, Piefke M (2007) Mirror neuron and theory of mind mechanisms involved in face-to-face interactions: a functional magnetic resonance imaging approach to empathy. J Cogn Neurosci 19:1354–1372. doi:10.1162/jocn.2007.19.8.1354 Singer T, Seymour B, O’Doherty JP, Frith CD (2004) Empathy for pain involves the affective but not the sensory components of pain. Science 303:1157–1162. doi:10.1126/science.1093535 Stel M, Van Baaren RB, Vonk R (2008) Effects of mimicking: acting prosocially by being emotionally moved. Eur J Soc Psychol 38:965–976. doi:10.1002/ejsp.472 Thompson JC, Trafton JG, McKnight P (2011) The perception of humanness from the movements of synthetic agents. Perception 40:695–704. doi:10.1068/p6900 Tinwell A, Grimshaw M, Nabi DA, Williams A (2011) Facial expression of emotion and perception of the Uncanny Valley in virtual characters. Comput Hum Behav 27:741–749. doi:10.1016/j. chb.2010.10.018 Trautmann SA, Fehr T, Hermann M (2009) Emotions in motion: dynamic compared to static facial expressions of disgust and happiness reveal more widespread emotion-specific activations. Brain Res 1284:100–115. doi:10.1016/j.brainres.2009.05.075 Wallraven C, Breidt M, Cunningham DW, Bülthoff H (2008) Evaluating the perceptual realism of animated facial expressions. ACM Trans Appl Percept 4:1–20. doi:10.1145/1278760.1278764 Wehrle T, Kaiser S, Schmidt S, Scherer K (2000) Studying the dynamics of emotional expression using synthetized facial muscle movement. J Pers Soc Psychol 78:105–119. doi:10.1037/00223514.78.1.105 Weiss F, Blum GS, Gleberman L (1987) Anatomically based measurement of facial expressions in simulated versus hypnotically induced affect. Motiv Emot 11:67–81. doi:10.1007/BF00992214 Weyers P, Mühlberger A, Hefele C, Pauli P (2006) Electromyographic responses to static and dynamic avatar emotional facial expressions. Psychophysiology 43:45–453. doi:10.1111/ j.1469-8986.2006.00451.x Yang M, Wang K, Zhang L (2013) Realistic real-time facial expression animation via 3D morphing target. J Softw 8:418–425. doi:10.4304/jsw.8.2.418-425 Yoshikawa S, Sato W (2008) Dynamic facial expressions of emotion induce representational momentum. Cogn Affect Behav Neurosci 8:25–31. doi:10.3758/CABN.8.1.25

Utilizing Unsupervised Crowdsourcing to Develop a Machine Learning Model for Virtual Human Animation Prediction Michael Borish and Benjamin Lok

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CB Framework and VPF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Model Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Application and Real-Time Prediction Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Crowdsourcing Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Analysis and Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Prediction Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Crowdsourcing and Expert Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Abstract

One type of experiential learning in the medical domain is chat interactions with a virtual human. These virtual humans play the role of a patient and allow students to practice skills such as communication and empathy in a safe, but realistic sandbox. These interactions last 10–15 min, and the typical virtual human has approximately 200 responses. Part of the realism of the virtual human’s response is the associated animation. These animations can be time consuming to create and associate with each response.

M. Borish (*) • B. Lok
Computer and Information Sciences and Engineering Department, University of Florida, Gainesville, FL, USA
e-mail: mborish@ufl.edu; [email protected]

# Springer International Publishing Switzerland 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_21-1

We turned to crowdsourcing to assist with this problem. We decomposed the process of creating basic animations into a simple task that nonexpert workers can complete. We provided workers with a set of predefined basic animations: six focused on head animation and nine focused on body animation. These animations could be mixed and matched for each question/response pair. Then, we used this unsupervised process to create machine learning models for animation prediction: one for head animation and one for body animation. Multiple models were evaluated and their performance was assessed. In an experiment, we evaluated participant perception of multiple versions of a virtual human suffering from dyspepsia (heartburn-like symptoms). For the version of the virtual human that utilized our machine learning approach, participants rated the character’s animation on par with that of a commercial expert. Head animation specifically was rated as more natural and typical than in other versions. Additionally, analysis of time and cost shows the machine learning approach to be quicker and cheaper than an expert alternative.

Keywords

Crowdsourcing • Machine Learning • Virtual Human • Animation Pipeline

Introduction One style of experiential learning in the medical domain is chat interactions with virtual humans. Virtual human interactions allow students to learn by interacting with and observing the response of a virtual human. A typical example of this style of interaction is shown in Fig. 1. Here, students type in questions to the virtual patient and receive responses. Students usually interact with multiple versions of a character that differ in certain details. A single interaction lasts 10–15 min and trains the student on appropriate questions to ask in order to diagnose a patient. To facilitate these interactions, the virtual human typically contains approximately 200 responses. Part of the realism of a virtual human response is animation. Reasonable response animations play a role in the believability of a virtual human and fulfill our natural coherency model of how a human should behave. Animations also play a role in emotion and personality expression (Cassell and Thorisson 1999). However, developing animations for such a large number of responses can contribute to significant logistical, technological, and content requirements necessary to deliver an effective interaction (Triola et al. 2007, 2012). There are numerous approaches for creating the necessary animations for a virtual human. Approaches include automated and procedural algorithms, motion capture pipelines, and animation experts. All of these approaches have trade-offs when considering cost, time, and quality. We propose the use of crowdsourcing as an alternative to create a machine learning model for animation prediction. This prediction model is specifically meant for virtual humans playing the roles of patients for medical interviews.


Fig. 1 Example interaction page – student interacts with a virtual patient by typing questions and receiving responses while also tracking their interview progress

We first decomposed the process of adding animations to a virtual human into a simple task that nonexperts could complete. We then leveraged nonexperts recruited from Amazon’s Mechanical Turk Marketplace to provide creative input to generate complex animations from a small, generic set of 15 basic animations. These animations then provided the basis for a machine learning model to predict future animations. In this chapter, we describe the machine learning models created via our crowdsourcing process. We then apply the prediction models to a virtual human and report on an experiment. In this experiment, participants reported improved perceptions of the virtual human using these prediction models. We also provide a detailed analysis of the prediction models separate from the experiment as well as a time and cost analysis.

State of the Art The effort and people required to construct a virtual human scenario are significant (Triola et al. 2007, 2012). One area of significant effort is the realism of virtual humans afforded through animations. Expert modelers and animators represent a
gold standard in providing realism as the skills these experts utilize can take many years to acquire. An expert was compared to our machine learning models, and details of this comparison will be discussed in both the Experiment and Results section. This expert has over 15 years of modeling and animation experience in the video game industry with over a dozen games to his name. While the contributions of this individual are of high quality, the effort and time provided by this expert are substantial. When faced with limited resources, various attempts to automate the creation and application of animations have been made. Relative success with procedural algorithms has been found in facial animation. Work such as Hoon et al. (2014) has shown automatic generation of facial expressions to be possible and effective. In work by Brand and Hertzmann (2000), reference motion capture clips were used as the basis of new animation synthesis. Additionally, Deng et al. (2009) presented a method of example-based animation selection and creation for virtual characters. Similarly, Min and Chai (2012) developed a methodology for procedurally generated animation via short descriptions such as “walk four steps to green destination.” In this work too, motion capture clips were used as reference. Our work builds upon a similar structure as reference animation blocks were used to construct more complex animations. However, unlike this work, our model does not rely exclusively on motion capture data or expert coding. Rather, our model could be applied to a variety of animations and all construction is handled via crowdsourcing. Generation of animation from user cues has also been explored (Cassell and Thorisson 1999; Sargin et al. 2006). While this work has been successful, typically, the user is evaluated during the interaction for behavior such as posture and intonation. While such analysis provides additional features to evaluate, we limit ourselves to the text of a virtual human’s response and basic parameters of the audio. This allows our model to be used for virtual humans in large classroom settings, typical of many medical schools. In these settings, individual simulation presentation is not feasible, and a student is likely to interact via laptop in informal surroundings. In-depth analysis of audio and visual components has also been used with success. In both Marsella et al. (2013) and Levine and Theobalt (2009), in-depth analysis of audio cues was conducted to predict appropriate gestures for a virtual human response. These gestures were created using motion capture clips as a ground truth to establish timings and constraints on the gestures. Similarly, in Xu et al. (2014), audio and visual analysis was conducted to structure the gesture generation into ideational units. Ideational units are conceptual units that bind verbal and nonverbal behavior together as well as provide constraints on various attributes of interaction such as transitions and rhythm. All of this work assumes the existence of databases of motion capture information or video clips tagged by experts with a variety of potentially complex pieces of information. While our system does require annotation information, the information is simpler than in similar systems and can be provided on demand by crowdsourced workers. Additionally, this simple information can produce the same affect as an expertly animated character.


CB Framework and VPF Our animation process builds upon both the Crowdsource Bootstrapping (CB) Framework (Borish and Lok 2016) and Virtual People Factory (VPF) (Rossen et al. 2010; Rossen and Lok 2012). VPF is a web-based tool for the creation and improvement of virtual humans. VPF also facilitates online virtual human interactions and has been integrated into multiple classes at several universities. VPF has been used by thousands of medical, health, and pharmacy students to practice interpersonal skills and develop diagnostic reasoning. The CB Framework is a gateway tool for VPF that allows an educator to rapidly develop a new virtual experience once a need is identified. The CB Framework decomposes the process of virtual human creation into several discrete steps that utilize crowdsourced nonexperts. The completion of these steps results in a basic virtual human corpus. The corpus is the structured set of text that comprises the knowledge of the virtual human. These stages can be completed in a matter of hours with minimal commitment from the author. With the initial stages of the CB Framework complete, our animation process can be applied as a subsequent stage in the CB Framework. Alternatively, this process can be applied to already existing virtual humans as well. However, the crowdsourcing and machine learning models are framework agnostic, and the implementation described in the subsequent section can be applied to any creation pipeline or virtual human.

Implementation In order to rapidly create animation predictions for the virtual human’s responses, several discrete steps are necessary. The dataflow outlining these steps is shown in Fig. 2. The animation predictions for each response occur before our virtual human begins an interaction. These predictions are stored as part of the virtual human’s corpus. Any additional crowdsourcing also occurs as part of this process. Once the interaction begins, real-time adjustments to the predictions are needed. First, we will describe feature selection and prediction model specifics. We will then discuss real-time adjustments needed to combat repetition in the application of the models. Lastly, we will discuss details related to the crowdsourcing task used to create the prediction models.

Fig. 2 Example dataflow for a response that utilizes our animation prediction system and the role of crowdsourcing within it (input response → prediction model → NLP comparison; highly ranked input responses receive a predicted animation with real-time adjustment to a shuffled, sentiment-appropriate animation, while lowly ranked input responses are routed to crowdsourcing)

Model Metrics In order to create the animation prediction models, we focused on two sets of related features: sentiment and lexical similarity. We reasoned that an accurate animation choice for a specific response should be similar in both sentiment and lexical content. To facilitate these features, we utilized N-grams. N-grams are often used in NLP analysis, and bigrams have proven effective when utilized for sentiment analysis
(Wang and Cardie 2014). Since multiple sentiment metrics were used as part of the animation prediction model, we likewise utilized bigrams as well as unigrams as features for prediction. All feature and machine learning model analyses were carried out using Weka (Hall et al. 2009). A list of the features used is as follows:

• Sentence Sentiment – overall sentiment for the entire sentence was provided as part of the crowdsourcing task. This will be discussed in more detail in section “Crowdsourcing Task”.
• Bigram Sentiment – automated calculations of sentiment were provided by the Stanford NLP Sentiment Pipeline (Socher et al. 2013). Sentiment analysis consisted of five categories including very negative, negative, neutral, positive, and very positive. These five categories form a distribution of overall sentiment opinion. Further, five additional metrics were calculated. These metrics included kurtosis, skewness, minimum, maximum, and range of the sentiment distribution. These metrics were calculated to describe the overall shape and agreement of the distribution. Bigram and sentence sentiment have previously been used to create virtual human animation (Hoque et al. 2013). Many previous systems are generally concerned with facial animation resulting from specific emotions. Our system is concerned with body animation; however, like facial animation, body animation is informed in part by sentiment.
• Bigram Position and Total – the position of a bigram in the sentence as well as the total number of bigrams in a sentence were included. We found bigrams near the beginning of sentences to be of relatively higher importance. Typically, crowdsourced workers would favor the beginning of sentences to assign an animation even in multi-sentence responses. This makes intuitive sense as animations would be expected to begin when the speech for a response does. Sentence and clause boundaries have also been shown to be important locations for information in a sentence including head motion (Toshinori et al. 2014).
• Bigram Part of Speech (POS) – bigram POS was also included. POS tagging for individual words was provided by the Stanford CoreNLP Pipeline. Each bigram POS was an aggregation of the POS tagging of the individual words that comprise the bigram.
• Bigram and Bag of Words – the actual bigram as well as a “bag of words” approach to the sentence the bigram was drawn from was included in the model. We reasoned that, beyond sentiment similarity and location, any prediction should be based on lexically similar bigrams and sentences. Thus, we broke each sentence into unigrams for a “bag of words” approach commonly used for lexical similarity.
• Head Animation – this feature was only included in the body animation prediction model. Feature evaluation indicated that the head animation was the single best predictor. Thus, the body prediction model forms a small predictor chain whereby head animations are predicted first. Then, all listed features including the predicted head animation are used to predict body animation.
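To make the feature set concrete, the following is a minimal sketch (not the authors’ code) of assembling the per-bigram feature vector; `sentiment_dist` stands in for the five-category distribution returned by the Stanford sentiment pipeline, and the distribution statistics are computed with SciPy.

```python
# Illustrative sketch of the per-bigram feature vector described above.
from scipy.stats import kurtosis, skew

def bigram_features(bigram, index, bigrams, sentence_sentiment,
                    sentiment_dist, pos_tags):
    """Assemble the features used to predict an animation for one bigram.

    sentiment_dist: five-category distribution (very negative .. very positive),
    assumed to come from an external sentiment tool such as the Stanford pipeline.
    """
    return {
        "sentence_sentiment": sentence_sentiment,  # majority vote from workers
        "bigram_position": index,                  # bigrams near the start matter more
        "bigram_total": len(bigrams),
        "bigram": " ".join(bigram),
        "bigram_pos": "+".join(pos_tags),          # aggregated POS tags of both words
        "bag_of_words": sorted({w for b in bigrams for w in b}),
        # Shape/agreement statistics over the sentiment distribution
        "sent_kurtosis": kurtosis(sentiment_dist),
        "sent_skewness": skew(sentiment_dist),
        "sent_min": min(sentiment_dist),
        "sent_max": max(sentiment_dist),
        "sent_range": max(sentiment_dist) - min(sentiment_dist),
    }
```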

Application and Real-Time Prediction Adjustments The animation prediction models previously described can suffer from repetition. For example, if two subsequent responses that occur during an interaction were “No, I don’t drink.” and “No, I don’t smoke.”, the same predictions of “Shake No Once” and “Arms Crossed” might be made for both. While both predictions would be correct from a machine learning perspective, the repetition would hurt user perception during the actual interaction. So, our system also adds an additional layer of animation selection logic at interaction time. This selection logic is a simple shuffled deck algorithm. A shuffled deck algorithm randomly shuffles all items into a list and iterates over that list. Then, all items are reshuffled and the process repeated. This process creates a pseudo-random selection. For head animations, the shuffled deck algorithm only shuffles animations that have the same general sentiment grouping. Body animations were simply shuffled regardless of sentiment. As will be shown in Tables 3 and 5 in the Results section, there is a clear grouping for head animations while body animations do not show the same pattern. Before an interaction begins, animation prediction is applied to each response in a virtual human’s corpus. First, each bigram in the input response has head and body animations predicted. Then, these predictions are culled. Animation predictions for bigrams at the start of sentences are prioritized due to the importance of sentence boundaries as previously explained. Once these predictions are selected, remaining time is greedily filled. Remaining time is calculated based on access to the audio file associated with the input response and the length of the animations already predicted. Additionally, no animations that extend past the end of the audio file will be suggested. This restriction is to prevent the character from performing gestures after the response to a question is complete. A similarity evaluation takes place separately from and simultaneously with animation prediction. This evaluation compares the full input response to the responses used in construction of the animation prediction model. The probability that the input response is a paraphrase of any of the other responses is calculated. This comparison is exactly the same as the NLP algorithm that performs paraphrase selection during a conversation in VPF and is based on the work of McClendon et al. (2014). While
animation predictions will still be returned regardless of score, scores below a certain threshold are sent to the crowdsourcing task to be evaluated by workers and included for future predictions. In this way, we increase the size of the prediction model whenever a lexically dissimilar response is encountered during the prediction process.
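A minimal sketch of this interaction-time logic, combining the shuffled deck selection with the similarity-based routing just described, follows. The sentiment groupings mirror Tables 3 and 5, and the threshold value is an illustrative assumption (the chapter does not state the value used).

```python
import random

class ShuffledDeck:
    """Pseudo-random selection: deal every item once before reshuffling."""
    def __init__(self, items):
        self.items = list(items)
        self.deck = []

    def draw(self):
        if not self.deck:             # deck exhausted: reshuffle all items
            self.deck = self.items[:]
            random.shuffle(self.deck)
        return self.deck.pop()

# Head animations are drawn from the deck matching the response sentiment;
# body animations are shuffled regardless of sentiment.
head_decks = {
    "positive": ShuffledDeck(["NodYesOnce", "NodYesTwice"]),
    "neutral": ShuffledDeck(["TiltHeadLeft", "TiltHeadRight"]),
    "negative": ShuffledDeck(["ShakeNoOnce", "ShakeNoTwice"]),
}
body_deck = ShuffledDeck(["HandsInLap", "ArmsSweepOut", "ArmsCrossed",
                          "HandFlickGestureA", "HandFlickGestureB",
                          "ScratchHeadLeft", "ScratchHeadRight",
                          "HandGesture", "Shrug"])

SIMILARITY_THRESHOLD = 0.5            # illustrative value only

def select_animations(response, sentiment, paraphrase_score, crowd_queue):
    if paraphrase_score < SIMILARITY_THRESHOLD:
        crowd_queue.append(response)  # route to crowdsourcing for future models
    return head_decks[sentiment].draw(), body_deck.draw()
```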

Crowdsourcing Task As previously mentioned, whenever a lexically dissimilar response is encountered during the prediction process, animation predictions are still provided. However, that input response is sent to a crowdsourcing task for future inclusion in the prediction model. The interface for the crowdsourcing task is shown in Fig. 3. Here, at the top, workers are shown a simple Unity scene with the character. Below the scene, workers are presented with the current question/response pair for which animation assignment will take place. The words that comprise the response are spread across two timelines: one for head and one for body animations. These timelines represent the length of the audio and all animations are sized proportional to this length for each question/response pair. Proportional sizing of the animations was done so that animation length would be commensurate with the speech and animations would not be assigned or playing for a significant period of time afterwards. In this area, there is also a button to play the entire timeline. This button allows workers to see how their animation selections appear in context with the audio and facial animations that the virtual character normally has for the given response. Workers can also click on any of the individual animations to have the virtual character perform a single, simple animation in order to better visualize the resulting timeline selection. On the bottom right are the animation lists and each animation can be dragged to the corresponding timeline. For example, in Fig. 3, for the question/response pair of “Can you read the lines?” and “I can’t read any of the lines.”, a worker might drag a “Shake No Once” and “Hand Gesture” animation to the head and body timelines respectively. By providing a defined set of animations, overall task difficulty can be reduced. Low task difficulty leads to lower variability in responses by workers and overall better quality data (Callison-Burch and Dredze 2010). Additionally, workers can be confident they are suggesting an animation that is worthwhile since a majority of question/response pairs should be covered by the predefined set of generic animation building blocks. Further, confidence of the workers also plays a role in the quality of data that is provided (Madirolas and de Polavieja 2014). Once animation selections are complete, workers were also asked to provide a rating for the overall sentiment of the response. Worker choices were limited to negative, neutral, and positive. The ultimate sentiment of a response was determined by simple majority vote, as sketched below. This vote occurred after at least three workers provided animation and sentiment selections for a question/response pair.
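A sketch of the vote itself (the three-worker minimum is from the text; the data layout is an assumption):

```python
from collections import Counter

def response_sentiment(worker_labels):
    """Majority vote over worker sentiment labels for one question/response pair."""
    if len(worker_labels) < 3:
        raise ValueError("at least three worker judgments are required")
    return Counter(worker_labels).most_common(1)[0][0]

# e.g. response_sentiment(["negative", "neutral", "negative"]) -> "negative"
```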


Fig. 3 Crowdsourced worker task page – workers suggest animations for a question/response pair from a predefined, base set

Compensation for this task was set at $0.40 per Human-Intelligence Task (HIT) on Amazon’s Mechanical Turk service, and workers were asked to complete five question/response pairs. The HIT compensation was selected at this level due to previous pilot study findings related to the effect of payment on quality and completion time. Paying workers additional money does not improve the overall quality of data as there has been no link found between compensation and quality of work (Mason et al. 2009). As a result, keeping compensation reasonable will result in a lower overall cost for the task without a detriment to quality. In contrast, low compensation reduces the incentive for mainstream workers to complete the task. Additionally, as Adda et al. (2013) note, high compensation can attract malicious workers that game the task in order to maximize payment and thus reduce data quality.

Analysis and Experiment We conducted an experiment to evaluate user perception of a virtual human utilizing these head and body animation prediction models. Four versions of a virtual patient suffering from dyspepsia (heartburn-like symptoms) were created. The first version is our control and consisted of a virtual patient whose only animation was lip syncing and an idle “breathing” animation. This version will be referred to as DControl. The
second version added response animations randomly selected from the set of possible animations supplied to crowdsourced workers. This version will be referred to as DRandom. The third version had animations suggested by two predictive models: one for head animation and one for body animation as described in the last section. This version will be referred to as DML. Finally, DPro was created by an animation expert. This expert was provided the character and audio files and was allowed to animate the character as he saw fit. This animator regularly produces animations for commercial projects and serves as a gold standard comparison. We also conducted an in-depth analysis of the predictive models in isolation as well as a comparison of cost/time for the crowdsourcing process and expert animator. These will be presented separately.

Procedure Participants were recruited and paid US$1.00 to view a video showing a previous interaction between a student and one of the virtual humans previously described. Participants then filled out a survey. Previous interaction logs were used to create the interaction and the same interaction was used for all four versions of the dyspepsia virtual human. The video was approximately 5 min in length. The survey consisted of questions based on assessments for naturalness of virtual human interactions (Huang et al. 2011; Ho and MacDorman 2010). The questions were answered on a 7-point scale from 1 (Not at all) to 7 (Absolutely) and were as follows:

Q1 Do you think the virtual agent’s overall behavior is natural?
Q2 Do you think the virtual agent’s overall behavior is sincere?
Q3 Do you think the virtual agent’s overall behavior is typical?
Q4 Do you think the virtual agent’s head behavior is natural?
Q5 Do you think the virtual agent’s head behavior is sincere?
Q6 Do you think the virtual agent’s head behavior is typical?
Q7 Do you think the virtual agent’s body behavior is natural?
Q8 Do you think the virtual agent’s body behavior is sincere?
Q9 Do you think the virtual agent’s body behavior is typical?

The participants also described the virtual human on a number of attributes. Each attribute was on a 7-point binary scale. Those attributes were:

Q10 Artificial – Natural
Q11 Synthetic – Real
Q12 Human-made – Humanlike
Q13 Mechanical – Biological movement
Q14 Predictable – Thrilling
Q15 Passive – Animated
Q16 Smooth/graceful – Sudden/jerky movement


Results and Discussion In total, N = 89 participants were recruited and were distributed as follows: N = 16 viewed DControl, N = 23 viewed DRandom, N = 25 viewed DML, and N = 25 viewed DPro.

Prediction Model The animations used in the prediction models were taken from previous virtual humans. A simple description of the animations is provided below. Animations A–F represent the head animations, while animations G–O represent the body animations.

A NodYesOnce – Single head nod
B NodYesTwice – Multiple head nods
C TiltHeadLeft – Head tilt looking toward the left side
D TiltHeadRight – Head tilt looking toward the right side
E ShakeNoOnce – Single head shake
F ShakeNoTwice – Multiple head shakes
G HandsInLap – Both hands are placed in the lap
H ArmsSweepOut – Arms progress from low to high position sweeping out from the body
I ArmsCrossed – Both arms are crossed in front of body
J HandFlickGestureA – Both hands are raised and in motion in front of the body. The animation ends with the hands flicking away from the body
K HandFlickGestureB – Similar to the previous gesture, however, the length of arm motion is shortened and the flick is less pronounced
L ScratchHeadLeft – Left hand is used to scratch the head
M ScratchHeadRight – Right hand is used to scratch the head
N HandGesture – Arms are up in front of the body at alternate times
O Shrug – Simple shoulder shrug

Multiple models were evaluated using both test sets created from 10 % of the data set and 10-fold cross-validation. The head prediction model contained 992 entries, while the body prediction model contained 939. Due to the relatively small size of the test sets, we present 10-fold cross-validation as more representative of performance. The overall accuracy of several different models is shown in Table 1. Ultimately, we settled on the use of a Bayesian network. The Bayesian net outperformed simpler models and was also on par with more computationally expensive models. We also investigated how often specific animations were applied by crowdsourcing workers according to the sentiment of the response. Tables 2 and 3 show the confusion matrix and sentiment distribution for the head prediction model. Tables 4 and 5 show the confusion matrix and sentiment distribution for the body prediction model.
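The analyses were carried out in Weka; purely as an illustration of the evaluation protocol, an analogous 10-fold cross-validation in scikit-learn might look as follows. DecisionTreeClassifier stands in for Weka’s J48, the synthetic data stands in for the crowdsourced feature vectors, and scikit-learn has no Bayesian network (that model would require Weka’s BayesNet or a library such as pgmpy).

```python
# Illustrative 10-fold cross-validation over several candidate models.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier  # rough analogue of Weka's J48

# Stand-in for the 992-entry head data set (6 animation classes A-F)
X, y = make_classification(n_samples=992, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)

models = {
    "J48-like tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Multilayer Perceptron": MLPClassifier(max_iter=1000, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: {scores.mean():.1%} mean accuracy")
```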

Table 1 Model accuracies – percentage accuracies for a selection of different machine learning models for head and body animation

                        Head model   Body model
J48                         42.9         16.4
Naive Bayes                 37.8         14.8
Bayesian Net                48.4         19.8
Multilayer Perceptron       21.7         16.6

Table 2 Head confusion matrix

      A    B    C    D    E    F
A   131   77   20   10    0    0
B    37   22    7    6    0    0
C    48   19  157  110   10   11
D     1    0    0    0    0    0
E     1    0   16   13  153  103
F     1    0    4    3   15   17

Table 3 Sentiment of head animations according to crowdsourced workers (all values are percentages)

Animation   Negative   Neutral   Positive
A              1.0       20.4      78.6
B              1.0       14.4      84.6
C              8.6       78.0      13.4
D             12.4       76.1      11.5
E             99.4        0.6       0
F             94.2        5.8       0

Table 4 Body confusion matrix

      G    H    I    J    K    L    M    N    O
G    32   16   27   15   10   12    7   35   12
H    58   63   50   26   14   18   13   28   31
I     0    1    1    0    2    0    1    1    1
J     0    0    0    0    0    0    0    0    0
K     0    0    0    0    0    0    0    0    0
L     0    0    0    0    0    0    0    0    0
M     0    0    0    0    0    0    0    0    0
N    45   25   21   24   14   16    7   42   13
O    38   28   34   26   16   28   10   35   43

As can be seen from the confusion matrix in Table 2, the head prediction is reasonably accurate overall and achieves an accuracy of 48.4 %. This is significantly better than a random chance of 16.7 % for a classification with six alternatives. Further, when the prediction model is incorrect, it is generally incorrect in a reasonable way. Most of the errors in the confusion matrix are with interchangeable animations. For example, if the participant were to ask “Do you do any drugs?” and receive a response of “No, I don’t do any illegal drugs.” then the typical response would include an animation in which the virtual human is shaking its head no. Whether the model predicts shaking the head no once or multiple times, both are reasonable. The same holds true for the other head animations. The interchanging of certain animations is also shown in the sentiment distribution. There are three clear groupings in the distribution as assigned by crowdsourced workers. The highly skewed distributions indicate agreement among workers as to when certain head animations are expected based on the sentiment of the response. Indeed, feature evaluation shows that worker-assigned sentiment is the best predictor of head animation out of all features.

While the head animation prediction model showed clear patterns and groupings, the same is not true for the body prediction model. The confusion matrix and sentiment distributions are shown in Tables 4 and 5, respectively. As can be seen in Table 4, the overall accuracy is lower than the head prediction model and is around 20 %. This is still higher than random chance at 11 %, but several of the animations are simply never predicted. Alternative simpler models such as naive Bayes and J48 do capture some of these predictions; however, the overall accuracy of these models is even lower. Again, similar to head prediction, more complex models did not produce an increase in overall accuracy.

Table 5 Sentiment of body animations according to crowdsourced workers (all values are percentages)

Animation   Negative   Neutral   Positive
G             36.5       25.8      37.7
H             50.8       22.7      26.6
I             37.5       29.7      32.8
J             26.3       40.0      33.8
K             31.3       29.2      39.6
L             28.0       40.0      32.0
M             36.1       36.1      27.8
N             21.4       40.5      38.2
O             34.4       46.9      18.8

The reason for this performance becomes clearer when looking at the sentiment distribution for body animations. As shown in Table 5, the sentiment is roughly evenly distributed for each of the body animations. The body animations do not show the clear groupings that were evident for head animations. This relatively equal use of a body animation regardless of response sentiment indicates crowdsourced workers could not truly arrive at an agreement on what a “correct” body animation was for a given response. Our use of a neutral medical interview scenario could be one factor. Many of our virtual humans such as the dyspepsia scenario are meant to allow students to practice basic interviewing skills. The students are learning what questions to ask and how to gather information. A typical example of this interview would be an exchange similar to the following: “Do you do any drugs?”, “No, I don’t do any illegal drugs.”, “Do you drink at all?”, “Yeah, I have a couple of beers a week.” For both of these responses, there are expected head behaviors. The first response would have some variation of shaking no, while the second response would contain some variation of nodding yes. However, what is the “correct” body animation?


In such an example, a patient might simply need to be animated to be believable without any pattern. An alternative scenario such as revealing a diagnosis of cancer may contain emotional moments where specific body motions would be expected and represents one avenue of future work. Another factor might have been the coarse-grained nature of control provided to crowdsourced workers. Workers were allowed only to place animations on the appropriate timeline. However, there are multiple ways to provide more fine-grained control. Two examples would be specific gaze targets to accompany the head animation and speed control to allow fine-tuning of an animation. With additional controls, a more notable pattern to the body animations might present itself. The expansion of the controls is still another opportunity for future investigation.

User Perception An ANOVA was calculated for each survey response, Q1–Q16, listed in the previous section. For each question for which statistical significance was found, a Tukey analysis was conducted. While the ANOVA was significant for questions Q7, Q8, Q10, Q11, Q13, Q14, and Q15, the Tukey analysis did not show any interesting results. For all these questions, statistical significance for a difference in means was between DControl and one of the other models. This makes sense as any type of animation would improve user perception over a virtual human with no animation. Tukey analysis showed results of interest for Q4 and Q6. These questions asked whether or not the virtual human head behavior is natural or typical, respectively. The results are summarized in Table 6. Users found the head animation of DML more natural and more typical than both DControl and DRandom. Further, DML was found to be comparable to DPro. However, no difference was found for the overall or body animations. These differences make sense in the context of the model analysis from the previous section. With clear sentiment groupings and patterns for the head prediction model, crowdsourced workers generally agree that there exists a “correct” head animation for specific responses. This is reflected in the improved ratings for the virtual human. Additionally, the machine learning models produced from the crowdsourced workers’ input generated the same affect as an expert animator. As we will discuss in the next section, these models were created cheaper and faster than the expert version. These savings make this an attractive alternative. In contrast, the body animation model consisted of animations whose sentiment was evenly distributed across all categories. There were no clear patterns and crowdsourced workers could not agree on a “correct” body animation for a response. This lack of agreement aligns with user perception as body predictions for DML were not perceived any differently from the other versions. As participant perception shows, appropriate animation for a virtual human’s response is important. In the medical domain, realistic virtual human interactions are an increasingly used educational component. This realism is directly affected by the appropriate choice of head animation and a reasonable choice for body animation.

Table 6 User perception results with significant Tukey analysis

Question: Do you think the virtual agent’s head behavior is natural?
  F(ANOVA) = 7.593, p(ANOVA) = .000
  Tukey significance: DML-DPro = .062, DML-DRandom = .025, DML-DControl = .000
  Mean: DControl = 3.50, DRandom = 4.36, DML = 5.38, DPro = 4.48
  SD: DControl = 1.50, DRandom = 1.15, DML = 1.01, DPro = 1.32

Question: Do you think the virtual agent’s head behavior is typical?
  F(ANOVA) = 7.194, p(ANOVA) = .000
  Tukey significance: DML-DPro = .105, DML-DRandom = .049, DML-DControl = .000
  Mean: DControl = 3.44, DRandom = 4.40, DML = 5.45, DPro = 4.56
  SD: DControl = 1.75, DRandom = 1.29, DML = 0.93, DPro = 1.50

Additionally, the animations predicted by our machine learning approach produce the same affect as an expertly animated character.
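For reference, the statistical procedure described above (one-way ANOVA followed by Tukey HSD) can be reproduced with standard tooling; this sketch assumes per-participant ratings stored with a condition label and is not the authors’ analysis code. The file name and column layout are hypothetical.

```python
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical layout: one row per participant with columns
# "condition" (DControl/DRandom/DML/DPro) and "rating" (1-7) for one question.
ratings = pd.read_csv("q4_head_natural.csv")  # hypothetical file name

groups = [g["rating"].values for _, g in ratings.groupby("condition")]
f_stat, p_value = f_oneway(*groups)           # one-way ANOVA across versions
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:                            # follow up with pairwise Tukey HSD
    print(pairwise_tukeyhsd(ratings["rating"], ratings["condition"]))
```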

Crowdsourcing and Expert Cost As shown in Table 7, the cost and time per response for the machine learning model construction are much lower than for the use of an expert. The model construction required 15.75 h of effort and cost $75.00, while the expert animator required 25.75 h of effort and $901.25 to complete their work. While time will always be variable in a crowdsourced approach, the construction effort occurred over an approximately 24 h period. We are confident that 24–48 h is plenty of time to accomplish model construction in the future. The expert required one week to accomplish animation of the responses. Additionally, the machine learning models covered a larger breadth of responses and were constructed from 314 responses, while the expert animator worked on only 34 responses. The animator focused on these 34 responses because those were the responses necessary for the 5 min video shown to participants. This creates a cost and time per response as shown in Table 7. Based on these results, if every response in the 314 responses used in the machine learning model required a unique animation, an expert animator would require approximately six weeks of full-time work and $8,000.00 to complete the work.

Table 7 Time and cost estimates

                        Time per response (minutes)   Cost per response (US dollars)
ML model construction                3                            $0.24
Animation expert                    45                           $26.51

The machine learning models utilized for DML have several clear advantages. The models cover a larger breadth of responses and were evaluated much more quickly. Further, while there will certainly be some reuse of the animations created by the expert, the animations were not intended to be generic and reusable. These results suggest a shift in the role of expert animators from covering all responses to covering only necessary responses. As previously described, our system performs a similarity comparison. This comparison is a paraphrase identification algorithm. This algorithm determines if a particular response is a paraphrase of any other response previously encountered. For those responses that are scored low, an expert could create the necessary animation. Alternatively, different approaches that specifically address unfamiliar content could be used. For responses that score highly, the expert’s time is better spent elsewhere as the machine learning models can produce the same affect quicker and cheaper. These highly scoring responses are likely to make up the bulk of any virtual human corpus for a chat interaction in the medical domain. As mentioned at the beginning of this chapter, these virtual humans have several hundred responses but contain numerous similarities as multiple versions of the same scenario are often created. For example, most virtual humans would be asked whether or not they smoke, drink, or do drugs. Expert animators’ time can be utilized more efficiently by targeting information that requires their attention rather than these common responses. Our crowdsourcing approach also offers the benefit of continuous improvement. Whenever an unfamiliar response is encountered, regardless of whether an animation expert provides a new animation, the crowdsourcing algorithm can quickly incorporate new information on demand. The crowdsourcing algorithm can send the response to workers who provide timeline selections and sentiment scores. Our system also avoids requiring experts to provide complex information tagging as in related systems previously described.
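The per-response figures in Table 7 and the six-week, $8,000 extrapolation follow directly from the totals reported above; a quick arithmetic check:

```python
# Reported totals: ML model construction vs. expert animator
ml_hours, ml_cost, ml_responses = 15.75, 75.00, 314
ex_hours, ex_cost, ex_responses = 25.75, 901.25, 34

print(ml_hours * 60 / ml_responses, ml_cost / ml_responses)  # ~3 min, ~$0.24
print(ex_hours * 60 / ex_responses, ex_cost / ex_responses)  # ~45 min, ~$26.51

# Expert extrapolated to all 314 responses (40 h work weeks):
print(ex_hours / ex_responses * 314 / 40)  # ~5.9 weeks of full-time work
print(ex_cost / ex_responses * 314)        # ~$8,300, i.e. roughly $8,000
```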

Limitations Our prediction models have been applied to virtual humans in the medical domain for a dialogue-type interaction with a relatively short conversation time of 10–15 min. While this process would generalize to similar dialogue domains, it may not be applicable to other types of interactions such as instruction or open-ended conversations. Such interaction styles may not have the same type of information overlap present here and may suffer from repetition from this model, even with the mitigation described. However, this does present an opportunity for future improvement as crowdsourced workers could be given a larger set of animations to work with and additional controls. This does need to be balanced with the increase in task difficulty and the potential decrease in agreement among workers. Our prediction models are also biased based on the interaction context and domain. Virtual human chat interactions in the medical domain are meant to teach students how to ask questions and retrieve information. A majority of these questions have muted emotional context as they are usually matter of fact. When emotional context is involved, the context is skewed negative toward sad emotions as the virtual humans are suffering from some medical issue. These biases must be accounted
for and the models trained on appropriate data if the domain differs from what is described here.

Future Directions Our process has demonstrated the unsupervised creation of an animation prediction model via crowdsourced workers to be a viable alternative to more resource-intensive creation methods. A virtual human whose animations are assigned by our prediction models has head animation that is regarded as more natural and typical by participants. Based on crowdsourced workers’ input, any reasonable body animation will suffice as no difference in perception occurred. This was in alignment with the data collected for the machine learning model that did not find a clear agreement on what constitutes a “correct” body animation for a general virtual human response. The virtual human animated using our prediction model produces similar affect to a version of the virtual human animated by an expert. Importantly, this crowdsourced model is cheaper and faster to create, and can also be updated on demand through the use of additional crowdsourcing. These benefits highlight the need to refocus experts only on the information that requires their attention while leaving mundane responses to be animated automatically by our models. By freeing experts from repetitious work, our system aims to reduce the barrier to creation to allow virtual humans to be utilized more widely in medical education. We intend to pursue this research with additional studies. One such study we are planning is expansion of the crowdsourcing task. With additional controls such as animation speed and gaze targets, additional patterns may present themselves. We also intend to continue development of the body prediction model. As mentioned, specific emotional events or scenarios may play a role in the correctness of body animation, and we plan to investigate such cases with models specifically tuned to those situations. Additionally, we plan to further investigate whether other features may be required to identify an already existing pattern in body animations.

References Adda G, Mariani J, Besacier L, Gelas H (2013) Economic and ethical background of crowdsourcing for speech. In: Crowdsourcing for speech processing: applications to data collection, pp 303–334 Borish M, Lok B (2016) Rapid low-cost virtual human bootstrapping via the crowd. Trans Intell Syst Technol 7(4):47 Brand M, Hertzmann A (2000) Style machines. In: 27th SIGGRAPH, pp 183–192 Callison-Burch C, Dredze M (2010) Creating speech and language data with Amazon’s Mechanical Turk. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk, pp 1–12 Cassell J, Thorisson KR (1999) The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Appl Artif Intell 13(4–5):519–538
Deng Z, Gu Q, Li Q (2009) Perceptually consistent example-based human motion retrieval. In: Interactive 3D graphics and games, vol 1, pp 191–198 Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software. ACM SIGKDD Explor Newsl 11(1):10 Ho C-C, MacDorman KF (2010) Revisiting the uncanny valley theory: developing and validating an alternative to the Godspeed indices. Comput Hum Behav 26(6):1508–1518 Hoon LN, Chai WY, Aidil K, Abd A (2014) Development of real-time lip sync animation framework based on viseme human speech. Arch Des Res 27(4):19–29 Hoque ME, Courgeon M, Mutlu B, Picard RW, Link C, Martin JC (2013) MACH: My Automated Conversation coacH. In: Pervasive and ubiquitous computing, pp 697–706 Huang L, Morency LP, Gratch J (2011) Virtual rapport 2.0. In: Intelligent virtual agents, pp 68–79 Levine S, Theobalt C (2009) Real-time prosody-driven synthesis of body language. ACM Trans Graph 28(5):17 Madirolas G, de Polavieja G (2014) Wisdom of the confident: using social interactions to eliminate the bias in wisdom of the crowds. In: Collective intelligence, pp 2012–2015 Marsella S, Lhommet M, Feng A (2013) Virtual character performance from speech. In: 12th SIGGRAPH/Eurographics symposium on computer animation, pp 25–35 Mason W, Street W, Watts DJ (2009) Financial incentives and the performance of crowds. SIGKDD 11(2):100–108 McClendon JL, Mack NA, Hodges LF (2014) The use of paraphrase identification in the retrieval of appropriate responses for script based conversational agents. In: Twenty-seventh international FLAIRS conference, pp 19–201 Min J, Chai J (2012) Motion graphs++. ACM Trans Graph 31(6):153 Rossen B, Lok B (2012) A crowdsourcing method to develop virtual human conversational agents. IJHCS 70(4):301–319 Rossen B, Cendan J, Lok B (2010) Using virtual humans to bootstrap the creation of other virtual humans. In: Intelligent virtual agents, pp 392–398 Sargin ME, Aran O, Karpov A, Ofli F, Yasinnik Y, Wilson S, Erzin E, Yemez Y, Tekalp AM (2006) Combined gesture-speech analysis and speech driven gesture synthesis. In: Multimedia and Expo, pp 893–896 Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP, p 1642 Toshinori C, Ishiguro H, Hagita N (2014) Analysis of relationship between head motion events and speech in dialogue conversations. Speech Comm 57:233–243 Triola MM, Campion N, Mcgee JB, Albright S, Greene P, Smothers V, Ellaway R (2007) An XML standard for virtual patients: exchanging case-based simulations in medical education. In: AMIA, pp 741–745 Triola MM, Huwendiek S, Levinson AJ, Cook DA (2012) New directions in e-learning research in health professions education: report of two symposia. Med Teach 34(1):15–20 Wang L, Cardie C (2014) Improving agreement and disagreement identification in online discussions with a socially-tuned sentiment lexicon. In: ACL, vol 97, p 97 Xu Y, Pelachaud C, Marsella S (2014) Compound gesture generation: a model based on ideational units. In: IVA, pp 477–491

Clinical Gait Assessment by Video Observation and 2D Techniques

Andreas Kranzl

Abstract

Observational gait analysis, in particular video-based gait analysis, is extremely valuable in the daily clinical routine. Certain requirements must be met in order to perform a high-quality analysis. The walking distance must be sufficiently long (depending on the type of patient), the equipment used should meet the requirements, and there should be a recording log. The quality of the videos for evaluation depends on the recording conditions of the video cameras. Exposure time, additional lighting, and camera position all need to be adjusted properly for sagittal and frontal imaging. Filming the video in a room designated for this purpose will help to ensure constant recording conditions and quality. The recordings should always be carried out according to a recording log, and the test form can act as a guide for the evaluation of the video. This provides an objective description of the gait, although it is important to keep in mind that the evaluation remains subjective to a certain degree. Depending on the gait parameter, the reproducibility of the evaluation (intra- and interrater reliability) is moderate to good. In addition to a database function, current video recording software is able to measure angles and distances. It should also be possible to play back two videos in parallel, for example, to view the presurgical and postsurgical gait simultaneously. Despite the implementation of three-dimensional measurement systems for gait analysis, observational or video-supported gait analysis is justified in daily clinical operations.

Keywords

Database • Observational gait • Reliability • Room size • Video camera

A. Kranzl (*) Laboratory for Gait and Human Motion Analysis, Orthopedic Hospital Speising, Vienna, Austria e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_24-1


Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Observational Gait Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Treadmill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Patient Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Structured Analysis of the Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reliability of the Evaluation of Observational Video-Based Gait Analysis . . . . . . . . . . . . . . . . . . . . Video Recording Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Clinical Use of 2D Motion Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Software and Capturing Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusion/Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Introduction

Apart from purely observational gait analysis (optical), recording gait with a video camera is the most common type of analysis. Modern video cameras for the consumer market are readily available and fairly inexpensive. Recording is mostly performed directly in the camera or stored on a PC using suitable software. Advantages of video recording are the ability to replay the footage several times, to pause it, and to play it in slow motion. Video recording is also easy to use. Video recordings are often used in addition to three-dimensional motion capture systems, on the one hand, to visually document how the patient walks and, on the other hand, to document movements that the implemented biomechanical model does not capture in the analysis. Size of the room, position of the camera, lighting in the room, marking of the walking distance, and the recording log are all important for producing high-quality video for gait evaluation.

State of the Art

Video recordings are used in the clinical movement analysis laboratory as well as in other institutions, such as medical offices and physiotherapy offices, to evaluate movement processes (in particular walking). The use of gait analysis as an additional component in treatment plans is uncontested. Recordings are usually carried out using two video cameras (sagittal and frontal planes). The advantage of video recordings without additional measurement instruments is that the patient is not burdened with additional material (markers, measurement equipment, etc.). Video recordings are stored in a database and are therefore easy to find. This allows simple comparisons to be carried out before and after therapy.


Fig. 1 Room for video recording. Optimal use of a room 6 m wide and 10 m long is shown. The walking distance is color coded (gray), and cross lines help determine stride length

Observational Gait Analysis

Observational gait analysis has an invaluable role to play in clinical routine. However, human vision can only resolve a frequency of 12–18 Hz. Due to this low temporal resolution, not all movement details can be recognized with the human eye. A further problem is that gait disturbances do not occur in only one plane of one joint, but in numerous planes and numerous joints simultaneously. The recording of gait with a video camera significantly simplifies analysis. With the possibility of watching the video several times and at various speeds, a detailed analysis of individual joints and planes is possible. A video analysis can be carried out quickly; however, preparations are necessary in order to perform a high-quality video analysis. In order to obtain a high-quality recording, a suitable recording room is required and certain technical requirements need to be fulfilled for the camera. It is not always easy to find a room of suitable size. For recording the sagittal plane, the room should be at least 10 m long to ensure sufficient room for walking, enabling the patient to walk at their normal speed with enough steps recorded for evaluation of the gait (Fig. 1). This walking distance is sufficient for patients with mild gait disturbances; the distance can be shorter for patients with more severe gait disturbances or for children, due to the shorter step length. For better guidance of the patient, it is useful to mark the walking distance in a color that contrasts with the rest of the floor (Fig. 1). The start and end of the walking distance should be marked for the patient.


Subtle lines or predefined distances along the walking distance help determine step length in the subsequent evaluation by video. The spatial depth of the room is important for the lateral recording; it is determined by the number of double steps to be captured as well as by the camera optics. In the sagittal recording, in order to perform an adequate evaluation, the entire patient needs to be seen, from head to toe. The camera should be positioned at a right angle to the walking distance. It is possible to have the camera follow the patient; however, this can create difficulties in the measurement of joint positions and distances. Parallax error can lead to errors in the determination of the joint angles. Therefore, care is to be taken to ensure that the planes of movement are at a right angle to the camera. For evaluation of joint positions, the view of the camera should be positioned centrally on the patient and not at an angle. The evaluation of joint angles in the border areas of the video image is only possible to a limited extent due to parallax error. A wide-angle lens can be of assistance in a room with insufficient depth (Fig. 2), as it allows video recordings to be carried out in rooms with limited depth. It should be noted, though, that distortion can occur in the border areas of the video; depending on the quality of the wide-angle lens, this can be more or less pronounced.

Fig. 2 Treadmill wide angle. Due to the size of the room, a video camera cannot be positioned far enough from the treadmill to see the entire patient in the image (image within the red section), whereas a wide-angle lens allows imaging of the entire patient from the same position. The camera position in the left image is oriented at exactly 90° to the axis of movement, the central image is positioned at an angle of 30° to the patient, and the right image at 45°. The measured knee angles are 30°, 28°, and 25°, although the recordings were performed simultaneously with three cameras. These different angle measurements are caused by parallax error. It is necessary to be aware of this error if the patient is not centered in the image

For frontal recordings, it is necessary to ensure that the patient is in the center of the image. For all recordings, the patient should be imaged as large as possible. This can be performed with the zoom function of the video camera. The 16:9 video format also has the advantage that the patient can be recorded in portrait layout. It is also possible to record numerous gait sequences in the sagittal recording with the wider video image. In some centers, the side camera is mounted on a motorized or manually operated track system (Fig. 4). This has the advantage that the camera can follow the patient at a 90° angle and thereby record numerous gait sequences. Numerous gait sequences can also be recorded by moving the camera, but these extra gait sequences are more difficult to evaluate due to parallax error (Fig. 3). A camera that accompanies the patient is shown in Fig. 4. This camera position allows a right-angled view of the patient. The camera is mounted on a track system and is controlled with a motion capture system.

Fig. 3 Patient recorded with three cameras simultaneously

Fig. 4 Accompanying camera

It is useful to record the frontal image of the patient in full size, from the pelvis downward, and from the knees downward (focus on the ankle joint). The detailed view allows a more precise evaluation of movements. This can be performed either via the positioning of the camera on a height-adjustable tripod or on a height-adjustable wall mount (manual or electric) (Fig. 6). In general, it is useful to attach the video camera to a tripod, which allows optimal horizontal positioning. A fixed position on the wall is even better, so that the camera position is the same for all recordings and does not need to be adjusted or checked for each recording. In the left of the image (Fig. 6), a manually height-adjustable camera can be seen, which is focused on the walking distance of the instrumented 3D gait analysis. On the right, a camera can be seen which is automatically height adjusted with the PC and focused on the walking distance for the video recording.

Fig. 5 Optimized view of the relevant segments. With a height-adjustable frontal camera, an optimized position can be achieved so that the relevant body segments are imaged as large as possible. From left to right: entire body, pelvis downward, and foot area

Fig. 6 Camera position for frontal recordings

Apart from the room dimensions, lighting conditions and the color of the floor and walls are also important.
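The magnitude of the parallax error illustrated in Fig. 2 can be reproduced with a few lines of geometry. The following Python sketch is a simplified orthographic projection (it ignores perspective and lens distortion, and the joint coordinates and the 30° flexion value are illustrative assumptions, not data from the figure): it projects a flexed thigh and shank onto the image planes of cameras rotated away from the perpendicular view and reports the apparent 2D knee angle.

import numpy as np

def apparent_flexion(yaw_deg):
    """Apparent 2D knee flexion when the camera is rotated yaw_deg
    away from a view perpendicular to the sagittal plane."""
    yaw = np.radians(yaw_deg)
    hip, knee = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 0.5])
    # Shank flexed 30 degrees out of the vertical, within the sagittal (x-z) plane
    ankle = knee + 0.5 * np.array([np.sin(np.radians(30)), 0.0, -np.cos(np.radians(30))])
    u = np.array([np.cos(yaw), -np.sin(yaw), 0.0])  # horizontal image axis
    proj = lambda p: np.array([p @ u, p[2]])        # orthographic image coordinates
    thigh, shank = proj(knee) - proj(hip), proj(ankle) - proj(knee)
    cosang = thigh @ shank / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

for yaw in (0, 30, 45):
    print(f"camera rotated {yaw} deg -> apparent knee flexion {apparent_flexion(yaw):.1f} deg")

A perpendicular view recovers the full 30°, while camera rotations of 30° and 45° reduce the apparent flexion to roughly 27° and 22°, the same kind of underestimation reported for the three simultaneous cameras in Fig. 2.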


Fig. 7 Influence of illumination on the imaging quality

There should be no interfering light from windows, since this can lead to lighting problems. Furthermore, the background of the video recording should have calm colors. Equipment that is standing around should not be visible in the video. The room should be evenly illuminated from above, with lighting that remains adequate in the evening. The light source should not flicker; especially in high-resolution video recording, a flickering light can be very disturbing. Conventional video cameras have an automatic feature that adjusts exposure time and shutter depending on the available light. For high-quality recordings, however, a manual adjustment of exposure time is useful. Exposure time should be as short as possible without the video image becoming too dark (Fig. 7). Illumination of 4000 lx or more is useful.


Additional lighting, which is appropriate for the camera, can be helpful to improve recording conditions. It is necessary to ensure, however, that the additional lighting does not create reflections on the skin. For a patient walking at moderate speed on a treadmill, the effect of exposure time and additional lighting can be seen in Fig. 7. Top left: automatic setting of the camera without additional lighting; top right: automatic setting of the camera with additional lighting; left center: exposure time 1/100 with additional lighting; right center: exposure time 1/500 with additional lighting; bottom left: exposure time 1/1000 with additional lighting. The exposure time of 1/500 with additional lighting provides the clearest image; a shorter exposure time creates an image that is too dark.

Some centers also perform video recordings from above and/or from below in order to better record transversal movement in the gait. The transversal view, however, does not provide imaging of all body parts while walking. From below, the feet block the pelvis; from above, the upper body is well recognized, but from the pelvis downward there is almost no visibility. In order to evaluate femur rotation, marking the patella is useful. The foot angle is evaluated from behind in the frontal video image.

The sagittal and frontal video recordings are performed either sequentially or simultaneously. The advantage of a simultaneous video is that gait events can be viewed in the frontal and sagittal video recordings at the same time. Videos can also be compiled into a single video image (split screen), or videos can be recorded in parallel and synchronized on a computer screen. Most video cameras have a built-in storage option for recording the video. This has the disadvantage that it can be difficult to find a particular recording. It is better to record directly to the computer via suitable recording software. This should include a database function so that patient videos can be found quickly and without difficulty. The connection of the video camera to the computer depends on the requirements of the camera. Common connections are USB, HDMI, or component; the FireWire/IEEE 1394 connection is seldom found in video cameras today, if at all. If possible, when using two or more video cameras, care should be taken that the hardware can be synchronized. If the hardware cannot be synchronized, at least the software should allow subsequent synchronization. Technical requirements for a video camera:

• Lens size
• Manual adjustment of shutter and illumination
• Automatic and manual focus
• 16:9 image sensor ratio
• Optical zoom
• USB, HDMI, or component connection
• Recording frequency of 50 Hz or more, depending on the speed of motion
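The relationship between exposure time and image sharpness discussed above can be estimated with a simple calculation: the blur in pixels is the distance the subject travels during one exposure divided by the scene width covered by a single pixel. The numbers in the sketch below (1.4 m/s walking speed, a 4 m field of view, 1920 horizontal pixels) are assumptions for illustration only.

def blur_px(speed_m_s, exposure_s, fov_m=4.0, sensor_px=1920):
    """Pixels of motion blur: distance moved during one exposure / meters per pixel."""
    m_per_px = fov_m / sensor_px
    return speed_m_s * exposure_s / m_per_px

for t in (1 / 100, 1 / 500, 1 / 1000):
    print(f"exposure 1/{round(1 / t)} s -> about {blur_px(1.4, t):.1f} px of blur")

This reproduces the pattern seen in Fig. 7: at 1/100 s a walking patient smears over several pixels, while at 1/500 s and shorter the blur falls to about a pixel, at the cost of needing more light.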


The costs for a video camera range from €200 to €1500, depending on the quality. A higher purchase price for a high-quality camera quickly pays for itself in the evaluation of the video recordings. High-quality cameras usually have a large lens and sensor, which means better imaging quality under the same light conditions. The recording frequency should be at least 25 images per second for gait evaluation; for running analyses, the recording speed should be at least 100 Hz. Most operating systems include simple software for video recording on the computer; however, playback options are usually limited. It is better to purchase suitable software for video recordings, and there are numerous manufacturers that have developed such software. The requirements for such software include the possibility of recording one or more video sources simultaneously, as well as a database function. The following points are important for playback: real time, slow motion, exact selection of frozen images, as well as moving frame by frame forward and backward. The ability to play back numerous videos for comparison of various conditions and examination times is extremely helpful.

Treadmill

The use of a treadmill in combination with video cameras allows optimal positioning of the camera as well as the ability to record a larger number of gait sequences in a short time. With the exact positioning of the camera parallel or at a right angle to the axis of motion, the measurement of angles and distances is simplified. However, not all patients are used to walking on a treadmill. Studies show that healthy subjects require a familiarization period of 6 min for walking and running, whereas older subjects require at least 14 min (Wass et al. 2005). Even then there are differences in the kinematics and kinetics of the gait (Alton et al. 1998; Wass et al. 2005). Therefore, the use of a treadmill for recording gait is only sensible if the patient is used to it. Usually, there is insufficient time and staff to provide the patient with enough time for adaptation.

Patient Preparation

It is important for the quality of the recording that the patient is adequately prepared. For this, it is best if the patient walks in his/her undergarments during the recording. Shorts and a t-shirt are also possible; however, it must be ensured that the pelvis can be well observed and that the t-shirt does not cover the pelvis. Marking the joint points (ankle, knee, hip) as well as the patella is helpful in video recordings in order to better evaluate and/or measure the joint angles during walking. The markings should be applied while the patient is standing (not while lying down, due to skin movement); otherwise, the marked spots may no longer represent the desired skeletal reference points.


Structured Analysis of the Gait

It is important for a high-quality evaluation that every plane and every joint is observed in a structured manner. This should ensure that gait disturbances or gait patterns are recognized even when they are not the main focus of the pathology. Numerous gait cycles should be observed, as experts tend to concentrate on certain parameters in the gait (Toro et al. 2003). Prefabricated examination forms which guide the examiner through the analysis are useful. One of the most well-known examination forms is the form from Jacquelin Perry (1992), in which values can be entered for each gait phase and for each joint. Other structured examination forms go a step further and introduce an additional evaluation system (Visual Gait Assessment Scale, Edinburgh Visual Gait Scale, Observational Gait Scale, Physician's Rating Scale). Such a score system allows determination of the degree of a gait disturbance and whether the gait has improved in a second analysis following therapeutic measures (Viehweger et al. 2010). In particular, in the therapeutic environment, observational gait analysis is suitable for documenting therapeutic advances (Coutts 1999). The score compilation describes the gait with one or more values. However, it needs to be taken into account that such a reduction of data may no longer allow an exact description of the disturbance.

Reliability of the Evaluation of Observational Video-Based Gait Analysis

With respect to study data on the validity and reliability of gait evaluation via observational gait analysis (including the use of videos), different pictures emerge depending on the parameter analyzed. Determination of initial floor contact, for example, shows good interobserver reliability (Mackey et al. 2003). The reproducibility of the results depends on the experience of the reviewer: experienced persons tend to demonstrate a higher reproducibility of results. These evaluations, however, remain subjective and mostly show only moderate reproducibility (Borel et al. 2011; Brunnekreef et al. 2005; Eastlack et al. 1991; Hillman et al. 2010; Krebs et al. 1985; Rathinam et al. 2014).

The recording of force is not possible with video recordings alone, although force plates can be integrated via video recording software products. Following calibration of the position of the force plates with the video image, an overlay function adds a force vector to the video image. In this way, force and leverage can be visualized. This is especially helpful in the fine adjustment of lower leg orthotics and prosthetics.
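Interrater agreement of categorical gait ratings, as examined in the studies cited above, is commonly quantified with a chance-corrected statistic such as Cohen's kappa (the individual studies use a variety of statistics). A minimal sketch with hypothetical ratings from two observers:

import numpy as np

def cohens_kappa(rater1, rater2, n_categories):
    """Cohen's kappa for two raters' categorical scores (e.g., gait-scale items)."""
    conf = np.zeros((n_categories, n_categories))
    for a, b in zip(rater1, rater2):
        conf[a, b] += 1                              # confusion matrix of paired ratings
    conf /= conf.sum()
    p_obs = np.trace(conf)                           # observed agreement
    p_exp = conf.sum(axis=1) @ conf.sum(axis=0)      # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings of ten gait cycles on a three-point scale
print(round(cohens_kappa([0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
                         [0, 1, 2, 2, 0, 2, 1, 0, 0, 2], 3), 2))  # -> 0.71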


Video Recording Log

It is useful to be guided by a standardized recording log so that a basic recording is present for each patient. The log below (a and b) is carried out for each video recording at Speising Orthopaedic Hospital in Vienna. These recordings are usually carried out barefoot, if it is possible for the patient to perform the examination barefoot. In video recordings with numerous auxiliary means (shoes, orthoses), it needs to be ensured that the patient does not become tired during the recording. The gait speed should be determined by the patient. Prior to the actual recording, the patient should have time to become familiar with the laboratory surroundings and the requested performance. If possible, the patient should walk the walking distance a few times prior to the actual recording.

(a) While walking:
1. Sagittal and frontal recording
2. Entire body in the frontal recording
3. From the pelvis downward in the frontal recording
4. Lower legs and feet in the frontal recording

The video image is zoomed in the frontal recording in order to continuously have the entire person or the relevant body segments in the image (Fig. 5). The recording during walking is performed at normal speed, and additionally at a quicker speed and running. Stopping and turning of the patient is also recorded in the frontal recording. If the patient requires one or more walking aids, the recording is performed with each of these aids individually. It is to be noted that recording with additional aids also requires more recording and analysis time.

(b) While standing:
1. Standing on one leg, both left and right
2. Standing on toes
3. Standing on heels
4. Knee bending with both legs

A further possibility is the recording of standing up from a chair and sitting down again.

The 2D analysis can be used relatively easily to get an overview of the gait pattern. As already mentioned, 2D video analyses show only moderate reproducibility. If the motion occurs strictly in the plane perpendicular to the camera axis, there is no parallax error. Data presented by Davis et al. (1991) at the International Symposium on 3D Analysis of Human Movement in 1991 compared 62 normal subjects (124 sides) and 5 patients (10 sides) with 2D- and 3D-captured data. For normal subjects, they found good accordance for the hip (sagittal, mean relative % difference 1% ± 1%; frontal, 9% ± 7%) and knee joint angles (sagittal, 4% ± 2%). For the ankle joint angle, the accordance appears to be less good (sagittal, 13% ± 5%). For impaired gait patterns, these values increase in all joints: hip joint angle (sagittal, 8% ± 8%; frontal, 28% ± 17%), knee joint angle (sagittal, 8% ± 5%), and ankle joint angle (sagittal, 54% ± 120%). The authors conclude the following: "Moreover, the utilization of 2D gait analysis strategies in clinical settings where the pathology can result in significant 'out-of-plane' motion is not appropriate and ill-advised. For gait analysis, there is no substitute for 3D motion analysis if we have 'out-of-plane' motion even though the 2D method is simpler and less expensive, it may produce results which are wrong." Clarke and Murphy (2014) likewise showed excellent agreement between 2D and 3D measurement of sagittal knee joint motion in healthy participants. This is supported by Nielsen and Daugaard (2008) and Fatone and Stine (2015).
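As an illustration of how such a comparison can be computed, the sketch below implements one plausible reading of "mean relative % difference" (the mean absolute 2D-3D difference expressed relative to the 3D range of motion) on a toy knee flexion curve. Davis et al.'s exact definition may differ, so the numbers are indicative only.

import numpy as np

def mean_rel_pct_diff(theta_2d, theta_3d):
    """Mean absolute 2D-3D difference, relative to the 3D range of motion
    (an assumed definition, not necessarily the one used by Davis et al.)."""
    theta_2d, theta_3d = np.asarray(theta_2d), np.asarray(theta_3d)
    rom = theta_3d.max() - theta_3d.min()
    return 100.0 * np.mean(np.abs(theta_2d - theta_3d)) / rom

t = np.linspace(0.0, 1.0, 101)                 # one time-normalized gait cycle
knee_3d = 30.0 - 25.0 * np.cos(2 * np.pi * t)  # toy 3D knee flexion curve (degrees)
knee_2d = 0.97 * knee_3d                       # toy 2D curve with a small scaling error
print(f"mean relative % difference: {mean_rel_pct_diff(knee_2d, knee_3d):.1f}%")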

Clinical Use of 2D Motion Capture

The literature shows that observational gait analysis is used across a wide variety of diseases to analyze gait disorders. A large area of use is the assessment of musculoskeletal abnormalities or gait disorders in cerebral palsy (Chan et al. 2014; Chantraine et al. 2016; Deltombe et al. 2015; Esposito and Venuti 2008; Maathuis et al. 2005; Satila et al. 2008). In addition to the quantification of gait disorders (Moseley et al. 2008), it is also used for the examination of therapeutic programs and therapeutic devices (Taylor et al. 2014). In patients with amputation of the lower extremity, it is used for checking and determining the function of prostheses; through the standardized use of video analysis, an objective documentation of the advantages and disadvantages of prosthesis parts is achieved (Lura et al. 2015; Vrieling et al. 2007). In the area of neurology, such as with Parkinson's patients (Guzik et al. 2017; Johnson et al. 2013; Obembe et al. 2014) or traumatic brain injuries (Williams et al. 2009), video analysis serves as a tool to describe gait patterns. Button et al. (2008) used two-dimensional gait analysis to analyze patients with anterior cruciate ligament rupture after rehabilitation.

Database Software and Capturing Software

The use of a database for video recordings is useful since it makes it easier to find previous recordings for gait comparison. Specialized software products for recording usually include a database, although, depending on the product, the search function for already recorded videos may be absent or only rudimentary. If the software allows keywords to be assigned to a video, this should be done; a keyword makes finding a video recorded with a particular aid significantly easier. There are many programs on the market that allow you to record videos on a PC and use a database function at the same time, for example, the open-source KINOVEA or Tracker, or commercial products like TEMPLO (Contemplas), SIMI Motion (SIMI), myDartfish (Dartfish), MyoMotion (Noraxon), and others.

The requirements for video recording software are relatively low. However, additional functions are essential for the quality of the gait analysis itself. Besides the playback function at real-time speed, slow motion should be available. In addition to the exact selection of a still image, a frame-by-frame function should also be available. For a more accurate analysis, it is helpful to measure distances and/or joint angles (Fig. 8) from the video. This is supported by most analysis programs, and the results can be taken relatively easily into a report. When it comes to comparing gait under two conditions (e.g., walking barefoot and walking with shoes, or preoperative and postoperative), the software should allow you to play two or more videos in parallel (Fig. 9). Thus, a direct comparison between the conditions is possible. Another good option for comparing conditions is the overlay function; here two videos are superimposed, and changes in the gait are thus easier to recognize. Newer systems also offer a tracking function (Fig. 10), which makes it possible to output joint angles over the full gait cycle. For this purpose, the use of markers is useful (white/black circular stickers).

Fig. 8 Knee joint angle measurement at initial contact, manual digitization
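The manual angle measurement of Fig. 8 reduces to simple vector geometry on digitized image points. A minimal sketch (the pixel coordinates are hypothetical, and a real measurement is subject to the parallax and calibration caveats discussed earlier):

import numpy as np

def knee_angle_2d(hip, knee, ankle):
    """Sagittal knee flexion from three digitized image points (pixels):
    the angle of the shank away from the extended thigh line."""
    hip, knee, ankle = map(np.asarray, (hip, knee, ankle))
    thigh, shank = knee - hip, ankle - knee
    cosang = thigh @ shank / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Hypothetical pixel coordinates at initial contact (image y axis points downward)
print(f"{knee_angle_2d((640, 300), (660, 500), (600, 690)):.1f} deg of flexion")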

Fig. 9 Follow-up control, comparison of changes over years (screenshot, TEMPLO software from Contemplas)

Fig. 10 Automatic tracking option to calculate the sagittal knee joint angle (screen capture, myoVIDEO™ (MR3) software, © Noraxon USA. Reprinted with permission)

Conclusion/Summary

The use of observational gait analysis is an important part of clinical practice. The use of video cameras for the documentation and evaluation of gait disturbances supports diagnosis and selection of treatment. Therefore, optimal recording conditions as well as structured processes in the analysis of the videos are important. In addition, optimal room size as well as optimum adjustment of equipment is a key component in obtaining high-quality video recordings.


Cross-References

▶ Activity Monitoring in Orthopaedic Patients
▶ Ankle Foot Orthoses and Their Influence on Gait
▶ Assessing Club Foot and Cerebral Palsy by Pedobarography
▶ Assessing Pediatric Foot Deformities by Pedobarography
▶ Gait Scores – Interpretations and Limitations
▶ Motion Analysis Through Video – How Does Dance Changes with the Visual Feedback
▶ The Use of Low Resolution Pedobarographs
▶ Upper Extremity Activities of Daily Living

References

Alton F, Baldey L, Caplan S, Morrissey MC (1998) A kinematic comparison of overground and treadmill walking. Clin Biomech 13:434–440
Borel S, Schneider P, Newman CJ (2011) Video analysis software increases the interrater reliability of video gait assessments in children with cerebral palsy. Gait Posture 33:727–729
Brunnekreef JJ, van Uden CJ, van Moorsel S, Kooloos JG (2005) Reliability of videotaped observational gait analysis in patients with orthopedic impairments. BMC Musculoskelet Disord 6:17
Button K, van Deursen R, Price P (2008) Recovery in functional non-copers following anterior cruciate ligament rupture as detected by gait kinematics. Phys Ther Sport 9:97–104
Chan MO, Sen ES, Hardy E, Hensman P, Wraith E, Jones S, Rapley T, Foster HE (2014) Assessment of musculoskeletal abnormalities in children with mucopolysaccharidoses using pGALS. Pediatr Rheumatol Online J 12:32
Chantraine F, Filipetti P, Schreiber C, Remacle A, Kolanowski E, Moissenet F (2016) Proposition of a classification of adult patients with hemiparesis in chronic phase. PLoS One 11:e0156726
Clarke L, Murphy A (2014) Validation of a novel 2D motion analysis system to the gold standard in 3D motion analysis for calculation of sagittal plane kinematics. Gait Posture 39(Suppl 1):S44–S45
Coutts F (1999) Gait analysis in the therapeutic environment. Man Ther 4:2–10
Davis R, Ounpuu S, Tyburski D, Deluca P (1991) A comparison of two dimensional and three dimensional techniques for the determination of joint rotation angles. In: Proceedings of the international symposium on 3D analysis of human movement, pp 67–70
Deltombe T, Bleyenheuft C, Gustin T (2015) Comparison between tibial nerve block with anaesthetics and neurotomy in hemiplegic adults with spastic equinovarus foot. Ann Phys Rehabil Med 58:54–59
Eastlack ME, Arvidson J, Snyder-Mackler L, Danoff JV, McGarvey CL (1991) Interrater reliability of videotaped observational gait-analysis assessments. Phys Ther 71:465–472
Esposito G, Venuti P (2008) Analysis of toddlers' gait after six months of independent walking to identify autism: a preliminary study. Percept Mot Skills 106:259–269
Fatone S, Stine R (2015) Capturing quality clinical videos for two-dimensional motion analysis. J Prosthet Orthot 27:27–32
Guzik A, Druzbicki M, Przysada G, Kwolek A, Brzozowska-Magon A, Wolan-Nieroda A (2017) Analysis of consistency between temporospatial gait parameters and gait assessment with the use of Wisconsin gait scale in post-stroke patients. Neurol Neurochir Pol 51:60–65
Hillman SJ, Donald SC, Herman J, McCurrach E, McGarry A, Richardson AM, Robb JE (2010) Repeatability of a new observational gait score for unilateral lower limb amputees. Gait Posture 32:39–45


Johnson L, Burridge JH, Demain SH (2013) Internal and external focus of attention during gait re-education: an observational study of physical therapist practice in stroke rehabilitation. Phys Ther 93:957–966
Krebs DE, Edelstein JE, Fishman S (1985) Reliability of observational kinematic gait analysis. Phys Ther 65:1027–1033
Lura DJ, Wernke MM, Carey SL, Kahle JT, Miro RM, Highsmith MJ (2015) Differences in knee flexion between the Genium and C-Leg microprocessor knees while walking on level ground and ramps. Clin Biomech (Bristol, Avon) 30:175–181
Maathuis KG, van der Schans CP, van Iperen A, Rietman HS, Geertzen JH (2005) Gait in children with cerebral palsy: observer reliability of physician rating scale and Edinburgh visual gait analysis interval testing scale. J Pediatr Orthop 25:268–272
Mackey AH, Lobb GL, Walt SE, Stott NS (2003) Reliability and validity of the observational gait scale in children with spastic diplegia. Dev Med Child Neurol 45:4–11
Moseley AM, Descatoire A, Adams RD (2008) Observation of high and low passive ankle flexibility in stair descent. Percept Mot Skills 106:328–340
Nielsen D, Daugaard M (2008) Comparison of angular measurements by 2D and 3D gait analysis. Dissertation, Jönköping University
Obembe AO, Olaogun MO, Adedoyin R (2014) Gait and balance performance of stroke survivors in south-western Nigeria – a cross-sectional study. Pan Afr Med J 17(Suppl 1):6
Perry J (1992) Gait analysis, normal and pathological function. SLACK, Thorofare
Rathinam C, Bateman A, Peirson J, Skinner J (2014) Observational gait assessment tools in paediatrics – a systematic review. Gait Posture 40:279–285
Satila H, Pietikainen T, Iisalo T, Lehtonen-Raty P, Salo M, Haataja R, Koivikko M, Autti-Ramo I (2008) Botulinum toxin type A injections into the calf muscles for treatment of spastic equinus in cerebral palsy: a randomized trial comparing single and multiple injection sites. Am J Phys Med Rehabil 87:386–394
Taylor P, Barrett C, Mann G, Wareham W, Swain I (2014) A feasibility study to investigate the effect of functional electrical stimulation and physiotherapy exercise on the quality of gait of people with multiple sclerosis. Neuromodulation 17:75–84. Discussion 84
Toro B, Nester CJ, Farren PC (2003) The status of gait assessment among physiotherapists in the United Kingdom. Arch Phys Med Rehabil 84:1878–1884
Viehweger E, Zurcher Pfund L, Helix M, Rohon MA, Jacquemier M, Scavarda D, Jouve JL, Bollini G, Loundou A, Simeoni MC (2010) Influence of clinical and gait analysis experience on reliability of observational gait analysis (Edinburgh gait score reliability). Ann Phys Rehabil Med 53:535–546
Vrieling AH, van Keeken HG, Schoppen T, Otten E, Halbertsma JP, Hof AL, Postema K (2007) Obstacle crossing in lower limb amputees. Gait Posture 26:587–594
Wass E, Taylor NF, Matsas A (2005) Familiarisation to treadmill walking in unimpaired older people. Gait Posture 21:72–79
Williams G, Morris ME, Schache A, Mccrory P (2009) Observational gait analysis in traumatic brain injury: accuracy of clinical judgment. Gait Posture 29:454–459

The Conventional Gait Model - Success and Limitations

Richard Baker, Fabien Leboeuf, Julie Reay, and Morgan Sangeux

Abstract

The Conventional Gait Model (CGM) is a generic name for a family of closely related and very widely used biomechanical models for gait analysis. After a description of its history, the core attributes of the model are presented, followed by an evaluation of its strengths and weaknesses. An analysis of the current and future requirements for rigorously calibrated, practical biomechanical models for clinical and other gait analysis purposes suggests that the CGM is better suited to this purpose than any other currently available model. Modifications are required, however, and a number are proposed.

Keywords

Clinical Gait Analysis • Biomechanical Modeling

R. Baker (*) University of Salford, Salford, UK e-mail: [email protected] F. Leboeuf • J. Reay School of Health Sciences, University of Salford, Salford, UK e-mail: [email protected]; [email protected] M. Sangeux Hugh Williamson Gait Analysis Laboratory, The Royal Children’s Hospital, Parkville/Melbourne, VIC, Australia Gait laboratory and Orthopaedics, The Murdoch Childrens Research Institute, Parkville/Melbourne, VIC, Australia e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_25-2


Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Model Structure and Anatomical Segment Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marker Placement to Estimate Anatomical Segment Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kinematic Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kinetic Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Introduction

The Conventional Gait Model (CGM) is a generic name for a family of biomechanical models which emerged in the 1980s based on very similar principles and giving very similar results. It has a rather complex history (outlined below) and as a consequence has been referred to by a range of different names. The use of the name Conventional Gait Model is an attempt to emphasize the essential similarity of these models despite those different names. For a number of reasons, the CGM became the de facto standard for gait analysis in the 1990s, particularly in clinical and clinical research applications. Despite considerable strengths, technological advances have left aspects of the CGM looking quite outdated. The model, as originally formulated, also has a number of intrinsic limitations and, as these have become more widely appreciated, a variety of modifications and alternatives have been developed. Although the model can no longer be regarded as an industry-wide standard as was once the case, many of the more established and respected clinical centers still prefer to use the model, considering its strengths to outweigh its limitations. After a brief summary of the historical development of the CGM, this chapter will describe its characteristics and then assess its strengths and limitations, concluding with some suggestions as to how the model could be developed in future in order to address those limitations while preserving its strengths.

History

(Italicized words in this section are names that are sometimes used to refer to the CGM.)

The origins of the model can be traced to the work of John Hagy in the laboratory established by David Sutherland (Sutherland and Hagy 1972), who digitized the positions of skin markings indicating anatomical landmarks from bi-planar movie stills. The coordinates were then used to compute a number of joint angles. Patrick Shoemaker extended this approach (Shoemaker 1978) to incorporate Ed Chao's ideas on representing three-dimensional joint motion as Euler angles (Chao 1980). Jim Gage, on a visit to San Diego prior to developing his own gait analysis laboratory at the Newington Hospital in Connecticut, and a succession of engineers including Scott Tashman, Dennis Tyburski, and Roy Davis (Davis et al. 1991) further developed the ideas in a number of ways. Perhaps the most important of these were the calculation of joint angles on the basis of estimated joint centers (rather than directly from marker locations) and the incorporation of three-dimensional inverse dynamics to estimate joint moments (Ounpuu et al. 1991), based on the approach of David Winter (Winter and Robertson 1978). At about this time Murali Kadaba developed a very similar model at the Helen Hayes Hospital (Kadaba et al. 1989, 1990). There was communication between the two groups over this period, but there are now different memories as to the extent of this collaboration and the precise role of the different individuals involved.

Although some minor modifications have been proposed since, the subsequent history is largely about how the model was distributed. The Helen Hayes Model was developed as a package and distributed across seven American hospitals. A little later, Oxford Metrics (now Vicon), the manufacturers of Vicon movement analysis systems, chose to develop their own version of the model (with support from individuals at both Newington and Helen Hayes). This was embedded within a package known as the Vicon Clinical Manager (VCM) and later developed as the Plug-in Gait (PiG) model for Workstation software. Most manufacturers of gait analysis systems produce some version of the model, which goes under a variety of names. Perhaps because of commercial sensitivities, it is generally rather unclear what level of agreement there is between data processed with these alternative models.

Perhaps the most important factor leading to the widespread adoption of the CGM was the prominence of Vicon measurement systems in clinical and academic gait analysis at this time, with VCM and PiG being delivered alongside their hardware. Many of the more established clinical services were founded at this time, and most adopted VCM and continued to use PiG. Jim Gage became a strong advocate for clinical gait analysis and, with Roy Davis and Silvia Ounpuu, established extremely well-regarded teaching courses first at Newington, then at Gillette Children's Hospital, which were based on what they regarded as the Newington Model. The model was also explained and validated in a number of key papers (Kadaba et al. 1989, 1990; Davis et al. 1991; Ounpuu et al. 1991, 1996) in considerably more detail than any other model at the time. Thus by the early 2000s, the CGM had become established as the predominant gait model for clinical and clinical research purposes, and a large community of users had developed, embodying a solid understanding of its strengths and limitations.

Since that time, this status has diminished somewhat. A larger number of suppliers to the gait analysis market and the increasing ease of integrating different software have widened the options for data processing. There have been considerable and often justified criticisms of the limitations of the CGM and a general failure of the CGM community to develop the model to address these issues. Despite this, the model is still almost certainly the most widely used and understood single model within the clinical and clinical research community.

State of the Art

As stated above, the CGM is actually a family of closely related models, but for simplicity this section will be limited to a description of that embodied in the VCM and PiG, which are identical and the most commonly used versions. It is arguable whether the CGM is a model at all as the word is now understood in biomechanics; it was originally described as "an algorithm for computing lower extremity joint motion" (Kadaba et al. 1990) and "a data collection and reduction technique" (Davis et al. 1991). In the sections below, however, a modern understanding of biomechanical modeling will be used to describe the underlying concepts.

Model Structure and Anatomical Segment Definitions

The model has seven segments linked in a chain by ball joints (three rotational degrees of freedom) in the sequence left foot, left tibia, left femur, pelvis, right femur, right tibia, right foot. An orthogonal coordinate system is associated with each segment. While the three segment axes are mathematically equivalent, clinical convention is to define the segment alignment in terms of the alignment of a primary axis and the rotation about this as defined by some off-axis reference point. The primary axis for each segment is taken to be that linking the joints which attach it to the two neighboring segments in the kinematic chain. Conceptually the segment axis systems are thus defined by specifying a primary axis and reference point for each. These are defined in Table 1.

Table 1 Anatomical segment definitions for the CGM

Pelvis: The primary axis is the mediolateral axis running from one hip joint center to the other. In most clinical applications, it is assumed that the pelvis is symmetrical and that this axis is thus parallel to the line running from one anterior superior iliac spine (ASIS) to the other. The reference point for rotation about this axis is the mid-point of the posterior superior iliac spines (PSIS).

Femur: The primary axis is that running from the hip joint center to the knee joint center. The reference point is the lateral epicondyle. For validation purposes:
• The hip joint center will be taken as the geometrical center of a sphere fitted to the articular surface of the femoral head.
• The knee joint center will be taken as the mid-point of the medial and lateral epicondyles. These are often difficult to palpate, however, and for some purposes the line between these landmarks will be assumed to be parallel to that linking the most posterior aspects of the femoral condyles.

Tibia: The primary axis is that running from the knee joint center to the ankle joint center. The reference point is the lateral malleolus. For validation purposes:
• The ankle joint center will be assumed to be the mid-point of the medial and lateral malleoli.

Foot: The primary axis is that running from the most posterior aspect of the calcaneus along the second ray, parallel to the plantar surface of the foot. Rotation about this axis is not defined.
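In code, the constructions of Table 1 amount to building an orthonormal axis triad from the primary axis and the off-axis reference point. The Python sketch below is illustrative only (axis labeling and sign conventions vary between CGM implementations, and the femur coordinates are hypothetical):

import numpy as np

def segment_frame(joint_prox, joint_dist, reference):
    """Orthogonal segment axes from a primary axis (the line joining the two
    joint centers) and an off-axis reference point, in the spirit of Table 1."""
    e1 = joint_prox - joint_dist
    e1 /= np.linalg.norm(e1)              # primary (proximal-distal) axis
    e3 = np.cross(e1, reference - joint_dist)
    e3 /= np.linalg.norm(e3)              # normal of the plane containing the reference
    e2 = np.cross(e3, e1)                 # completes the right-handed triad
    return np.column_stack([e1, e2, e3])  # columns are the segment axes

# Hypothetical femur: hip and knee joint centers plus a lateral epicondyle point
R = segment_frame(np.array([0.0, 0.00, 0.90]),   # hip joint center
                  np.array([0.0, 0.05, 0.45]),   # knee joint center
                  np.array([0.06, 0.05, 0.45]))  # lateral epicondyle
print(np.round(R, 3))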

Marker Placement to Estimate Anatomical Segment Position

Markers are placed in such a way that the segment orientations can be estimated. When the model was developed, optoelectronic measurement systems were limited to resolving a small number of markers, and thus the minimum number of markers possible is used. This is based on the assumption that the proximal joint of any leg segment (all those other than the pelvis) is known from the position and orientation of the segment to which it is linked proximally. More distal segment orientations are dependent on the orientation of the more proximal segments, and the model is thus often described as being hierarchical. Because of the difficulty in resolving more than two markers on the foot at the time when the model was developed, it defined the orientation of the foot's primary axis but not any rotation about this. The locations of markers are given in Table 2.

Table 2 Marker placement for the CGM

Pelvis: Markers are placed over both ASIS and PSIS in order that they lie in the plane containing the anatomical landmarks. A set of equations is used to estimate the location of the hip joint within the pelvic coordinate system.

Femur: The hip joint center within the femur is coincident with that within the pelvis. A marker is placed over the lateral femoral epicondyle and another on a wand on the lateral thigh in such a way that the two markers and the hip joint center lie within the coronal plane of the femur. The knee joint center is defined such that it, the hip joint center, and the epicondyle marker form a right-angle triangle within the coronal plane of the femur, with a base of half the measured knee width.

Tibia: The knee joint center within the tibia is coincident with that within the femur. A marker is placed over the lateral malleolus and another on a wand on the lateral leg in such a way that the two markers and the knee joint center lie within the coronal plane of the tibia. The ankle joint center is defined such that it, the knee joint center, and the malleolar marker form a right-angle triangle within the coronal plane of the tibia, with a base of half the measured ankle width.

Foot: The ankle joint center in the foot is defined to be coincident with that within the tibia. A marker is placed on the forefoot. Another marker is placed on the posterior aspect of the heel for the static trial such that the line between the two markers is parallel to the long axis of the foot. The angles between this line and the line from the ankle joint center to the forefoot marker in the sagittal and horizontal planes are calculated. The heel marker is not used in walking trials, but the offsets are used to estimate the alignment of the long axis of the foot based on the line between the ankle joint center and the forefoot marker.

The hierarchical process requires a method for determining the location of the joints within each segment. The hip joint location within the pelvis coordinate system is specified by three equations (Davis et al. 1991) which are functions of leg length and ASIS-to-ASIS distance. These are measured during physical examination (although ASIS-to-ASIS distance can also be calculated from the marker positions during a static trial). The knee joint center in the femur coordinate system is assumed to lie in the coronal plane at the point at which the lines from it to the hip joint center and the lateral femoral epicondyle are perpendicular and the distance between the joint center and the epicondyle is half the measured knee width.

The ankle joint center within the tibia is specified analogously with respect to the lateral malleolus.

The wand markers (on both femur and tibia) are thus important to define the segmental coronal plane. Use of the wand (rather than a surface-mounted marker) has two main purposes. The first is that wands (particularly those with a moveable ball-and-socket joint at the base) can be adjusted easily to define the correct plane. At least as important, however, is that by moving the marker away from the primary axis of the segment they make the definition of the coronal plane much less sensitive to marker placement error or soft tissue artifact. Concerns have been expressed that the markers wobble, but there is little evidence of this in gait data (it would appear as fluctuation in the hip rotation graph) if they are taped or strapped securely to the thigh.

The foot segment uses the ankle joint center (which has already been defined in the tibia coordinate system) and one forefoot or toe marker. The placement of this marker varies considerably, with some centers placing it quite distally (typically at the level of the metatarsophalangeal joint), in which case it indicates overall foot alignment. Other centers, particularly those dealing with clinical populations who often have foot deformities, choose a more proximal placement (typically at the level of the cuneiforms) in order to give a better indication of hindfoot alignment. Placement of a heel marker during the static trial also allows for offsets to ensure that ankle measurements are aligned with the long axis of the foot rather than simply with the line from the ankle joint center to the toe marker. A common variant is to calculate the plantar flexion offset on the assumption that the foot is flat during the static trial, and thus that the long axis of the foot is in the horizontal plane.
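The knee joint center construction just described (and its analogue at the ankle) is a purely geometric problem, often referred to as a chord function in CGM implementations. The sketch below solves the stated conditions directly (perpendicularity of the lines to the hip joint center and to the epicondyle marker, with the joint center half a knee width from the marker); it illustrates the geometry rather than reproducing any vendor's code, and the side-selection rule and coordinates are assumptions:

import numpy as np

def chord(r, marker, prox_jc, wand):
    """Joint center K in the plane of marker, proximal joint center and wand,
    with |K - marker| = r and the lines K->prox_jc and K->marker perpendicular."""
    d_vec = prox_jc - marker
    d = np.linalg.norm(d_vec)
    x_hat = d_vec / d                        # in-plane axis, marker -> proximal joint
    n = np.cross(x_hat, wand - marker)
    n /= np.linalg.norm(n)                   # normal of the marker/joint/wand plane
    y_hat = np.cross(n, x_hat)               # in-plane perpendicular
    if (wand - marker) @ y_hat > 0:          # assume joint center lies opposite the wand
        y_hat = -y_hat
    c = r / d                                # from (prox_jc - K) . (marker - K) = 0
    return marker + r * (c * x_hat + np.sqrt(1.0 - c * c) * y_hat)

# Hypothetical coronal-plane positions (meters), half knee width r = 0.06
hip_jc = np.array([0.0, 0.00, 0.90])
knee_marker = np.array([0.0, -0.06, 0.45])   # lateral epicondyle marker
thigh_wand = np.array([0.0, -0.09, 0.70])
print(np.round(chord(0.06, knee_marker, hip_jc, thigh_wand), 3))  # -> [0, 0, 0.45]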

Kinematic Outputs Kinematic outputs are mainly joint angles describing the orientation of the distal segment with respect to that of the proximal segment. The orientation of the pelvis is output as segment angles (with respect to the laboratory-based axis system) as is the transverse plane alignment of the foot (called foot progression). In three dimensions, the orientation of one segment with respect to another must be represented by three numbers. The CGM uses Cardan angles which represent the set of sequential rotations about three different and mutually perpendicular axes that would rotate the distal segment from being aligned with the proximal segment (or the laboratorybased coordinate system) to its actual position. In the original model, the rotation sequence was about the medial-lateral, then the anterior-posterior and finally the proximal-distal axis for all joints (and segments). Although this sequence maps onto the conventional clinical understanding of the angles for most joints, it does not for the pelvis (Baker 2001). This is because with this rotation sequence, pelvic tilt is calculated as the rotation around the mediallateral axis of the laboratory coordinate system, rather than the medial-lateral axis of the pelvis segment, as per conventional understanding. Baker (Baker 2001) proposed to reverse the rotation sequence which results in pelvic angles that more closely map onto the conventional clinical understanding of these terms (confirmed by Foti et al. 2001). Following Baker’s recommendation to use globographic angles (Baker 2011), these can be interpreted exactly as listed in Table 3. While not formally a part of the model, the CGM is closely associated with a particular format of gait graph (see Fig.1). All data is time normalized to one gait cycle and the left side data plotted in one color (often red) and the right side data in another (often green, but blue reduces the risk of confusion by those who are color blind). The time of toe off is denoted by a vertical line across the full height of the graph and opposite foot off and contact by tick marks at either the top or bottom of the graphs (in the appropriate color). Normative data is often plotted as a grey band

8

R. Baker et al.

Table 3 Definition of joint angles as commonly used with the CGM Pelvis (with respect to global coordinate system) Internal/external rotation: rotation of the mediolateral axis about the vertical axis Obliquity (up/down): rotation of the mediolateral axis out of the horizontal plane Anterior/posterior tilt: rotation around the mediolateral axis Hip (femur with respect to pelvis coordinate system) Flexion/extension: rotation of the proximal-distal axis about the medio-lateral axis Ad/abduction: rotation of the proximal-distal axis out of the sagittal plane Internal/external rotation: rotation around the proximal-distal axis Knee (tibia with respect to femur coordinate system) Flexion/extension: rotation of the proximal distal axis about the medio-lateral axis Ad/abduction: rotation of the proximal-distal axis out of the sagittal plane Internal/external rotation: rotation around the proximal-distal axis Ankle (foot with respect to tibia coordinate system) Dorsiflexion/plantarflexion: rotation of the proximal distal axis about the medio-lateral axis Internal/external rotation: rotation of the proximal-distal axis out of the sagittal plane Foot (with respect to global coordinate system) Foot progression (in/out): rotation of the proximal-distal axis out of the “sagittal” plane

The graphs are then commonly displayed as arrays with the columns representing the different anatomical planes and the rows representing the different joints.

Kinetic Outputs

The CGM is commonly used to calculate kinetic as well as kinematic outputs (Davis et al. 1991; Kadaba et al. 1989). Both the Newington and Helen Hayes approaches used inverse dynamics to estimate joint moments from force plate measurements of the ground reaction, an estimate of segment accelerations from kinematic data, and estimates of segment inertial parameters. The main difference was that the Newington group took segment inertial parameters from the work of Dempster (1955), whereas the Helen Hayes group (Kadaba et al. 1989) took them from Hinrichs (1985), based on Clauser et al. (1969). Joint moments are fairly insensitive to these parameters (Rao et al. 2006; Pearsall and Costigan 1999), and it is unlikely that this would have led to noticeable differences in output. VCM and PiG used values from Dempster (1955). Joint moments were presented in the segment coordinate systems. The early papers do not specify whether the proximal or distal segment was used for this. PiG and VCM allowed the user to select which (or to use the global coordinate system), and the default setting of the distal system is probably most widely used. Joint power is also calculated as the vector dot product of the joint moment and angular velocity (note that this should be the true angular velocity vector and not the time derivatives of the Cardan angles). Power is a scalar quantity, and there is thus no biomechanical justification for presenting "components" of power.
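A minimal sketch of this power calculation is given below, with the angular velocity obtained from successive rotation matrices rather than from Cardan angle derivatives, as the text requires (all numerical values are illustrative):

```python
import numpy as np

def angular_velocity(R_t, R_next, dt):
    """Approximate the true angular velocity vector (rad/s) from two
    successive rotation matrices via the skew-symmetric matrix
    [w]x ~ ((R_next - R_t)/dt) @ R_t.T."""
    W = (R_next - R_t) / dt @ R_t.T
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def joint_power(moment, omega):
    """Joint power (W): dot product of the joint moment (N*m) and the
    relative angular velocity (rad/s), both expressed in the same frame."""
    return float(np.dot(moment, omega))

# Illustrative single frame: a knee moment and two orientations 10 ms apart
M = np.array([45.0, -5.0, 2.0])
c, s = np.cos(0.018), np.sin(0.018)            # ~1.8 rad/s about z over dt
R_t = np.eye(3)
R_next = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
omega = angular_velocity(R_t, R_next, dt=0.01)
print(omega, joint_power(M, omega))            # power is a single scalar
```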


[Figure: knee flexion (degrees, -15 to 75) plotted against % gait cycle, in the standard gait graph format.]

Fig. 1 A standard gait graph. The curves represent how a single gait variable varies over the gait cycle. The vertical lines across the full height of the graph represent foot off, and the tick marks represent opposite foot off (to the left of the graph) and opposite foot contact (to the right). The line in red is for the left side and the line in blue is for the right side. The grey areas represent the range of variability in some reference population, as 1 standard deviation either side of the mean value
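For illustration, the layout conventions just described can be reproduced with a few lines of plotting code. The curves, colors, and event timings below are synthetic assumptions chosen only to show the format:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 100, 101)                              # % gait cycle
curve = lambda p: 5 + 15*np.exp(-((t-15)/8)**2) + 55*np.exp(-((t-p)/11)**2)
left, right, norm = curve(71), curve(73), curve(72)       # synthetic data

fig, ax = plt.subplots()
ax.set_xlim(0, 100); ax.set_ylim(-15, 75)
ax.fill_between(t, norm - 6, norm + 6, color='0.85')      # normative band
ax.plot(t, left, color='red', label='Left')
ax.plot(t, right, color='blue', label='Right')
ax.axvline(60, color='red'); ax.axvline(62, color='blue') # toe off
for x, c in [(10, 'red'), (50, 'red'), (12, 'blue'), (52, 'blue')]:
    ax.plot(x, -15, marker='|', ms=14, color=c, clip_on=False)  # opposite events
ax.set_xlabel('% gait cycle'); ax.set_ylabel('Knee flexion (degrees)')
ax.legend(); plt.show()
```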

Variants

Over the years a number of variants of the CGM have been implemented by particular groups. Most of these have not been formally described in the academic literature.
• The original papers describing the model assumed that the femur and tibia wand markers could be placed accurately. Early experience was that this was challenging, and an alternative technique was developed in which the markers were only positioned approximately and a Knee Alignment Device (KAD) was used during static trials to indicate the orientation of the knee flexion axis and hence the coronal plane of the femur. This allowed rotational offsets to be calculated to correct for any misalignment of the wand markers (with the tibial offset requiring an estimate of tibial torsion from the physical examination).
• A development within PiG allowed a medial malleolar marker to indicate the position of the transmalleolar axis during the static trial and hence to calculate a value of tibial torsion rather than requiring this to be measured separately.


• A method of allowing for the thickness of soft tissues over the ASIS was provided by allowing the measurement of the ASIS to greater trochanter distance, which is an estimate of the distance by which the hip joint center is posterior to the base plate of the ASIS marker.
• A technique called DynaKAD has been proposed (Baker et al. 1999) to define the thigh rotation offset by minimizing the varus-valgus movement during the walking trial. Other techniques have been suggested to define this from functional calibration trials (Schwartz and Rozumalski 2005; Sauret et al. 2016; Passmore and Sangeux 2016).
• VCM and PiG introduced an angular offset along the tibia such that knee rotation is defined as being zero during a static trial when the KAD is used and the orientation of the ankle joint axis is defined by a measurement of tibial torsion made during the physical exam (rather than by the tibial wand marker).
• Another development of PiG allowed the heel marker to be used to give an indication of inversion/eversion of the foot (rotation about the long axis) if it was left in place during the walking trial.
• A further development allowed an angular offset to be applied to account for the foot being pitched forward by a known amount during a static trial (to take account of the pitch of a shoe, for example).
• An upper body model was developed by Vicon which, though widely used, has never been rigorously validated.

Strengths

Recent opinion has tended to emphasize the weaknesses of the CGM, but it is also important to acknowledge its many strengths. In a world in which clinical governance is increasingly important, the CGM has been more extensively validated than any other model in routine clinical use. The early papers of Kadaba et al. were considerably ahead of their time in their approach to validation. The basic description of the model (Kadaba et al. 1990) includes presentation of normative data, a comparison of this against normative data from a range of previous papers, and a sensitivity analysis of the most common measurement artifact arising from the difficulty in placing thigh wands accurately. The follow-up paper (Kadaba et al. 1989, which was actually published first!) is also a definitive repeatability study. Fifteen out of the 23 papers identified in the classic systematic review of repeatability studies of kinematic models (McGinley et al. 2009) used a variant of the CGM, and a more recent study (Pinzone et al. 2014) has demonstrated the essential similarity of normative kinematic data collected, using the CGM, by gait analysis services on different sides of the world. This body of formal validation literature is strongly reinforced by a large number of papers reporting use of the CGM in a very wide range of clinical and research applications. The CGM is thus particularly appropriate as a standardized and validated model for users who are more interested in interpreting what the results mean than in further model development and validation.


Although the implementation of the model is not trivial, the basic concepts are about as simple as possible for a clinically useful model. It uses a minimal marker set which can be applied efficiently in routine clinical practice. The model is deterministic (it does not require any optimized fitting process), and thus the effects of marker misplacement and/or soft tissue artifact are entirely predictable (Table 4 illustrates how a given movement of each marker affects the outputs). It is thus possible to develop a comprehensive understanding of how the model behaves without being an expert in biomechanics. This can be logically extended to give clear indications of how marker placement can best be adapted in order to obtain clinically meaningful outputs in the presence of bone and joint deformities or devices such as orthoses and prostheses. It is unfortunate, therefore, that in the early years the model developed a reputation for behaving as a "black box." This probably arose because the most commonly available implementation, in the VCM, incorporated some refinements to the previously published versions (e.g., the thigh and shank offsets) which were only described conceptually in the accompanying product documentation. Many people assumed that there was insufficient information to fully understand the model; an assumption proved false by the number of exact clones that emerged (Baker et al. 1999 is an example).

Weaknesses

Accuracy

While the CGM has been subjected to several studies investigating its repeatability, there have been very few studies of its accuracy, and those have focused on very specific issues such as the location of the hip joint center (Sangeux et al. 2011, 2014; Peters et al. 2012) and the orientation of the knee flexion axis (Sauret et al. 2016; Passmore and Sangeux 2016) in standing. The model is intended to track the movements of the bones, and there have been no studies performed to establish how accurately it can do this. This is principally because gold standard methods for tracking bone movement during walking are challenging (although a range of techniques are available; see section on "Future Directions" below). It should be emphasized, however, that this is a weakness of all commonly used biomechanical models for gait analysis and not just the CGM.

Hip Joint Center Position

A considerable body of knowledge now suggests that there are better methods for specifying the location of the hip joint center within the pelvic coordinate system than those used within the CGM (Leardini et al. 1999; Sangeux et al. 2011, 2014; Harrington et al. 2007; Peters et al. 2012). While the first of these (Leardini et al. 1999) suggested that functional calibration methods were superior to regression equations, more recent studies suggest that alternative equations can give results at least as good as functional methods in healthy adults (Sangeux et al. 2011, 2014; Harrington et al. 2007) and better in children with disabilities (Peters et al. 2012).
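As an indication of what such alternative equations look like, the sketch below implements hip joint center regression equations of the form published by Harrington et al. (2007). The coefficients are quoted as commonly cited for the combined data set and are an assumption of this sketch; they should be verified against the original paper before any use:

```python
import numpy as np

def harrington_hjc(pw, pd):
    """Hip joint centre in the pelvic frame (x anterior, y up, z lateral),
    right side, in mm - regression in the style of Harrington et al. (2007),
    coefficients as commonly quoted (verify against the paper before use).

    pw: pelvic width (inter-ASIS distance, mm)
    pd: pelvic depth (ASIS midpoint to PSIS midpoint, mm)
    """
    x = -0.24 * pd - 9.9
    y = -0.30 * pw - 10.9
    z = 0.33 * pw + 7.3
    return np.array([x, y, z])

print(harrington_hjc(pw=240.0, pd=180.0))   # [-53.1, -82.9, 86.5]
```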

Table 4 Effects of moving a marker 5 mm in the specified direction on the outputs of the CGM (rows: RASI up, RASI out, SACR up, SACR out, RTHI up, RKAD int, RKNE up, RKNE ant, RTIB up(a), RTIB ant(a), RANK up, RANK ant, RTOE out, RTOE ant(b); columns: pelvic tilt, obliquity, and rotation; hip flexion, adduction, and internal rotation; knee flexion, varus, and internal rotation; ankle dorsiflexion and internal rotation; and foot internal progression). Note that because of the hierarchical basis of the model, movements can only affect segments on or below that to which a given marker is attached. Changes in angle of less than 0.1° are left blank. Notes: (a) Data is unaffected by the location of the tibial wand marker as a KAD was used for the static trial. (b) Moving the toe marker anteriorly or posteriorly has no effect on outputs as a "foot flat" option was used for the static trial


Defining the Coronal Plane of the Femur

The first of the papers of Kadaba et al. (1990) highlighted the sensitivity of the CGM to misplacement of the thigh markers leading to erroneous definition of the coronal plane of the femur. This leads to a well-known artifact in which the coronal plane knee kinematics show cross-talk from knee flexion-extension, which is generally of little clinical significance but indicates uncertainty in hip rotation, which is a major limitation of the model. Use of the KAD (which is very poorly documented in the literature) led to some improvements, but this is still generally regarded as one of the most significant limitations of the model.

Over-Simplistic Foot Modeling

Modeling the foot as a single axis rather than a three-dimensional segment arose from the difficulty early systems had in detecting more than one marker placed on a small foot. While reliable detection of many markers on the foot has been possible for many years now, a formal extension of the model has never been proposed to model the foot more comprehensively. The Oxford Foot Model (Carson et al. 2001), which is probably now the most widely used in clinical and research practice, differs markedly from the CGM in that it allows translations between the forefoot, hind foot, and tibia (rather than the spherical joints that are a characteristic of the CGM).

Unconstrained Segment Dimensions

The CGM does not require the segments to be of a fixed length, and soft-tissue artifact generally acts in such a way that the distance between the hip and knee joint centers can vary by as much as 2 cm over the gait cycle during walking. While this probably has a small effect on kinematic and kinetic outputs, it does prevent the use of the model with more advanced modeling techniques, such as muscle length modeling and forward dynamics, for which a rigid linked segment model is required. Modern inverse kinematic techniques (Lu and O'Connor 1999), which depend on rigid linked segment models, also offer the potential to incorporate modeling of soft-tissue artifact (Leardini et al. 2005) based on data such as fluoroscopy studies (Tsai et al. 2009; Akbarshahi et al. 2010) in a manner that is not possible within the CGM.

Inadequate Compensation for Soft Tissues over Pelvic Landmarks

While methods have been proposed for measuring and taking into account the soft tissues over pelvic landmarks, none are particularly convincing or validated. As populations, particularly those with limited walking abilities, become increasingly overweight, this becomes a more important problem.

Poorly Validated Upper Body Model

While Davis et al. (1991) did suggest placement of markers on the shoulders to give an indication of trunk alignment, this has not been widely implemented. Vicon developed an upper body model for PiG but, despite this being quite widely used, there have been no published validations of its outputs. It is still not clear how important upper limb movements are in relation to clinical gait analysis, but knowledge of trunk alignment and dynamics is clearly important to understand the mechanics of the gait patterns of many people with a range of conditions.
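The cross-talk artifact described above under "Defining the Coronal Plane of the Femur" is easy to reproduce numerically: if a pure flexion motion is decomposed in a femoral frame that is malrotated about the long axis, spurious adduction appears and grows with flexion. A minimal sketch under assumed axis conventions (z the flexion axis, y the long axis):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

delta = 15.0                          # frame malrotation about long axis (deg)
offset = R.from_euler('Y', delta, degrees=True)

for flex in [0, 20, 40, 60]:          # true motion: pure knee flexion (deg)
    true = R.from_euler('Z', flex, degrees=True)
    measured = offset.inv() * true * offset   # same motion, misaligned frame
    f, add, rot = measured.as_euler('ZXY', degrees=True)
    print(f"flexion {flex:2d} deg -> apparent adduction {add:5.1f} deg")
```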


Alternatives

Perhaps the most commonly used alternatives to the CGM are six degree-of-freedom (6DoF) models. These can be traced back to the work of Cappozzo et al. (1995) and have been popularized through the Visual3D software (C-Motion, Kingston, Canada). They track the segments independently (without constraining the joints) and can be based on skin-mounted markers (as implied by the illustration in the original paper) or rigid marker clusters (as is more common nowadays). Perhaps the most important limitation of this approach is that it refers to a modeling technique rather than any specific model (CAST is an abbreviation for the calibrated anatomical system technique), and no specific model has been widely used and rigorously validated. The Cleveland Clinic Marker Set was an early example which achieved popularity when it was implemented in the OrthoTrak software (Motion Analysis Corporation, Santa Rosa, USA) but has never been validated (or even fully described) in the peer-reviewed literature. More recently, Leardini et al. (2007) published and validated the IOR model, but there are only limited reports of use outside Bologna in the literature (and it is worth noting that the IOR model, in using skin-mounted markers, differs quite markedly from most contemporary 6DoF modeling, which uses rigid clusters). 6DoF models are sometimes presented as addressing the known limitations of the CGM. Sometimes there is justification in these claims (e.g., the segments are of fixed length), but often corresponding issues are overlooked (e.g., nonphysiological translations between the proximal and distal bones at some joints). Soft tissue artifact between markers is certainly eliminated by using rigid clusters, but a different form of soft tissue artifact will affect the orientation and position of the whole cluster in relation to the bones (Barre et al. 2013). Other issues, such as the difficulty in estimating the hip joint center or knee axis alignment, affect all models. One advantage of most 6DoF models is that they use medial and lateral epicondyle markers during a static trial to define the knee joint axis. This may be more repeatable than precise alignment of thigh wands or KADs. It is also worth noting that this is only a difference of knee calibration technique which could easily be incorporated into the CGM. Inverse kinematic models (often referred to as kinematic fitting or global optimization) have also been reported (Lu and O'Connor 1999; Reinbolt et al. 2007; Charlton et al. 2004), and this approach has become more popular since it was incorporated within OpenSim (Seth et al. 2011) as the default technique for tracking marker data. In this approach, a linked rigid segment model is defined, and an optimization technique is used to fit the model to the measured marker positions, generally using some weighted least-squares cost function. As with 6DoF models, this approach has advantages and disadvantages with respect to the CGM. It is also similar to the 6DoF approach in that no single model has received widespread use or been subject to rigorous validation. The approach is inherently compatible with advanced modeling techniques (e.g., muscle length modeling and forward dynamics) and is well suited to either stochastic or predictive approaches to modeling soft tissue artifact. Its most notable weakness is that it is nondeterministic, and on occasions artifacts can arise in the data from soft-tissue artifact, marker misplacement, or erroneous model definition that can be extremely difficult to trace to their source. On balance, however, it is likely that future developments will be based on an inverse kinematic approach.
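At the core of both 6DoF tracking and inverse kinematic fitting is a least-squares estimate of segment pose from a cluster of markers. The sketch below shows the standard SVD-based solution (in the style of Soderkvist and Wedin) on synthetic data; it is a generic illustration, not any vendor's implementation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def fit_rigid(X, Y):
    """Least-squares rigid transform (R, t) mapping template marker
    positions X (n x 3) onto tracked positions Y; SVD solution with a
    determinant term guarding against reflections."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    Rm = Vt.T @ D @ U.T
    return Rm, Y.mean(0) - Rm @ X.mean(0)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3)) * 0.05                 # cluster template (m)
R_true = Rotation.from_euler('ZXY', [30, 10, 45], degrees=True).as_matrix()
t_true = np.array([0.10, 0.25, 0.90])
Y = X @ R_true.T + t_true + rng.normal(scale=1e-4, size=X.shape)  # tracked + noise
Rm, t = fit_rigid(X, Y)
print(np.allclose(Rm, R_true, atol=1e-2), np.round(t - t_true, 4))
```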

Future Directions

Over the lifetime of the CGM, the nature of gait analysis has changed considerably in at least two important ways. The first is the growing importance of clinical governance (Scally and Donaldson 1998) and evidence-based practice within healthcare organizations. This requires increasing standardization of all operations based upon well-validated procedures. The emergence of accreditation schemes, such as those now operated by the Clinical Movement Analysis Society (CMAS, UK and Ireland) or the Committee for Motion Laboratory Accreditation (USA), is a consequence of this. At present the focus is on whether written protocols exist at all, but it is inevitable, as this minimal standard becomes universally implemented, that more attention will be paid to ensuring that any procedures are appropriately validated. This may be reinforced by more rigorous implementation of medical device legislation for gait analysis software, which should require manufacturers to ensure that clinically relevant outputs (such as joint angles from a specific biomechanical model) are reproducible (rather than just the technical outputs, such as marker locations). The other change, which has implications beyond gait analysis for purely clinical purposes, is that gait analysis systems are getting much cheaper and more user friendly. It can no longer be assumed that laboratories will have a staff member suitably qualified in biomechanics to create and adapt their own models. People using current technology generally want to implement standardized techniques allowing them to focus on the interpretation of data rather than on developing individualized solutions and being distracted by the challenge of their validation. Such users will require a model that is simple enough to be understood conceptually in sufficient detail to guide quality assurance and interpretation of the data produced. In scientific research, it would also be useful to have a widely accepted standardized approach to capturing data to ensure that results from different centers are as comparable as possible. For clinical users and those in other fields who want to focus on the interpretation of data rather than the mechanics of data capture, therefore, there is a real need for a widely accepted, standardized, and validated approach to data capture (including biomechanical modeling) which is efficient and robust in application and sufficiently simple to be understood by the users themselves (rather than relying on biomechanical experts). To be useful in this context, it needs to be widely applicable to all people who are old enough to walk and who have a range of different health conditions (or none). There needs to be a strong evidence base for the reproducibility of measurements, specific training for staff involved in the capture and processing of data, and appropriate metrics to assure the quality of measurements in routine practice.


The CGM satisfies all of these requirements at least as well as, and in most cases considerably better than, the alternatives. Despite this, many users are frustrated by its limitations, while potential users are often put off by its commonly perceived weaknesses (some justified, some not). It is clear that if the CGM is to have a future, it will require modifications to address these. A particular issue for the CGM is that many older laboratories have databases stretching back over considerable periods of time (several decades in many cases), and backward compatibility is perceived as extremely important. Ensuring rigorous backward compatibility is incompatible with improving the modeling, of course, so a compromise is required. The most obvious is to ensure that any new model uses the same anatomical segment definitions (see Table 1) as the original. It may be that modifications lead to systematic differences with the original CGM, but it will be clear that these are consequences of improvements in the modeling rather than a redefinition of what is being measured. It will also be important to quantify any such systematic changes so that they can be accounted for when data processed using different versions of the model are compared. Another specific issue with the CGM is the perception of it as a "black box" processing technique which cannot be properly understood. This has persisted despite increasingly good documentation being produced but will be best addressed by publishing the actual computer code through which the model is implemented. Implementing the code in an open source language (such as Python) which is available to all users will also be important. Training and education packages will also be required for those less technically minded. The specific modifications that are indicated would be:
• Adoption of a robust inverse kinematic fitting approach based around a linked rigid segment model that is compatible with advanced musculoskeletal modeling techniques.
• Replacement of wand markers with a limited number of skin-mounted tracking markers on the femur and tibia positioned to minimize sensitivity to soft tissue artifact (Peters et al. 2009) or marker misplacement.
• Incorporation of more accurate equations for estimating the hip joint center and techniques for accounting for the depth of soft tissues over anatomical landmarks on the pelvis.
• Improved methods for determining the orientation of the coronal plane of the femur. Basing this upon the position of medial and lateral femoral epicondyle markers during a calibration trial may be an improvement, and functional calibration of the knee should be implemented as a quality assurance measure.
• Improvement of foot modeling by formalizing the PiG approach of using the heel marker to give an indication of inversion and eversion about the long axis of the foot. There is a lack of standardization in where the forefoot (toe) marker is placed. Opting for a more proximal placement (at about the level of the tarsometatarsal joints) would lead to the foot segment representing movement of the hind foot and open the possibility of some indication of forefoot alignment in relation to this using markers placed on the metatarsophalangeal joints.


• Validation of an appropriate trunk model should be regarded as essential. Doing so on the basis of force plate measurements of center of mass displacement during walking (Eames et al. 1999) would be useful to establish just how important measuring upper limb movement is in gait analysis.
Future versions should be adequately validated in line with a modern understanding of clinical best practice. At a minimum, this should include evidence of reproducibility of results, but it would also be useful to have accuracy established with reference to a variety of static and dynamic imaging techniques such as three-dimensional ultrasound (Peters et al. 2010; Hicks and Richards 2005; Passmore and Sangeux 2016), low intensity biplanar x-rays (Pillet et al. 2014; Sangeux et al. 2014; Sauret et al. 2016), or fluoroscopy (Tsai et al. 2009; Akbarshahi et al. 2010). There should also be publication of benchmark data with which services can compare their own to ensure consistency (Pinzone et al. 2014), and streamlined processes for conducting in-house repeatability studies would also be extremely useful.

Cross-References
▶ 3D Dynamic Pose Estimation Using Reflective Markers or Electromagnetic Sensors
▶ 3D Dynamic Probabilistic Pose Estimation From Data Collected Using Cameras and Reflective Markers
▶ 3D Kinematics of Human Motion
▶ Next Generation Models Using Optimized Joint Center Location
▶ Observing and Revealing the Hidden Structure of the Human Form in Motion Throughout the Centuries
▶ Physics-Based Models for Human Gait Analysis
▶ Rigid Body Models of the Musculoskeletal System
▶ Variations of Marker-Sets and Models for Standard Gait Analysis

References
Akbarshahi M, Schache AG, Fernandez JW, Baker R, Banks S, Pandy MG (2010) Non-invasive assessment of soft-tissue artifact and its effect on knee joint kinematics during functional activity. J Biomech 43(7):1292–1301. doi:10.1016/j.jbiomech.2010.01.002
Baker R (2001) Pelvic angles: a mathematically rigorous definition which is consistent with a conventional clinical understanding of the terms. Gait Posture 13(1):1–6. doi:10.1016/S0966-6362(00)00083-7
Baker R (2011) Globographic visualisation of three dimensional joint angles. J Biomech 44(10):1885–1891. doi:10.1016/j.jbiomech.2011.04.031
Baker R, Finney L, Orr J (1999) A new approach to determine the hip rotation profile from clinical gait analysis data. Hum Mov Sci 18:655–667. doi:10.1016/S0167-9457(99)00027-5


Barre A, Thiran JP, Jolles BM, Theumann N, Aminian K (2013) Soft tissue artifact assessment during treadmill walking in subjects with total knee arthroplasty. IEEE Trans Biomed Eng 60(11):3131–3140. doi:10.1109/TBME.2013.2268938
Cappozzo A, Catani F, Croce UD, Leardini A (1995) Position and orientation in space of bones during movement: anatomical frame definition and determination. Clin Biomech 10(4):171–178. doi:10.1016/0268-0033(95)91394-T
Carson MC, Harrington ME, Thompson N, O'Connor JJ, Theologis TN (2001) Kinematic analysis of a multi-segment foot model for research and clinical applications: a repeatability analysis. J Biomech 34(10):1299–1307. doi:10.1016/S0021-9290(01)00101-4
Chao EY (1980) Justification of triaxial goniometer for the measurement of joint rotation. J Biomech 13:989–1006. doi:10.1016/0021-9290(80)90044-5
Charlton IW, Tate P, Smyth P, Roren L (2004) Repeatability of an optimised lower body model. Gait Posture 20(2):213–221. doi:10.1016/j.gaitpost.2003.09.004
Clauser C, McConville J, Young J (1969) Weight, volume and centre of mass of segments of the human body (AMRL Technical Report). Wright-Patterson Air Force Base, Ohio
Davis RB, Ounpuu S, Tyburski D, Gage J (1991) A gait analysis data collection and reduction technique. Hum Mov Sci 10:575–587. doi:10.1016/0167-9457(91)90046-Z
Dempster W (1955) Space requirements of the seated operator (WADC Technical Report 55–159). Wright-Patterson Air Force Base, Ohio
Eames M, Cosgrove A, Baker R (1999) Comparing methods of estimating the total body centre of mass in three-dimensions in normal and pathological gait. Hum Mov Sci 18:637–646. doi:10.1016/S0167-9457(99)00022-6
Foti T, Davis RB, Davids JR, Farrell ME (2001) Assessment of methods to describe the angular position of the pelvis during gait in children with hemiplegic cerebral palsy. Gait Posture 13:270
Harrington ME, Zavatsky AB, Lawson SE, Yuan Z, Theologis TN (2007) Prediction of the hip joint centre in adults, children, and patients with cerebral palsy based on magnetic resonance imaging. J Biomech 40(3):595–602. doi:10.1016/j.jbiomech.2006.02.003
Hicks JL, Richards JG (2005) Clinical applicability of using spherical fitting to find hip joint centers. Gait Posture 22(2):138–145. doi:10.1016/j.gaitpost.2004.08.004
Hinrichs RN (1985) Regression equations to predict segmental moments of inertia from anthropometric measurements: an extension of the data of Chandler et al. (1975). J Biomech 18(8):621–624. doi:10.1016/0021-9290(85)90016-8
Kadaba MP, Ramakrishnan HK, Wootten ME, Gainey J, Gorton G, Cochran GV (1989) Repeatability of kinematic, kinetic, and electromyographic data in normal adult gait. J Orthop Res 7(6):849–860. doi:10.1002/jor.1100070611
Kadaba MP, Ramakrishnan HK, Wootten ME (1990) Measurement of lower extremity kinematics during level walking. J Orthop Res 8(3):383–392. doi:10.1002/jor.1100080310
Leardini A, Cappozzo A, Catani F, Toksvig-Larsen S, Petitto A, Sforza V, Cassanelli G, Giannini S (1999) Validation of a functional method for the estimation of hip joint centre location. J Biomech 32(1):99–103. doi:10.1016/S0021-9290(98)00148-1
Leardini A, Chiari L, Della Croce U, Cappozzo A (2005) Human movement analysis using stereophotogrammetry. Part 3. Soft tissue artifact assessment and compensation. Gait Posture 21(2):212–225. doi:10.1016/j.gaitpost.2004.05.002
Leardini A, Sawacha Z, Paolini G, Ingrosso S, Nativo R, Benedetti MG (2007) A new anatomically based protocol for gait analysis in children. Gait Posture 26(4):560–571. doi:10.1016/j.gaitpost.2006.12.018
Lu TW, O'Connor JJ (1999) Bone position estimation from skin marker co-ordinates using global optimisation with joint constraints. J Biomech 32(2):129–134. doi:10.1016/S0021-9290(98)00158-4
McGinley JL, Baker R, Wolfe R, Morris ME (2009) The reliability of three-dimensional kinematic gait measurements: a systematic review. Gait Posture 29(3):360–369. doi:10.1016/j.gaitpost.2008.09.003


Ounpuu S, Gage J, Davis R (1991) Three-dimensional lower extremity joint kinetics in normal pediatric gait. J Pediatr Orthop 11:341–349
Ounpuu S, Davis R, DeLuca P (1996) Joint kinetics: methods, interpretation and treatment decision-making in children with cerebral palsy and myelomeningocele. Gait Posture 4:62–78. doi:10.1016/0966-6362(95)01044-0
Passmore E, Sangeux M (2016) Defining the medial-lateral axis of an anatomical femur coordinate system using freehand 3D ultrasound imaging. Gait Posture 45:211–216. doi:10.1016/j.gaitpost.2016.02.006
Pearsall DJ, Costigan PA (1999) The effect of segment parameter error on gait analysis results. Gait Posture 9(3):173–183
Peters A, Sangeux M, Morris ME, Baker R (2009) Determination of the optimal locations of surface-mounted markers on the tibial segment. Gait Posture 29(1):42–48. doi:10.1016/j.gaitpost.2008.06.007
Peters A, Baker R, Sangeux M (2010) Validation of 3-D freehand ultrasound for the determination of the hip joint centre. Gait Posture 31:530–532. doi:10.1016/j.gaitpost.2010.01.014
Peters A, Baker R, Morris ME, Sangeux M (2012) A comparison of hip joint centre localisation techniques with 3-DUS for clinical gait analysis in children with cerebral palsy. Gait Posture 36(2):282–286. doi:10.1016/j.gaitpost.2012.03.011
Pillet H, Sangeux M, Hausselle J, El Rachkidi R, Skalli W (2014) A reference method for the evaluation of femoral head joint center location technique based on external markers. Gait Posture 39(1):655–658. doi:10.1016/j.gaitpost.2013.08.020
Pinzone O, Schwartz MH, Thomason P, Baker R (2014) The comparison of normative reference data from different gait analysis services. Gait Posture 40(2):286–290. doi:10.1016/j.gaitpost.2014.03.185
Rao G, Amarantini D, Berton E, Favier D (2006) Influence of body segments' parameters estimation models on inverse dynamics solutions during gait. J Biomech 39(8):1531–1536. doi:10.1016/j.jbiomech.2005.04.014
Reinbolt JA, Haftka RT, Chmielewski TL, Fregly BJ (2007) Are patient-specific joint and inertial parameters necessary for accurate inverse dynamics analyses of gait? IEEE Trans Biomed Eng 54(5):782–793. doi:10.1109/TBME.2006.889187
Sangeux M, Peters A, Baker R (2011) Hip joint centre localization: evaluation on normal subjects in the context of gait analysis. Gait Posture 34(3):324–328. doi:10.1016/j.gaitpost.2011.05.019
Sangeux M, Pillet H, Skalli W (2014) Which method of hip joint centre localisation should be used in gait analysis? Gait Posture 40(1):20–25. doi:10.1016/j.gaitpost.2014.01.024
Sauret C, Pillet H, Skalli W, Sangeux M (2016) On the use of knee functional calibration to determine the medio-lateral axis of the femur in gait analysis: comparison with EOS biplanar radiographs as reference. Gait Posture 50:180–184. doi:10.1016/j.gaitpost.2016.09.008
Scally G, Donaldson L (1998) Clinical governance and the drive for quality improvement in the new NHS in England. Br Med J 317:61–65. doi:10.1136/bmj.317.7150.61
Schwartz MH, Rozumalski A (2005) A new method for estimating joint parameters from motion data. J Biomech 38(1):107–116. doi:10.1016/j.jbiomech.2004.03.009
Seth A, Sherman M, Reinbolt JA, Delp SL (2011) OpenSim: a musculoskeletal modeling and simulation framework for in silico investigations and exchange. Procedia IUTAM 2:212–232. doi:10.1016/j.piutam.2011.04.021
Shoemaker P (1978) Measurements of relative lower body segment positions in gait analysis. University of California, San Diego
Sutherland D, Hagy J (1972) Measurement of gait movements from motion picture film. J Bone Joint Surg 54A(4):787–797
Tsai T-Y, Lu T-W, Kuo M-Y, Hsu H-C (2009) Quantification of three-dimensional movement of skin markers relative to the underlying bones during functional activities. Biomed Eng Appl Basis Commun 21(3):223–232. doi:10.4015/S1016237209001283
Winter D, Robertson D (1978) Joint torque and energy patterns in normal gait. Biol Cybern 29:137–142. doi:10.1007/BF00337349

Variations of Marker Sets and Models for Standard Gait Analysis

Felix Stief

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anatomical and Technical Markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marker Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Definition of a Segment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Prediction Approach or the Conventional Gait Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functional Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Impact of Marker Set and Joint Angle Calculation on Gait Analysis Results . . . . . . . . . . . . . . . . . . Errors Involved with Marker Placement and Soft-Tissue Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . Errors Associated with the Regression Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How to Address the Measurement Error and What is the Extent of This Error? . . . . . . . . . . . Accuracy for Marker-Based Gait Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparison of Marker Sets and Models for Standard Gait Analysis . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Importance of Repeatability Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Removing the Effects of Marker Misplacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Abstract

A variety of different approaches are used in 3D clinical gait analysis. This chapter provides an overview of common terms, different marker sets, underlying anatomical models, as well as a fundamental understanding of measurement techniques commonly used in clinical gait analysis and the consideration of possible errors associated with these different techniques. Besides the different marker sets, two main approaches can be used to quantify marker-based joint angles: a prediction approach based on regression equations and a functional approach. The prediction approach uses anatomical assumptions and anthropometric reference data to define the locations of joint centers/axes relative to specific anatomical landmarks. In the functional approach, joint centers are determined via optimization of marker movement. The accuracy of determining skeletal kinematics is limited by ambiguity in landmark identification and soft-tissue artifacts. When the intersubject variability of control data becomes greater than the expected change due to pathology, the clinical usefulness of the data becomes doubtful. To allow a practical interpretation of a comparison of approaches, differences and the measurement error should be quantified in the unit of interest (i.e., degrees or percent). The highest reliability indices occur in the hip and knee in the sagittal plane, with the lowest reliability and highest errors for hip and knee rotation in the transverse plane. In addition, the sources of error should be understood before an approach is applied in practice.

F. Stief (*) Movement Analysis Lab, Orthopedic University Hospital Friedrichsheim gGmbH, Frankfurt/Main, Germany e-mail: [email protected]; [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_26-1


Keywords

Marker sets • Anatomical markers • Technical markers • Clusters • Modeling • Segment definition • Prediction approach • Functional approach • Regression equations • Conventional Gait Model • Measurement error • Soft-tissue artifacts • Reliability • Accuracy

Introduction

This chapter provides an overview of common terms, different marker sets, underlying anatomical models, as well as a fundamental understanding of measurement techniques commonly used in clinical gait analysis and the consideration of possible errors associated with these different techniques. It is possible for a clinician or physician to study gait subjectively; however, the value and repeatability of this type of assessment is questionable due to poor inter- and intra-tester reliability. For instance, it is impossible for one individual to study, by observation alone, the movement patterns of all the main joints involved in an activity like walking simultaneously. Therefore, skeletal movements in three dimensions during gait are typically recorded using markers placed on the surface of the skin over various anatomical landmarks to represent body segments. The marker-based analysis of human movement helps to better understand normal and pathological function and results in a detailed and objective clinical assessment of therapeutic and surgical interventions. A variety of different anatomical models and marker sets are used for clinical gait analysis. While a certain amount of standardization has been established in recent years for marker placement on anatomical points and for the definition of most of the rigid body segments (pelvis, thigh, shank, foot), protocols differ in the underlying biomechanical model, the definition of joint centers and axes, and the number of markers used. These differences have an effect on the outcome measures (e.g., joint angles and moments). The main focus of this chapter is to demonstrate the impact of marker sets and joint angle calculations on gait analysis results.

State of the Art

Markers are described as either passive or active (“▶ Estimating Hierarchical Rigid Body Models of the Musculoskeletal System”). Passive markers for camera-based systems are generally made of a retroreflective material. This material is used to reflect light emitted from around the camera back to the camera lens. Some camera-based systems use a stroboscopic light, while others use light from synchronized infrared light-emitting diodes mounted around the camera lens. In contrast, active markers produce light at a given frequency, so these systems do not require illumination, and, as such, the markers are more easily identified and tracked (Chiari et al. 2005). These light-emitting diodes (LEDs) are attached to a body segment in the same way as passive markers, but with the addition of a power source and a control unit for each LED. Active markers can have their own specific frequency, which allows them to be detected automatically. This leads to very stable real-time three-dimensional motion tracking, as no markers can be misidentified as adjacent markers. Regardless of whether passive or active markers are used, they should not significantly modify the movement pattern being measured.

Anatomical and Technical Markers

Anatomical markers are used to set up the segment reference frame. This is generally done during a static trial with the subject standing still. Anatomical markers may be attached on bony landmarks directly to the skin or fixed to a pointer. These markers are not required for the dynamic trials as long as at least three fixed points are available on each segment. Technical markers have no specific anatomical location and are chosen purely to meet the other requirements above. Additional technical markers can be used to create a technical coordinate system from data collected in a static calibration trial during which both anatomical and technical markers are present. In subsequent dynamic trials, absent anatomical markers can be expressed in relation to the technical coordinate system. Technical markers can also be used to avoid areas of adipose tissue in obese patients, to accommodate walking aids, or to replace markers that are obscured dynamically. Two approaches are commonly used. Technical markers may be used to replace only those anatomical markers that cannot be used dynamically. In this case, the majority of anatomical markers remain in place for the walking trials. Alternatively, clusters of technical markers attached to a plate (see "Marker Clusters" below) may be used to provide all the dynamic information needed. Anatomical markers are then only used for the static trial to allow segment reconstruction.


Fig. 1 Rigid marker cluster with four retroreflective markers

Marker Clusters

Another technique for minimizing soft-tissue artifacts and reducing intersubject variability is the use of marker clusters (arrays of markers) (Cappozzo et al. 1995). They must be in place during the static anatomical calibration. The exact placement of the clusters is less critical, as this technique uses their positions relative to the anatomical landmarks identified in the static calibration. The purpose is to define the plane of each segment with 3–5 markers and then track its movement through the basic reference planes. Clusters can be directly attached to the skin or mounted on rigid fixtures (Fig. 1); the choice depends upon the anatomy, the activity, and the nature of the analysis. In a rigid body or cluster, the distance between any two points within the body or cluster does not change. In general, tracking of marker clusters helps to reduce noise within the motion signal and improve the accuracy of kinematic data. When the markers are fixed to rigid plates, the markers never move independently with deformation of the skin. It has been shown that the absolute and relative variance in out-of-sagittal plane rotations tended to be higher for the Conventional Gait Model (“▶ The Conventional Gait Model: Success and Limitations”) compared with a cluster technique (Duffell et al. 2014) and that a cluster marker set overcomes a number of theoretical limitations of the conventional set (Collins et al. 2009) when both models were compared simultaneously. Much work has been carried out on determining the optimal configuration of marker clusters, and it is now widely accepted that a rigid shell with a cluster of four markers is a good practical solution (Cappozzo et al. 1997; Manal et al. 2000). However, when the cluster markers were fixed to a rigid plate, these methods were not able to address absolute errors and can still result in inaccurate identification of joint centers (Holden and Stanhope 1998). Although an extended version of this method has reported improvements in estimating the position of the underlying bones (Alexander and Andriacchi 2001), it can only model skin deformations and has limited use in clinical applications due to the number of additional markers required.

The Definition of a Segment

In general, three markers are needed to fix a rigid body in space. When using motion capture to define the pelvic segment (“▶ Estimating Hierarchical Rigid Body Models of the Musculoskeletal System”) and measure pelvic motion, the International Society of Biomechanics (ISB) recommends that the pelvic anatomical coordinate system be defined by surface markers placed on the right and left anterior superior iliac spines (ASISs) and on the right and left posterior superior iliac spines (PSISs). The pelvic anatomical coordinate system can be described as follows: the origin is at the midpoint between the right ASIS and the left ASIS; the Z-axis points from the origin to the right ASIS; the X-axis lies in the plane defined by the right ASIS, left ASIS, and the midpoint of the right PSIS and left PSIS markers and points ventrally, orthogonal to the Z-axis; and the Y-axis is orthogonal to these two axes (Wu et al. 2002). These markers would ideally be used to track the pelvis during gait or clinical assessment protocols that involve movement. However, situations in which the ASIS or PSIS markers are obscured from view require that alternative technical marker sets be used. Occlusion of the ASIS markers could be a result of soft tissue around the anterior abdomen (a common issue in overweight and obese subjects), arm movement, or activities that require high degrees of hip and trunk flexion, such as running, stair climbing, or level walking. It has been shown that pelvic models that include markers placed on the ASISs and the iliac crests (ICs), or on the PSISs and ICs, are suitable alternatives to the standard pelvic model (ASISs and PSISs) for tracking pelvic motion during gait (Bruno and Barden 2015). Alternatively, a rigid cluster of three orthogonal markers attached to the sacrum can be used as technical markers (Borhani et al. 2013). Using the calibrated anatomical system technique (Benedetti et al. 1998; Cappello et al. 2005) allows the position of the ASIS to be defined relative to the cluster in a static trial; during the dynamic trials, the position of the ASIS is then linked to the cluster and is thus affected by the same skin movement artifact that affects the cluster. Another alternative to address skin artifacts is to use the right and left hip joint centers, described in the technical coordinate systems of the right and left thighs, together with the right PSIS and left PSIS markers, as technical markers for tracking pelvis movement (Kisho Fukuchi et al. 2010).
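A minimal sketch of the ISB pelvic frame construction described above is given below. The marker positions are illustrative, and the lab frame is assumed to have X anterior, Y up, and Z to the subject's right:

```python
import numpy as np

def isb_pelvic_frame(rasis, lasis, rpsis, lpsis):
    """Pelvic anatomical frame per Wu et al. (2002): origin midway between
    the ASISs; Z towards the right ASIS; X ventral, orthogonal to Z in the
    plane of the two ASISs and the PSIS midpoint; Y = Z x X (cranial)."""
    origin = 0.5 * (rasis + lasis)
    z = (rasis - lasis) / np.linalg.norm(rasis - lasis)
    v = origin - 0.5 * (rpsis + lpsis)    # roughly ventral, in the plane
    x = v - np.dot(v, z) * z              # project out the Z component
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return origin, np.column_stack([x, y, z])   # axis columns in lab frame

# Illustrative static marker positions (m); lab: X anterior, Y up, Z right
o, Rp = isb_pelvic_frame(np.array([0.12, 1.00, 0.13]),
                         np.array([0.12, 1.00, -0.13]),
                         np.array([-0.08, 1.02, 0.05]),
                         np.array([-0.08, 1.02, -0.05]))
print(o, np.round(Rp, 3), sep='\n')
```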


Prediction Approach or the Conventional Gait Model

Besides the different marker sets, two main approaches can be used to quantify joint angles: a prediction approach based on regression equations and a functional approach. The prediction approach uses anatomical assumptions and anthropometric reference data to define the locations of joint centers/axes relative to specific anatomical landmarks (Isman and Inman 1969; Weidow et al. 2006). In the functional approach, joint centers are determined via optimization of marker movement. The advantages and disadvantages of both approaches are described in detail below. Most biomechanical analysis systems use regression equations based on predictive methods to calculate joint centers. Kadaba et al. (1989), Davis III et al. (1991), and Vaughan et al. (1992) provided detailed descriptions of a marker-based system to calculate joint centers in the lower extremities. This marker setup has become one of the most commonly used models in gait analysis. It is referred to as the Helen Hayes Hospital marker setup, and the regression equations are referred to as the Plug-in-Gait (PiG) model or the Conventional Gait Model (“▶ The Conventional Gait Model: Success and Limitations”).

Functional Approach

In general, technical marker sets require data capture in a static standing trial to determine rotation values (offsets) that place these markers into the anatomical coordinate system. If a marker does not, for instance, accurately represent the position of the hip during the standing data capture, the technical markers will not be placed into the correct anatomical plane for the dynamic trial. This is particularly problematic if the static and dynamic positions of the hip vary from one another. It has been shown that static standing posture greatly affects the dynamic hip rotation kinematics when using a thigh wand in the typical clinical gait analysis process for the Conventional Gait Model (McMulkin and Gordon 2009). Therefore, if a thigh wand is to be used in clinical practice, it is necessary that patients stand in a hip rotation posture that is equivalent to the hip rotation position used in gait. This can be very difficult because it requires clinicians to have a priori knowledge of the hip rotation in gait before testing. Also, patients may use different strategies in static standing than in walking. One way of addressing this issue is to use functional joint center techniques (Ehrig et al. 2006; Leardini et al. 1999; Schwartz and Rozumalski 2005). This approach is considered functional due to the calculation of subject-specific joint centers/axes using movement data of adjacent segments derived from basic motion tasks. With a focus on assessing motion patterns in a subject-specific manner, functional methods rely on the relative motion between the marker clusters of neighboring segments to identify joint centers and axes (Cappozzo et al. 1997; Ehrig et al. 2006). Previously developed functional methods have been demonstrated to be precise (Ehrig et al. 2006; Kornaropoulos et al. 2010; Kratzenstein et al. 2012) as well as rapid and robust (Schwartz and Rozumalski 2005) in estimating joint centers. Nevertheless, in many patient groups, functional calibration has been reported to be difficult (Sangeux et al. 2011) due to the fact that the range of motion (ROM) of the affected joints is restricted. In addition, functional methods have not been able to demonstrate consistent advantages over more traditional regression-based approaches (Assi et al. 2016; Bell et al. 1990; Davis III et al. 1991; Harrington et al. 2007), possibly due to issues of marker placement and the nonlinear distribution of soft-tissue artifacts across a segment (Gao and Zheng 2008; Stagni et al. 2005). Kratzenstein et al. (2012) presented an approach for understanding the contribution of different regions of marker attachment on the thigh toward the precise determination of the hip joint center. This working group used a combination of established approaches (Taylor et al. 2010) to reduce skin marker artifacts (Taylor et al. 2005), determine joint centers of rotation (Ehrig et al. 2006), and quantify the weighting of each of a large number of markers (Heller et al. 2011) attached to the thigh. Consequently, markers that are suboptimally located and therefore strongly affected by soft-tissue artifacts are assigned a lower weighting compared to markers that follow spherical trajectories around the joint. Based on these methods, six regions of high importance were determined that produced a symmetrical center of rotation estimation (Ehrig et al. 2011) almost as precise as that obtained using a marker set covering the entire thigh. Such approaches could be used to optimize marker sets, targeting more accurate and robust motion capture to aid clinical diagnosis and improve the reliability of longitudinal studies.
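One classic functional method is to fit a sphere to the trajectory of a thigh marker expressed in the pelvic frame, taking the fitted center as the hip joint center. The sketch below uses the standard algebraic least-squares sphere fit on synthetic data; note that the restricted ROM mentioned above degrades the conditioning of exactly this kind of fit:

```python
import numpy as np

def fit_sphere(P):
    """Algebraic least-squares sphere fit to points P (n x 3): returns
    centre c and radius r by solving |p|^2 = 2 p.c + (r^2 - |c|^2)."""
    A = np.hstack([2 * P, np.ones((len(P), 1))])
    b = (P ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c = sol[:3]
    return c, np.sqrt(sol[3] + c @ c)

# Synthetic 'functional' trial: a thigh marker moving on a sphere about
# the hip centre, expressed in the pelvic frame, with measurement noise
rng = np.random.default_rng(2)
c_true, r_true = np.array([0.03, -0.09, 0.09]), 0.35
ang = rng.uniform(-0.6, 0.6, size=(500, 2))          # limited ROM (rad)
P = c_true + r_true * np.column_stack([np.sin(ang[:, 0]) * np.cos(ang[:, 1]),
                                       -np.cos(ang[:, 0]) * np.cos(ang[:, 1]),
                                       np.sin(ang[:, 1])])
P += rng.normal(scale=0.002, size=P.shape)           # ~2 mm noise
c, r = fit_sphere(P)
print(np.round(c - c_true, 4), round(r - r_true, 4)) # small residual errors
```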

Impact of Marker Set and Joint Angle Calculation on Gait Analysis Results

Errors Involved with Marker Placement and Soft-Tissue Artifacts

The accuracy of determining skeletal kinematics is limited by ambiguity in landmark identification and by soft-tissue artifacts, that is, the motion of markers over the underlying bones due to skin elasticity, muscle contraction, or synchronous shifting of the soft tissues (Leardini et al. 2005; Taylor et al. 2005). Generally, two types of errors are attributed to soft-tissue artifacts. Relative errors are defined as the relative movement between two or more markers that define a rigid segment. Absolute errors are defined as the movement of a marker with respect to the bony landmark it represents (Richards 2008). Relative and absolute errors are often caused by movement of the soft tissue on which the markers are placed (Cappozzo et al. 1996). The magnitude of these errors has been studied by using pins secured directly into the bone and comparing the data collected from skin-mounted markers to markers attached to the bone pins. These data give a direct measure of soft-tissue movement with respect to the skeletal system (Cappozzo 1991; Cappozzo et al. 1996; Reinschmidt et al. 1997a, b). However, the applicability of this method is limited due to its invasive nature. The amount and the effects of soft-tissue artifacts from skin markers are discussed controversially, with relative skin-to-bone marker movements in the range of 3 mm up to 40 mm, dependent upon the specific body segment and soft-tissue coverage (Cappozzo et al. 1996; Holden et al. 1997; Manal et al. 2000, 2003; Reinschmidt et al. 1997b). Differences can be accounted for by variation in marker placement and configuration, differences in techniques, intersubject differences, and differences in the task performed (Leardini et al. 2005). Inaccuracies in lower limb motion, and in particular knee kinematics, are present mainly because of soft-tissue artifacts at the thigh segment (Alexander and Andriacchi 2001; Cappello et al. 1997; Fuller et al. 1997; Leardini et al. 2005; Lucchetti et al. 1998). Conversely, soft-tissue movement on the shank has only a small effect on three-dimensional kinematics and moments at the knee (Holden et al. 1997; Manal et al. 2002). In addition, substantial angular variabilities have been noted mainly in the frontal and transverse planes (Ferrari et al. 2008; Miana et al. 2009) due to the small ROM in these planes compared to sagittal plane movements. This reasoning agrees with the results of Leardini et al. (2005), who assert that out-of-sagittal plane angles should be regarded with much more caution, as the soft-tissue artifact produces spurious effects with magnitudes comparable to the amount of motion actually occurring in the joints. In addition, an increase in velocity (for instance, during running) produces an increased variability of the joint center distances and increases the maximum differences between the joint angles obtained using different protocols (Miana et al. 2009).

Errors Associated with the Regression Equations

Besides soft-tissue artifacts and variability of marker placement, errors associated with the regression equations used to calculate the joint center locations are also considerable (Harrington et al. 2007; Leardini et al. 1999; Sangeux et al. 2011). Clinically, the definition of the joint center is generally achieved by using palpable anatomical landmarks to define the medial-lateral axis of the joint. From these anatomical landmarks, the center of rotation is generally calculated in one of two ways: through regression equations based on standard radiographic evidence, or simply as a percentage offset from a marker placed on an anatomical landmark (Bell et al. 1990; Cappozzo et al. 1995; Davis III et al. 1991; Kadaba et al. 1989). The identification of the hip joint center (HJC) has been covered in much depth, and there are still many debates in this area. The location of this joint center is one of the most difficult anatomic reference points to define. The center of the femoral head is the center of the hip joint and is located within the acetabulum on the obliquely aligned and tilted lateral side of the pelvis. Therefore, common approaches have used landmarks on the pelvis as the anatomical reference (Perry and Burnfield 2010). The Conventional Gait Model is based on the HJC regression equations by Davis et al. (1991) and on chord functions to predict the knee and ankle joint centers. The HJC regression equation was derived from 25 male subjects and was evaluated in later studies (Harrington et al. 2007; Leardini et al. 1999; Sangeux et al. 2011), which showed significant errors that were later corrected with new regression equations (Sandau et al. 2015). In the chord function, the HJC, the thigh wand marker, and the epicondyle marker define a plane. The knee joint center (KJC) is then found so that the epicondyle marker lies at a distance of half the knee diameter from the KJC, in a direction perpendicular to the line from the HJC to the KJC. The ankle joint center (AJC) is predicted in the same way, with the chord function applied to the KJC, the calf wand marker, and the malleolus marker. The chord functions predict the KJC and the AJC under the assumption that the joint centers lie on the transepicondylar axis and the transmalleolar axis in the frontal plane, respectively. This assumption seems reasonable for the knee (Asano et al. 2005; Most et al. 2004), but less so for the ankle joint (Lundberg et al. 1989). The exact position of the joint centers influences the joint angles as well as the joint angular velocities and accelerations that enter the inverse dynamics. Likewise, the location of the segmental center of mass influences the inverse dynamics calculations via the moment arms acting together with both proximal and distal joint reaction forces.
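To make the geometry of the chord function concrete, the following Python/NumPy sketch computes a joint center from a proximal joint center, a wand marker, and a joint marker under the two constraints described above. The function name, argument conventions, and sign choice are our own illustrative assumptions, not the implementation of any particular gait software.

import numpy as np

def chord(proximal_jc, wand, joint_marker, offset):
    # Sketch of the chord construction described above (a hypothetical
    # helper). The returned center lies in the plane of the three input
    # points, at distance `offset` from `joint_marker` (half joint width
    # plus marker radius), with (jc - joint_marker) perpendicular to
    # (jc - proximal_jc).
    u = joint_marker - proximal_jc
    L = np.linalg.norm(u)
    u = u / L
    normal = np.cross(u, wand - proximal_jc)      # normal of the 3-point plane
    normal = normal / np.linalg.norm(normal)
    w = np.cross(normal, u)                       # in-plane, perpendicular to u
    # Writing jc = joint_marker + offset*n with unit n = a*u + b*w, the
    # perpendicularity condition gives n.u = -offset/L.
    a = -offset / L
    b = np.sqrt(max(0.0, 1.0 - a * a))
    # Sign convention (assumption): the center lies on the opposite side of
    # the joint marker from the wand, which points away from the limb.
    if np.dot(wand - joint_marker, w) > 0:
        b = -b
    return joint_marker + offset * (a * u + b * w)

With the HJC, thigh wand, and epicondyle marker as inputs this yields a KJC estimate; applying the same function to the KJC, calf wand, and malleolus marker yields the AJC, mirroring the two-step prediction described above.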

How to Address the Measurement Error and What Is the Extent of This Error?

In general, when addressing the measurement error in marker-based movement analysis, it is helpful to provide an absolute measure of reliability, for instance, the root mean square error or the standard error of measurement (SEM). It is thus possible to express the variability in a manner that can be directly related to the measurement itself, in the same measurement units (e.g., degrees). Furthermore, by transforming the absolute error into a relative error, one can express the error as a percentage of the total ROM of the variable to be analyzed. This is of particular importance for the between-plane comparison of the measurement error, given the different amplitudes of the kinematic and kinetic parameters (Stief et al. 2013). In contrast, the commonly reported intraclass correlation coefficient, coefficient of variation, and coefficient of multiple correlation provide limited information, as high coefficient values can result from a low mean value of the variable of interest and can thus hide measurement errors of clinical importance (Luiz and Szklo 2005). Furthermore, expressing data variability as a coefficient results in units that are difficult to interpret clinically (Leardini et al. 2007). Regarding the literature, kinematic measurement errors of less than 4° and 6° have been reported for the intertrial and intersession variability, respectively (Stief et al. 2013). A systematic review by McGinley et al. (2009) identified that the highest reliability indices occurred for the hip and knee in the sagittal plane, with the lowest reliability and highest errors for hip and knee rotation in the transverse plane. Most studies included in this review that provided estimates of data error reported values of less than 5°, with the exception of hip and knee rotation. Fukaya et al. (2013) investigated the interrater reliability of knee movement analyses during the stance phase using a rigid marker set with three markers affixed to the thigh and shank. Each of three testers independently attached the infrared reflective markers to four subjects. The SEM values ranged from 0.68° to 1.13° for flexion-extension, 0.78° to 1.60° for external-internal rotation, and 1.43° to 3.33° for abduction-adduction. In general, the measurement errors between testers are considered to be greater than the measurement errors between sessions and within testers (Schwartz et al. 2004).
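As a minimal illustration of the absolute and relative error measures advocated above, the following sketch computes an intersession error in degrees and expresses it as a percentage of the variable's ROM. Array shapes and names are assumptions for illustration, and the exact SEM definition varies between studies; this is one common variant, not the computation used by the cited authors.

import numpy as np

def sem_and_relative_error(angle_curves):
    # `angle_curves`: array of shape (n_sessions, n_time_points) holding one
    # subject's time-normalized joint angle (deg) from repeated sessions.
    curves = np.asarray(angle_curves, dtype=float)
    # Between-session standard deviation at each point of the gait cycle,
    # summarized (RMS) over the cycle -> absolute error in degrees.
    sd_per_point = curves.std(axis=0, ddof=1)
    sem = float(np.sqrt(np.mean(sd_per_point ** 2)))
    # Relative error: absolute error as % of the mean ROM, which allows
    # comparisons between planes of different amplitude.
    rom = float(np.mean(curves.max(axis=1) - curves.min(axis=1)))
    return sem, 100.0 * sem / rom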

Accuracy for Marker-Based Gait Analysis

The accuracy of marker-based protocols can hardly be assessed in clinical routine, since invasive methods such as radiographic imaging (Garling et al. 2007) or bone pins (Taylor et al. 2005) are required to provide sufficient access to the skeletal anatomy but are generally not available. Ultrasound assessment of the joint provides a noninvasive alternative (Sangeux et al. 2011), but assessment of the images can be somewhat subjective. According to Schwartz and Rozumalski (2005), the following indirect indicators of accuracy can be computed instead:

1. Knee varus/valgus ROM during gait: An accurate knee flexion axis alignment minimizes the varus/valgus ROM resulting from cross-talk, that is, one joint rotation (e.g., flexion) being interpreted as another (e.g., adduction or varus) due to axis malalignment (Piazza and Cavanagh 2000).

2. Knee flexion/extension ROM during gait: An accurate knee flexion axis alignment maximizes the knee flexion/extension ROM by reducing cross-talk.

In general, the knee varus/valgus curve can be evaluated for signs of marker misplacement or Knee Alignment Device misalignment. Moreover, it has been shown that for the stable knee joint, the physiological ROM of knee varus/valgus only varies between 5° and 10° (Reinschmidt et al. 1997a). Minimization of the knee joint angle cross-talk can therefore be considered a valid criterion to evaluate the relative merits of different protocols and marker sets (a minimal sketch of these two indicators is given below).
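The two indicators reduce to range-of-motion computations on the knee angle curves. A minimal sketch, assuming time-normalized angle curves in degrees (variable names are illustrative):

import numpy as np

def knee_crosstalk_indicators(flexion, varus_valgus):
    # Indirect accuracy indicators in the spirit of Schwartz and
    # Rozumalski (2005), as summarized above: a well-aligned knee flexion
    # axis should minimize varus/valgus ROM (physiologically ~5-10 deg)
    # and maximize flexion/extension ROM during gait.
    vv_rom = float(np.ptp(varus_valgus))   # peak-to-peak, inflated by cross-talk
    fe_rom = float(np.ptp(flexion))        # peak-to-peak, reduced by cross-talk
    return {"varus_valgus_rom": vv_rom, "flexion_extension_rom": fe_rom}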

Comparison of Marker Sets and Models for Standard Gait Analysis

There is still a variety of different approaches in use in clinical gait analysis. Protocols differ in the underlying biomechanical model, the associated marker set, and the data recording and processing. The biomechanical model defines the properties of the modeled joints, the number of involved segments, the definitions of joint centers and axes, the anatomical and technical reference frames used, and the angular decomposition technique applied to calculate joint angles. Despite apparent differences between the outcome measures derived from different gait protocols (Ferrari et al. 2008; Gorton et al. 2009), specifically for out-of-sagittal plane rotations (Ferrari et al. 2008), data from different studies are nevertheless compared and interpreted.

Any protocol for movement analysis will only prove useful if it displays adequate reliability (Cappozzo 1984). Moreover, as stated before, the placement of the markers has considerable influence on the accuracy of gait studies (Gorton et al. 2009). One of the first protocols, proposed by Davis et al. (1991) and known as the Conventional Gait Model or PiG model, is still used by a vast majority of gait laboratories (Schwartz and Rozumalski 2005). Although the protocol is practicable and has become well established over the years, it has some major disadvantages. It has been shown that intersession and interexaminer reliability are low for this protocol, especially at the hip and knee joint in the frontal and transverse planes (McGinley et al. 2009). The errors in the PiG protocol, for example, a knee varus/valgus ROM of up to 35° (Ferrari et al. 2008), are very likely caused by inconsistent anatomical landmark identification and marker positioning by the examiner. This leads to well-documented errors from skin movement (Leardini et al. 2005) and kinematic cross-talk. Moreover, accurate placement of the wand markers on the shank and the thigh is difficult (Karlsson and Tranberg 1999). Wands on the lateral aspect of the thighs and shanks are also likely to enlarge skin motion artifact effects (Manal et al. 2000) and the variability of the gait results (Gorton et al. 2009). One way of addressing these errors is the use of additional medial malleolus and medial femoral condyle markers to determine the joint centers. This eliminates the reliance on the difficult, subjective placement of the thigh and tibia wand markers necessary for the PiG model, which has been shown to vary strongly between laboratories (Gorton et al. 2009) and to enlarge skin motion artifact effects (Manal et al. 2002), especially when placed proximally, where the greatest soft-tissue artifact of any lower-limb segment is found (Stagni et al. 2005). Besides that, it has been shown that thigh wand markers capture only approximately half of the actual femoral axial rotation (Schache et al. 2008; Schulz and Kimmel 2010; Wren et al. 2008). The reason for this may be that substantial proportions of hip external-internal rotation were detected as knee motion by marker sets using thigh markers (Schulz and Kimmel 2010). Wren et al. (2008) suggested using a patella marker (placed in the center of the patella), which was reported to detect 98% of the actual hip rotation ROM. Indeed, dynamic hip rotation during gait when utilizing a patella marker in lieu of a thigh wand was not affected by static hip posture (McMulkin and Gordon 2009). In a comparative study, the reliability and accuracy of the PiG model and an advanced protocol (MA) with additional medial malleolus and medial femoral condyle markers were estimated (Stief et al. 2013) (Fig. 2). For the MA, neither anthropometric measurements nor joint alignment devices are necessary: knowledge of the spatial location of the anatomical landmarks enables automatic calculation of the anthropometric measurements needed for joint center determination. In both protocols, the center of the hip joint was calculated using a geometrical prediction method (Davis III et al. 1991). The PiG model derived the rotational axis of the knee joint from the position of the pelvic, knee, and thigh markers, and the rotational axis of the ankle joint from the position of the knee, ankle, and tibia markers.
Fig. 2 Marker set of both lower body protocols. The markers indicated by circles are part of the standard Plug-in-Gait (PiG) marker set (Conventional Gait Model); those indicated by triangles are the additional markers used in the custom-made protocol (MA). Marker placements and the protocol(s) requiring each marker:

SACR (PiG/MA): on the skin midway between the posterior superior iliac spines
LASI/RASI (PiG/MA): on the left (right) anterior superior iliac spine
LTRO/RTRO (MA): on the prominent point of the left (right) trochanter major
LTHI/RTHI (PiG): rigid wand marker mounted on the skin over the distal and lateral aspect of the left (right) thigh, aligned in the plane that contains the hip and knee joint centers and the knee flexion/extension axis
LKNEL/RKNEL (PiG/MA): on the left (right) lateral femoral condyle
LKNEM/RKNEM (MA): on the left (right) medial femoral condyle
LTIB/RTIB (PiG): rigid wand marker mounted on the skin over the distal and lateral aspect of the left (right) shank, aligned in the plane that contains the knee and ankle joint centers and the ankle flexion/extension axis
LANKL/RANKL (PiG/MA): on the left (right) lateral malleolus aligned with the bimalleolar axis
LANKM/RANKM (MA): on the left (right) medial malleolus aligned with the bimalleolar axis
LTOE/RTOE (PiG/MA): on the left (right) second metatarsal head, on the mid-foot side of the equinus break between forefoot and midfoot
LHEE/RHEE (PiG/MA): on the left (right) aspect of the Achilles tendon insertion, on the calcaneus at the same height above the plantar surface of the foot as the LTOE (RTOE) marker

In contrast to the PiG model, the centers of the knee and ankle joints in the MA were statically defined as the midpoints between the medial and lateral femoral condyle and malleolus markers, respectively. The anatomical medial malleolus and femoral condyle markers can then be removed for the dynamic trials. The results of this comparative study (PiG model vs. MA) show, for both protocols and healthy subjects, good intersession reliability for all ankle, knee, and hip joint angles in the sagittal plane. Nevertheless, the lower intersession errors for the
MA compared to the PiG model regarding frontal plane knee angles and moments and transverse plane motion in the knee and hip joint suggest that the error in repeated palpation of the landmarks is lower using the MA. Moreover, the MA significantly reduced the knee axis cross-talk phenomenon, suggesting improved accuracy of knee axis alignment compared to the PiG model. These results are comparable to those reported by Schwartz and Rozumalski (2005) using a functional approach in comparison with the PiG model. The MA eliminates the reliance on the subjective placement of the thigh and tibia wand markers and on the Knee Alignment Device method (Davis and DeLuca 1996), which is difficult to handle and less reliable within or between therapists than manual palpation, especially for inexperienced investigators (Serfling et al. 2009). Nevertheless, correct marker placement based on the exact identification of the characteristic anthropological points of the body (bony landmarks) is still required. The position of the knee markers is especially important, because it influences not only knee joint kinematics but also those of the hip and ankle joints. It has been shown that simultaneous knee hyperextension, internal hip rotation, and external ankle rotation can be caused by backward misplacement of the lateral knee marker, and that simultaneous knee overflexion, external hip rotation, and internal ankle rotation may result from forward knee marker misplacement (Szczerbik and Kalinowska 2011). Therefore, if such phenomena appear in kinematic graphs, their presence should be confirmed by video registration prior to the formulation of clinical conclusions.

Future Directions

There are many anatomical models and marker sets reported in the literature. The increase in the complexity of the models relates not only to the ability of movement analysis systems to track more and more markers but also to the growth in knowledge about modeling human movement.

The Importance of Repeatability Studies

Some of the theoretical aspects of marker placement have been presented in this chapter. The practical implications are best explored in the gait laboratory by repeated marker placement. Repeated testing of a single subject gives some insight into the variability for a single person placing markers (intrasubject reliability) and between different people placing markers (intersubject reliability). Intersubject variability is additionally affected by differences in each subject's walking style and by between-subject differences in marker placement and in marker motion relative to bony landmarks. When the intersubject variability of control data becomes greater than the expected change due to pathology, the clinical usefulness of the data becomes doubtful. To allow a practical interpretation of a comparison of approaches, differences and their variability should be quantified in the unit of interest (i.e., degrees or percent).


Removing the Effects of Marker Misplacement

The placement of markers is not easy, and there is a limit to the accuracy that can realistically be achieved. Even if the markers are in the right place, the effects of skin movement and oscillation will introduce errors once the subject is walking. One possibility is to "correct" the marker placement as part of the data processing; complex algorithms are now becoming available for performing such corrections. A simpler approach has been used for some time to increase the accuracy of joint center determination: in addition to the Conventional Gait Model, at least the use of medial malleolus and medial femoral condyle markers is recommended when analyzing frontal and transverse plane gait data. This should lead to lower measurement errors for most of the gait variables and to a more accurate determination of the knee joint axis. Nevertheless, gait variables in the transverse plane are poorly reproducible (Ferber et al. 2002; Krauss et al. 2012), and their variability associated with the underlying biomechanical protocol is substantial (Ferrari et al. 2008; Krauss et al. 2012; Noonan et al. 2003). In the future, approaches that combine key characteristics of proven methods (functional and/or predictive) for the assessment of skeletal kinematics could be used to optimize marker sets for more accurate and robust motion capture, aiding clinical diagnosis and improving the reliability of longitudinal studies. On the other hand, procedural distress should be minimized. Children especially cannot always stand still for a long time, walk wearing a large number of markers, or perform additional motion trials. The marker set and any associated anatomical landmark calibration or anthropometric measurement procedures must therefore be kept lean to contain the time taken for subject preparation and data collection (Leardini et al. 2007).

Conclusion

When comparing movement data, care must be taken where different marker sets have been used. Whatever approach is used, the problem is separating patterns produced by errors from those produced by pathology. To this day, it is, for instance, not clear how different marker configurations impact hip rotation in the typical clinical gait analysis process. For this reason, the "true" values for rotation often remain unknown. Therefore, gait protocols have to be described precisely, and comparisons with other studies should be made critically. In addition, the sources of error should be known before an approach is applied in practice. Learning and training of the examiners, which is considered a critical issue (Gorton et al. 2009), is important to ensure exact anatomical landmark locations and may also reduce intra- and interexaminer variability. Moreover, the graphs from instrumented gait analysis should be confirmed by video registration prior to the formulation of clinical conclusions.


References

Alexander EJ, Andriacchi TP (2001) Correcting for deformation in skin-based marker systems. J Biomech 34(3):355–361
Asano T, Akagi M, Nakamura T (2005) The functional flexion-extension axis of the knee corresponds to the surgical epicondylar axis: in vivo analysis using a biplanar image-matching technique. J Arthroplast 20(8):1060–1067. doi:10.1016/j.arth.2004.08.005
Assi A, Sauret C, Massaad A, Bakouny Z, Pillet H, Skalli W, Ghanem I (2016) Validation of hip joint center localization methods during gait analysis using 3D EOS imaging in typically developing and cerebral palsy children. Gait Posture 48:30–35. doi:10.1016/j.gaitpost.2016.04.028
Bell AL, Pedersen DR, Brand RA (1990) A comparison of the accuracy of several hip center location prediction methods. J Biomech 23(6):617–621
Benedetti MG, Catani F, Leardini A, Pignotti E, Giannini S (1998) Data management in gait analysis for clinical applications. Clin Biomech (Bristol, Avon) 13(3):204–215
Borhani M, McGregor AH, Bull AM (2013) An alternative technical marker set for the pelvis is more repeatable than the standard pelvic marker set. Gait Posture 38(4):1032–1037. doi:10.1016/j.gaitpost.2013.05.019
Bruno P, Barden J (2015) Comparison of two alternative technical marker sets for measuring 3D pelvic motion during gait. J Biomech 48(14):3876–3882. doi:10.1016/j.jbiomech.2015.09.031
Cappello A, Cappozzo A, La Palombara PF, Lucchetti L, Leardini A (1997) Multiple anatomical landmark calibration for optimal bone pose estimation. Hum Mov Sci 16(2–3):259–274. doi:10.1016/S0167-9457(96)00055-3
Cappello A, Stagni R, Fantozzi S, Leardini A (2005) Soft tissue artifact compensation in knee kinematics by double anatomical landmark calibration: performance of a novel method during selected motor tasks. IEEE Trans Biomed Eng 52(6):992–998. doi:10.1109/tbme.2005.846728
Cappozzo A (1984) Gait analysis methodology. Hum Mov Sci 3(1–2):27–50. doi:10.1016/0167-9457(84)90004-6
Cappozzo A (1991) Three-dimensional analysis of human walking: experimental methods and associated artifacts. Hum Mov Sci 10(5):589–602. doi:10.1016/0167-9457(91)90047-2
Cappozzo A, Catani F, Croce UD, Leardini A (1995) Position and orientation in space of bones during movement: anatomical frame definition and determination. Clin Biomech (Bristol, Avon) 10(4):171–178
Cappozzo A, Catani F, Leardini A, Benedetti MG, Croce UD (1996) Position and orientation in space of bones during movement: experimental artefacts. Clin Biomech (Bristol, Avon) 11(2):90–100
Cappozzo A, Cappello A, Della Croce U, Pensalfini F (1997) Surface-marker cluster design criteria for 3-D bone movement reconstruction. IEEE Trans Biomed Eng 44(12):1165–1174. doi:10.1109/10.649988
Chiari L, Della Croce U, Leardini A, Cappozzo A (2005) Human movement analysis using stereophotogrammetry. Part 2: instrumental errors. Gait Posture 21(2):197–211. doi:10.1016/j.gaitpost.2004.04.004
Collins TD, Ghoussayni SN, Ewins DJ, Kent JA (2009) A six degrees-of-freedom marker set for gait analysis: repeatability and comparison with a modified Helen Hayes set. Gait Posture 30(2):173–180. doi:10.1016/j.gaitpost.2009.04.004
Davis RB, DeLuca PA (1996) Clinical gait analysis: current methods and future directions. In: Harris GF, Smith PA (eds) Human motion analysis: current applications and future directions. The Institute of Electrical and Electronic Engineers Press, New York, pp 17–42
Davis RB III, Õunpuu S, Tyburski D, Gage JR (1991) A gait analysis data collection and reduction technique. Hum Mov Sci 10(5):575–587. doi:10.1016/0167-9457(91)90046-Z
Duffell LD, Hope N, McGregor AH (2014) Comparison of kinematic and kinetic parameters calculated using a cluster-based model and Vicon's plug-in gait. Proc Inst Mech Eng H 228(2):206–210. doi:10.1177/0954411913518747
Ehrig RM, Taylor WR, Duda GN, Heller MO (2006) A survey of formal methods for determining the centre of rotation of ball joints. J Biomech 39(15):2798–2809. doi:10.1016/j.jbiomech.2005.10.002
Ehrig RM, Heller MO, Kratzenstein S, Duda GN, Trepczynski A, Taylor WR (2011) The SCoRE residual: a quality index to assess the accuracy of joint estimations. J Biomech 44(7):1400–1404. doi:10.1016/j.jbiomech.2010.12.009
Ferber R, McClay Davis I, Williams DS 3rd, Laughton C (2002) A comparison of within- and between-day reliability of discrete 3D lower extremity variables in runners. J Orthop Res 20(6):1139–1145. doi:10.1016/s0736-0266(02)00077-3
Ferrari A, Benedetti MG, Pavan E, Frigo C, Bettinelli D, Rabuffetti M, Crenna P, Leardini A (2008) Quantitative comparison of five current protocols in gait analysis. Gait Posture 28(2):207–216. doi:10.1016/j.gaitpost.2007.11.009
Fukaya T, Mutsuzaki H, Wadano Y (2013) Interrater reproducibility of knee movement analyses during the stance phase: use of anatomical landmark calibration with a rigid marker set. Rehabil Res Pract 2013:692624. doi:10.1155/2013/692624
Fuller J, Liu LJ, Murphy MC, Mann RW (1997) A comparison of lower-extremity skeletal kinematics measured using skin- and pin-mounted markers. Hum Mov Sci 16(2–3):219–242. doi:10.1016/S0167-9457(96)00053-X
Gao B, Zheng NN (2008) Investigation of soft tissue movement during level walking: translations and rotations of skin markers. J Biomech 41(15):3189–3195. doi:10.1016/j.jbiomech.2008.08.028
Garling EH, Kaptein BL, Mertens B, Barendregt W, Veeger HE, Nelissen RG, Valstar ER (2007) Soft-tissue artefact assessment during step-up using fluoroscopy and skin-mounted markers. J Biomech 40(Suppl 1):S18–S24. doi:10.1016/j.jbiomech.2007.03.003
Gorton GE 3rd, Hebert DA, Gannotti ME (2009) Assessment of the kinematic variability among 12 motion analysis laboratories. Gait Posture 29(3):398–402. doi:10.1016/j.gaitpost.2008.10.060
Harrington ME, Zavatsky AB, Lawson SE, Yuan Z, Theologis TN (2007) Prediction of the hip joint centre in adults, children, and patients with cerebral palsy based on magnetic resonance imaging. J Biomech 40(3):595–602. doi:10.1016/j.jbiomech.2006.02.003
Heller MO, Kratzenstein S, Ehrig RM, Wassilew G, Duda GN, Taylor WR (2011) The weighted optimal common shape technique improves identification of the hip joint center of rotation in vivo. J Orthop Res 29(10):1470–1475. doi:10.1002/jor.21426
Holden JP, Stanhope SJ (1998) The effect of variation in knee center location estimates on net knee joint moments. Gait Posture 7(1):1–6
Holden JP, Orsini JA, Siegel KL, Kepple TM, Gerber LH, Stanhope SJ (1997) Surface movement errors in shank kinematics and knee kinetics during gait. Gait Posture 5(3):217–227. doi:10.1016/S0966-6362(96)01088-0
Isman RE, Inman VT (1969) Anthropometric studies of the human foot and ankle. Bull Prosthet Res 10(11):97–219
Kadaba MP, Ramakrishnan HK, Wootten ME, Gainey J, Gorton G, Cochran GV (1989) Repeatability of kinematic, kinetic, and electromyographic data in normal adult gait. J Orthop Res 7(6):849–860. doi:10.1002/jor.1100070611
Karlsson D, Tranberg R (1999) On skin movement artefact-resonant frequencies of skin markers attached to the leg. Hum Mov Sci 18(5):627–635. doi:10.1016/S0167-9457(99)00025-1
Kisho Fukuchi R, Arakaki C, Veras Orselli MI, Duarte M (2010) Evaluation of alternative technical markers for the pelvic coordinate system. J Biomech 43(3):592–594. doi:10.1016/j.jbiomech.2009.09.050
Kornaropoulos EI, Taylor WR, Duda GN, Ehrig RM, Matziolis G, Muller M, Wassilew G, Asbach P, Perka C, Heller MO (2010) Frontal plane alignment: an imageless method to predict the mechanical femoral-tibial angle (mFTA) based on functional determination of joint centres and axes. Gait Posture 31(2):204–208. doi:10.1016/j.gaitpost.2009.10.006
Kratzenstein S, Kornaropoulos EI, Ehrig RM, Heller MO, Popplau BM, Taylor WR (2012) Effective marker placement for functional identification of the centre of rotation at the hip. Gait Posture 36(3):482–486. doi:10.1016/j.gaitpost.2012.04.011
Krauss I, List R, Janssen P, Grau S, Horstmann T, Stacoff A (2012) Comparison of distinctive gait variables using two different biomechanical models for knee joint kinematics in subjects with knee osteoarthritis and healthy controls. Clin Biomech (Bristol, Avon) 27(3):281–286. doi:10.1016/j.clinbiomech.2011.09.013
Leardini A, Cappozzo A, Catani F, Toksvig-Larsen S, Petitto A, Sforza V, Cassanelli G, Giannini S (1999) Validation of a functional method for the estimation of hip joint centre location. J Biomech 32(1):99–103
Leardini A, Chiari L, Della Croce U, Cappozzo A (2005) Human movement analysis using stereophotogrammetry. Part 3. Soft tissue artifact assessment and compensation. Gait Posture 21(2):212–225. doi:10.1016/j.gaitpost.2004.05.002
Leardini A, Sawacha Z, Paolini G, Ingrosso S, Nativo R, Benedetti MG (2007) A new anatomically based protocol for gait analysis in children. Gait Posture 26(4):560–571. doi:10.1016/j.gaitpost.2006.12.018
Lucchetti L, Cappozzo A, Cappello A, Della Croce U (1998) Skin movement artefact assessment and compensation in the estimation of knee-joint kinematics. J Biomech 31(11):977–984
Luiz RR, Szklo M (2005) More than one statistical strategy to assess agreement of quantitative measurements may usefully be reported. J Clin Epidemiol 58(3):215–216. doi:10.1016/j.jclinepi.2004.07.007
Lundberg A, Svensson OK, Nemeth G, Selvik G (1989) The axis of rotation of the ankle joint. J Bone Joint Surg (Br) 71(1):94–99
Manal K, McClay I, Stanhope S, Richards J, Galinat B (2000) Comparison of surface mounted markers and attachment methods in estimating tibial rotations during walking: an in vivo study. Gait Posture 11(1):38–45
Manal K, McClay I, Richards J, Galinat B, Stanhope S (2002) Knee moment profiles during walking: errors due to soft tissue movement of the shank and the influence of the reference coordinate system. Gait Posture 15(1):10–17
Manal K, McClay Davis I, Galinat B, Stanhope S (2003) The accuracy of estimating proximal tibial translation during natural cadence walking: bone vs. skin mounted targets. Clin Biomech (Bristol, Avon) 18(2):126–131
McGinley JL, Baker R, Wolfe R, Morris ME (2009) The reliability of three-dimensional kinematic gait measurements: a systematic review. Gait Posture 29(3):360–369. doi:10.1016/j.gaitpost.2008.09.003
McMulkin ML, Gordon AB (2009) The effect of static standing posture on dynamic walking kinematics: comparison of a thigh wand versus a patella marker. Gait Posture 30(3):375–378. doi:10.1016/j.gaitpost.2009.06.010
Miana AN, Prudencio MV, Barros RM (2009) Comparison of protocols for walking and running kinematics based on skin surface markers and rigid clusters of markers. Int J Sports Med 30(11):827–833. doi:10.1055/s-0029-1234054
Most E, Axe J, Rubash H, Li G (2004) Sensitivity of the knee joint kinematics calculation to selection of flexion axes. J Biomech 37(11):1743–1748. doi:10.1016/j.jbiomech.2004.01.025
Noonan KJ, Halliday S, Browne R, O'Brien S, Kayes K, Feinberg J (2003) Interobserver variability of gait analysis in patients with cerebral palsy. J Pediatr Orthop 23(3):279–287, discussion 288–291
Perry J, Burnfield JM (2010) Gait analysis. Normal and pathological function, 2nd edn. SLACK Incorporated, Thorofare
Piazza SJ, Cavanagh PR (2000) Measurement of the screw-home motion of the knee is sensitive to errors in axis alignment. J Biomech 33(8):1029–1034
Reinschmidt C, van den Bogert AJ, Lundberg A, Nigg BM, Murphy N, Stacoff A, Stano A (1997a) Tibiofemoral and tibiocalcaneal motion during walking: external vs. skeletal markers. Gait Posture 6(2):98–109. doi:10.1016/S0966-6362(97)01110-7
Reinschmidt C, van den Bogert AJ, Nigg BM, Lundberg A, Murphy N (1997b) Effect of skin movement on the analysis of skeletal knee joint motion during running. J Biomech 30(7):729–732
Richards J (2008) Biomechanics in clinic and research. Elsevier, Philadelphia
Sandau M, Heimburger RV, Villa C, Jensen KE, Moeslund TB, Aanaes H, Alkjaer T, Simonsen EB (2015) New equations to calculate 3D joint centres in the lower extremities. Med Eng Phys 37(10):948–955. doi:10.1016/j.medengphy.2015.07.001
Sangeux M, Peters A, Baker R (2011) Hip joint centre localization: evaluation on normal subjects in the context of gait analysis. Gait Posture 34(3):324–328. doi:10.1016/j.gaitpost.2011.05.019
Schache AG, Baker R, Lamoreux LW (2008) Influence of thigh cluster configuration on the estimation of hip axial rotation. Gait Posture 27(1):60–69. doi:10.1016/j.gaitpost.2007.01.002
Schulz BW, Kimmel WL (2010) Can hip and knee kinematics be improved by eliminating thigh markers? Clin Biomech (Bristol, Avon) 25(7):687–692. doi:10.1016/j.clinbiomech.2010.04.002
Schwartz MH, Rozumalski A (2005) A new method for estimating joint parameters from motion data. J Biomech 38(1):107–116. doi:10.1016/j.jbiomech.2004.03.009
Schwartz MH, Trost JP, Wervey RA (2004) Measurement and management of errors in quantitative gait data. Gait Posture 20(2):196–203. doi:10.1016/j.gaitpost.2003.09.011
Serfling DM, Hooke AW, Bernhardt KA, Kaufman KR (2009) Comparison of techniques for finding the knee joint center. In: Proceedings of the Gait and Clinical Movement Analysis Society, p 43
Stagni R, Fantozzi S, Cappello A, Leardini A (2005) Quantification of soft tissue artefact in motion analysis by combining 3D fluoroscopy and stereophotogrammetry: a study on two subjects. Clin Biomech (Bristol, Avon) 20(3):320–329. doi:10.1016/j.clinbiomech.2004.11.012
Stief F, Bohm H, Michel K, Schwirtz A, Doderlein L (2013) Reliability and accuracy in three-dimensional gait analysis: a comparison of two lower body protocols. J Appl Biomech 29(1):105–111
Szczerbik E, Kalinowska M (2011) The influence of knee marker placement error on evaluation of gait kinematic parameters. Acta Bioeng Biomech 13(3):43–46
Taylor WR, Ehrig RM, Duda GN, Schell H, Seebeck P, Heller MO (2005) On the influence of soft tissue coverage in the determination of bone kinematics using skin markers. J Orthop Res 23(4):726–734. doi:10.1016/j.orthres.2005.02.006
Taylor WR, Kornaropoulos EI, Duda GN, Kratzenstein S, Ehrig RM, Arampatzis A, Heller MO (2010) Repeatability and reproducibility of OSSCA, a functional approach for assessing the kinematics of the lower limb. Gait Posture 32(2):231–236. doi:10.1016/j.gaitpost.2010.05.005
Vaughan CL, Davis BL, O'Conner JC (1992) Dynamics of human gait. Human Kinetics Publishers, Champaign
Weidow J, Tranberg R, Saari T, Karrholm J (2006) Hip and knee joint rotations differ between patients with medial and lateral knee osteoarthritis: gait analysis of 30 patients and 15 controls. J Orthop Res 24(9):1890–1899. doi:10.1002/jor.20194
Wren TA, Do KP, Hara R, Rethlefsen SA (2008) Use of a patella marker to improve tracking of dynamic hip rotation range of motion. Gait Posture 27(3):530–534. doi:10.1016/j.gaitpost.2007.07.006
Wu G, Siegler S, Allard P, Kirtley C, Leardini A, Rosenbaum D, Whittle M, D'Lima DD, Cristofolini L, Witte H, Schmid O, Stokes I (2002) ISB recommendation on definitions of joint coordinate system of various joints for the reporting of human joint motion – part I: ankle, hip, and spine. International Society of Biomechanics. J Biomech 35(4):543–548

Next-Generation Models Using Optimized Joint Center Location

Ayman Assi, Wafa Skalli, and Ismat Ghanem

Contents State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Motion Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Motion Capture Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joint Kinematics and Kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hip Joint Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Predictive Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Knee Joint Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ankle Joint Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Glenohumeral Joint Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Validation of the Joint Center Localization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X-Rays and Stereophotogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Magnetic Resonance Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3D Ultrasound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Low-Dose Biplanar X-Rays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Effect of Errors on JC Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 3 3 4 4 6 6 8 10 10 11 11 11 12 12 13 14

A. Assi (*) Laboratory of Biomechanics and Medical Imaging, Faculty of Medicine, University of Saint-Joseph, Mar Mikhael, Beirut, Lebanon Institut de Biomécanique Humaine Georges Charpak, Arts et Métiers ParisTech, Paris, France e-mail: [email protected] W. Skalli Institut de Biomécanique Humaine Georges Charpak, Arts et Métiers ParisTech, Paris, France e-mail: [email protected] I. Ghanem Laboratory of Biomechanics and Medical Imaging, Faculty of Medicine, University of Saint-Joseph, Mar Mikhael, Beirut, Lebanon Hôtel-Dieu de France Hospital, University of Saint-Joseph, Beirut, Lebanon e-mail: [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_27-1


Errors on Kinematics and Kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Errors on Musculoskeletal Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Correction of 3D Positioning of the JC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Registration Techniques for the Use of Exact Joint Center Location . . . . . . . . . . . . . . . . . . . . . . . . Estimation from External Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

14 15 15 16 16 16 17

Abstract

Joint center location is essential in order to define the anatomical axes of skeletal segments and is therefore clinically significant for the calculation of joint kinematics during motion analysis. Different methods exist to localize joint centers, using either predictive methods, based on anthropometric measurements, or functional methods, based on the relative movement of the segments adjacent to the joint. Validations of these methods using medical imaging have been extensively studied in the literature on different groups of subjects. Consequently, corrections between the calculated location of the joint center and the exact one, found by medical imaging, have been suggested by several authors. Recent studies showed that new age-specific predictive methods could be computed in order to better locate joint coordinate systems. In the future, new techniques could combine the exact locations of joint centers, localized by medical imaging, with motion capture data through registration techniques; thus, exact kinematics and kinetics of the joints could be computed.

Keywords

Joint center • Predictive • Functional • Medical imaging • Validation

State of the Art

Joint center location is essential in order to calculate anatomical 3D joint kinematics, kinetics, and muscle lever arms during musculoskeletal simulations. Several methods can be used to localize a joint center; these methods can be either predictive or functional. While the predictive methods are mainly based on regression equations that use anthropometric measurements, the functional ones require the performance of a joint movement in order to estimate the center of rotation between the two adjacent segments. Most of the software packages used with motion capture systems are based on the predictive methods. These methods have the advantage of being faster and easier to use, especially on subjects with disabilities who require assistance in performing the ranges of motion needed for the functional methods. Several authors have attempted to compare different methods or to validate them using 3D medical imaging as a gold standard. The CT scan has been used as a validation method since it is known to be precise for 3D reconstruction; however, its use is problematic because of the high dose of radiation it entails. Magnetic resonance imaging avoids the radiation problem, but this technique is time-consuming because of the acquisition time and the image segmentation required during post-processing. More recently, new techniques using 3D ultrasound to obtain the joint center location have been explored and were found to have a precision of 4 mm (Peters et al. 2010). The major inconvenience of this technique is the calibration needed prior to acquisition. Moreover, the skeletal segments, the joint center, and the external markers needed for motion analysis cannot be captured in the same image. In the past years, the low-dose biplanar X-ray technique has shown very high potential for the clinical diagnosis of skeletal deformities through accurate 3D reconstructions of the spine and the lower limbs (Dubousset et al. 2005; Humbert et al. 2009; Chaibi et al. 2012). In addition, it allows 3D reconstruction of points of interest, such as joint centers. This technique has been shown to be precise (2.9 mm) in localizing external markers and joint centers that appear in the same image along with the skeletal segments, and it was recently used to validate joint center localization techniques commonly reported in the literature (Sangeux et al. 2014; Assi et al. 2016). The validation studies have shown that the functional methods are more precise than the predictive methods in localizing the joint center in the adult population. However, special attention should be given to the range of motion performed by the subject during the calibration trial (i.e., flexion-extension, abduction-adduction, and internal-external rotation should each be > 30°). Surprisingly, this was not the case for children, in whom the predictive methods were found to be more precise than the functional ones (Peters et al. 2012; Assi et al. 2016). This chapter will first present the need for joint center localization in motion analysis and then the current techniques of joint center localization that can be used during motion analysis processing. The validation of these techniques will then be reviewed, along with the effect of joint center misplacement errors on kinematics, kinetics, and model simulations. Future directions are discussed at the end of the chapter.

Motion Analysis

Purpose

Medical imaging is widely used in the diagnosis of musculoskeletal diseases through images of human body anatomy. While different modalities can be used to visualize the musculoskeletal system, such as X-rays, CT scan, or MRI, they produce images only in static positions. Apparent dynamic images can be obtained when the patient or subject is asked to perform a certain motion (e.g., of the shoulder) and then hold still in a given position during image acquisition. Images collected at different joint positions can then be assembled into pseudo-dynamic sequences. The same images can be obtained using fluoroscopy, but
this technique is known to be highly irradiant (since it is an X-ray video) and has small image dimensions. The technique of motion analysis has been widely used since the early 1990s in order to assess the joint motions of the musculoskeletal system, especially in patients with orthopedic disorders such as cerebral palsy (Gage 1993). This technique is based on 3D reconstruction by stereophotogrammetry of external markers positioned on the skin of a subject (Cappozzo et al. 2005).

Motion Capture Techniques

Different motion capture techniques exist (see the section on "Methods and Models: Dynamic Pose Estimation"). In this chapter, we focus on infrared wave-based systems. The markers fixed on the subject's skin can be either active or passive. Active markers send waves to the cameras fixed in the acquisition room, whereas passive markers only reflect waves back to the transmitting-receiving cameras. These cameras operate at a high frequency (i.e., 50 Hz or more) in order to reconstruct the movement.

Joint Kinematics and Kinetics

Since a 3D coordinate system can be defined from three noncollinear points, marker placement on the skin respects this rule by placing at least three markers on each skeletal segment. A local coordinate system is then calculated for each skeletal segment at each frame of movement. These local coordinate systems are expressed in a global coordinate system defined in the acquisition room during a calibration process performed prior to the motion trials (Cappozzo et al. 1995, 2005) (see the section on "Methods and Models: Data Analysis"). In a second step, the angles between adjacent local coordinate systems are calculated by applying either the Euler method or the Cardan method, both of which require the specification of an axis sequence. A consensus on joint angle definitions was established by the International Society of Biomechanics (Wu et al. 2002, 2005). An illustration of motion capture and kinematic curves is presented in Fig. 1.
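The following Python sketch illustrates the two steps just described: building a segment-fixed frame from three noncollinear markers and decomposing the relative rotation of two adjacent frames into Cardan angles. The marker-to-axis assignment and the fixed x-y'-z'' sequence are illustrative assumptions; clinical protocols prescribe their own, joint-specific conventions.

import numpy as np

def segment_frame(p1, p2, p3):
    # Right-handed local coordinate system from three noncollinear markers:
    # x along p1->p2, z perpendicular to the marker plane, y = z x x.
    # The anatomical meaning of each axis depends on marker placement.
    x = p2 - p1
    x = x / np.linalg.norm(x)
    z = np.cross(x, p3 - p1)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.column_stack([x, y, z])    # 3x3 rotation, columns = axes

def cardan_angles(R_prox, R_dist):
    # Cardan angles of the relative rotation (distal expressed in the
    # proximal frame), using a fixed x-y'-z'' sequence: R = Rx(a)Ry(b)Rz(c).
    R = R_prox.T @ R_dist
    b = np.arcsin(np.clip(R[0, 2], -1.0, 1.0))
    a = np.arctan2(-R[1, 2], R[2, 2])
    c = np.arctan2(-R[0, 1], R[0, 0])
    return np.degrees(np.array([a, b, c]))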

Need for a Joint Center

The markers are usually placed on palpable bony landmarks for two main reasons: (1) to limit displacement of the marker during motion caused by the underlying soft tissue movement and (2) to ensure repeatability of marker placement between operators and between subjects. Moreover, the skeletal coordinate system obtained from these reflective markers should be anatomically relevant in order to obtain anatomical angles between adjacent segments; for example, a local coordinate system of the humerus should have one axis representing the diaphysis, a second axis related to the line joining the two epicondyles, and a third axis derived from the other two.


Fig. 1 Motion analysis: (a) subject equipped with reflective markers, (b) 3D capture of subject’s movement, (c) example of kinematic curves


In some cases, a rigid body on which three or more markers are fixed, called a cluster, is attached to the segment. This method serves to reduce soft tissue artifacts. The local coordinate system obtained from a cluster is not anatomically representative of the segment. Thus, a static calibration trial, in which a transformation matrix is calculated between the anatomical and cluster coordinate systems, is recommended prior to the movement trials (a minimal sketch of this calibration is given at the end of this section).

Since the local coordinate system of a skeletal segment should be anatomically relevant and representative of the geometry of the bone, in some cases the placement of a marker at the joint center would be required, e.g., the hip joint center for the femoral segment, the knee joint center for the tibial segment, and the glenohumeral joint center for the humeral segment. Since it is impossible to place a physical marker at a joint center, several techniques exist to approximate the location of this point.

Kinematics calculated during gait analysis usually comprises nine graphs. These graphs represent the joint angular waveforms during a gait cycle in the three planes: sagittal, frontal, and horizontal. Six of these nine graphs, the hip and the knee joints in the three planes each, are based on the local coordinate system of the femur, which requires the hip joint center to be localized. The hip joint center is thus one of the most important joint centers in gait analysis. In the following section, we discuss the hip joint center in detail, followed by the methods used for other joints.
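A minimal sketch of the static calibration just mentioned (all names are illustrative): the constant cluster-to-anatomical rotation is computed once from a static trial in which both the anatomical markers and the cluster are visible, and is then applied frame by frame during the movement trials.

import numpy as np

def calibrate_cluster(R_anat_static, R_clus_static):
    # Constant rotation mapping the cluster frame to the anatomical frame,
    # computed once during the static trial (rigid attachment assumed).
    return R_clus_static.T @ R_anat_static

def anatomical_from_cluster(R_clus_t, R_calib):
    # Reconstruct the anatomical frame from the tracked cluster frame at
    # any time frame of the movement trials.
    return R_clus_t @ R_calib

def point_in_local_frame(p_global, R, origin):
    # Express a global point (e.g., an estimated joint center) in a segment
    # or cluster frame, so it can be stored and re-projected during motion.
    return R.T @ (p_global - origin)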

Hip Joint Center

Different methods are available for hip joint center localization. These methods can be predictive, based on cadaveric or anthropometric measurements, or functional, based on the movement of a segment relative to the adjacent one.

Predictive Methods

The predictive methods usually use anthropometric regression-based equations established from cadaveric specimens or from in vivo medical imaging of the skeletal segments. Some examples of predictive methods are presented in this section (more information can be found in previous chapters in the section "Medical Application: Assessment of Kinematics").

The Davis Method

This is the most widely used model, both in the literature and in laboratories around the world, since it is implemented in most motion analysis software packages (Davis et al. 1991). It is based on the examination of 25 hips. The predictive equations use the inter-ASIS (anterior superior iliac spine) distance, the anteroposterior distance between the ASIS, the marker radius, and the leg length. The location of the hip joint center is thus obtained in the pelvic coordinate system.


Fig. 2 Example of predictors (pelvic depth, pelvic width, and lower limb length) used in regression equations to localize the hip joint center

The Bell Method

In 1982, Tylkowski defined a method for predicting the hip joint center in the lateral plane, while Andriacchi predicted the position of the hip joint center in the frontal plane (Tylkowski et al. 1982; Andriacchi and Strickland 1985). Bell et al. combined these two techniques to develop a new predictive method to localize the hip joint center (Bell et al. 1989). The method was based on anteroposterior (AP) radiographs of children and adults; however, validation was only performed using AP and lateral radiographs of dry adult pelves.

The Harrington Method

In 2007, Harrington presented an image-based validation of previous predictive methods (Davis, Bell, and motion analysis software) using MRI, together with a new predictive method (Harrington et al. 2007). The new method was derived from 8 adults, 14 healthy children, and 10 children with CP. The hip joint center location in a pelvic coordinate system was found by fitting a sphere to points identified on the femoral head. The new predictive method was based on pelvic width, pelvic depth, and leg length (Fig. 2).
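As an illustration of how such regression equations are applied, the sketch below uses the pooled-data coefficients commonly quoted from Harrington et al. (2007). Treat the exact numbers as an assumption to be verified against the original paper: this simplified variant uses pelvic width and depth only, whereas the published work also reports equations involving leg length and distinguishes between subject groups.

import numpy as np

def hjc_regression(pelvic_width, pelvic_depth, side="right"):
    # Predictive HJC location in a pelvic frame (origin mid-ASIS,
    # x anterior, y superior, z lateral toward the right), in mm.
    # Coefficients: commonly quoted pooled equations attributed to
    # Harrington et al. (2007); verify against the original before use.
    # pelvic_width = inter-ASIS distance, pelvic_depth = ASIS-PSIS depth.
    x = -0.24 * pelvic_depth - 9.9    # posterior offset
    y = -0.30 * pelvic_width - 10.9   # inferior offset
    z = 0.33 * pelvic_width + 7.3     # lateral offset
    return np.array([x, y, z if side == "right" else -z])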


Functional Methods

The functional methods derive the joint center from the movement of a skeletal segment relative to the adjacent one. The subject is asked to perform a movement of the joint; for the hip joint, for instance, a movement of flexion-extension and/or abduction-adduction is performed. Since the hip is approximated as a ball-and-socket joint, each marker attached to the moving segment (i.e., the thigh) moves on a sphere whose center is the center of the hip joint. Thus, the joint center is assumed to be the center of rotation (CoR) of the movement between two adjacent segments. The first to describe this technique was Cappozzo in 1984 (Cappozzo 1984). Several algorithms or mathematical techniques for computing the CoR exist; the most common ones are discussed in the following section.

Sphere Fit Methods

This technique assumes that the CoR is stationary; this assumption holds if one segment (segment 1) is at rest. The markers on the adjacent segment (segment 2) then move on the surfaces of spheres with specific radii around one common CoR. The most frequent approach minimizes the sum of the squared Euclidean distances between the sphere and the marker positions. The chosen cost function determines whether the optimal solution can be calculated exactly or only approximately, by successive iterative steps toward the optimal CoR. Some authors use the least squares method, which gives an exact CoR estimation (Pratt 1987; Gamage and Lasenby 2002); an example is displayed in Fig. 3, and a minimal numerical sketch is given below. Alternative approaches are iterative (Halvorsen 2003). The geometric method requires an initial guess of the CoR, whereas the algebraic methods do not require a starting estimate. The major disadvantages of these techniques are convergence to local minima of the cost function and poor accuracy in estimating the CoR when a reduced ROM is performed. Chang et al. (2007) proposed a new numerical sphere fit technique that can be used for a reduced ROM.

Center Transformation Technique

The center transformation technique (CTT) assumes that at least three markers are present on the moving segment; it is then possible to define a rigid-body transformation (rotations and translations) that maps a given reference marker configuration from one frame into another (Piazza et al. 2004). The appropriate transformation of these local systems for all time frames into a common reference system enables the approximation of the joint center as a fixed position. Another approach, called the two-sided approach, which does not require the assumption of a stationary CoR, can be used instead (Schwartz and Rozumalski 2005).

SCoRE Technique

This algorithm is a continuation of the CTT method, with the assumption that the coordinates of the CoR must remain constant relative to both segments, without requiring that one segment remain at rest (Ehrig et al. 2006).
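As the minimal numerical sketch promised above for the sphere fit family, the following algebraic least-squares fit solves a linear system for the sphere center, in the spirit of the exact, non-iterative formulations cited (the precise algebra differs between authors; names are illustrative):

import numpy as np

def fit_sphere_center(points):
    # Algebraic least-squares sphere fit. `points`: (n, 3) trajectory of one
    # marker expressed in the frame of the (assumed stationary) proximal
    # segment. From |p - c|^2 = r^2 one gets the linear system
    # 2 p.c + (r^2 - |c|^2) = |p|^2, solved here in the least-squares sense.
    P = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * P, np.ones((len(P), 1))])
    b = np.sum(P ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:3], sol[3]
    radius = np.sqrt(d + center @ center)
    return center, radius

def cor_from_markers(marker_trajectories):
    # CoR as the mean of the per-marker sphere centers (cf. Fig. 3): each
    # thigh marker moves on its own sphere about the same center.
    centers = [fit_sphere_center(t)[0] for t in marker_trajectories]
    return np.mean(centers, axis=0)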


Fig. 3 Sphere fit method: trajectories of four different markers (each marker in a different color) during calibration trials. The star points represent the centers of the fitted spheres; the black star represents the mean of these centers

Calibration Trial

While different movements can be performed, especially at a ball-and-socket joint (i.e., the hip or the glenohumeral joint), such as flexion-extension, abduction-adduction, internal-external rotation, or circumduction, Camomilla et al. (2006) showed that the movement that gives the most accurate hip joint center estimate with the functional technique is the star-arc. This calibration trial consists of several flexion-extension/abduction-adduction movements performed in vertical planes of different orientations, followed by a circumduction movement.

Uncertainties Related to Low ROM

It has been shown that the errors in the localization of a joint center increase when the range of motion (ROM) performed during the calibration movement decreases. In the study by Ehrig et al. (2006), the CTT, SCoRE, and sphere fit approaches were tested in a simulation model, with noise added to the markers. The RMS distance between the calculated CoR and the exact one was computed. RMS errors decreased exponentially with increasing ROM. The theoretical accuracy of the CoR position was within 1 cm for all approaches when the ROM exceeded 20°, and within 0.3 cm as long as the ROM of the joint was 45° or more. Some of these simulations also took into account skin-marker movement, which is an additional source of uncertainty.


Therefore, some authors prefer to assist the patients/subjects when they perform the functional calibration movement, in order to make sure that the ROM is adequate for functional localization of the joint center (Peters et al. 2012). Piazza et al. (2001) used a mechanical model to simulate the errors in hip joint center localization using functional methods. Significant increases in the magnitude of HJC location errors (4–9 mm) were noted when the range of hip motion was reduced from 30° to 15°. The same result was found by Camomilla et al. (2006): the accuracy of the HJC estimate improved, at an increasing rate, as a function of the amplitude of the movements performed at the hip.

Knee Joint Center

In the particular case of the knee, a center of rotation is usually calculated with the predictive method, using external markers located on the condyles. The Davis model, presented above, also predicts the knee joint center, which is approximated as lying in the middle of the knee width (measured during clinical examination), from the lateral to the medial condyle, and is defined relative to the thigh marker (Davis et al. 1991). In the functional method, a knee axis is usually calculated to represent the complexity of the flexion-extension movement: knee flexion is a combination of the femoral condyles rolling (rotation) over the tibial plateau and gliding (translation) posteriorly along the plateau (Ramsey and Wretenberg 1999). Two mathematical methods are commonly used to calculate the axis of rotation (AoR) of the knee. The first fits cylindrical arcs to the moving segment, while assuming that the adjacent segment is at rest (Gamage and Lasenby 2002; Halvorsen 2003). The second is based on the transformation techniques (CTT) presented above, where the helical axis technique is used, based on the work of Woltring et al. (1985). More recently, another algorithm for the localization of the AoR was presented by Ehrig et al. (2007); this symmetrical axis of rotation approach determines a unique axis of rotation by considering the movement of two dynamic segments simultaneously.

Ankle Joint Center The most frequently used methods to localize the ankle joint center are predictive, using external markers located on the malleoli. The Davis method (Davis et al. 1991) is the most common. A similar strategy to the knee localization method is applied to obtain the ankle joint center. While the Davis model is the most commonly used for clinical gait analysis, the foot is represented as a single segment, and only ankle joint motion is quantified. In order to quantify the dynamic adaptability of the different foot segments, several

Next-Generation Models Using Optimized Joint Center Location

11

models have been described. The most commonly used model is the Oxford foot model, where three segments are defined in the foot (hindfoot, forefoot, hallux) in addition to the tibial segment (Stebbins et al. 2006). Other models described in the literature use four segments, such as the Leardini (calcaneus, midfoot, first metatarsal, and the hallux) (Leardini et al. 1999b) and Jenken models (hindfoot, midfoot, medial forefoot, lateral forefoot) (Jenkyn and Nicol 2007). Further information on this topic can be found in the preceding chapter (▶ Variations of Marker-Sets and Models for Standard Gait Analysis).

Glenohumeral Joint Center Different predictive and functional methods exist to localize the glenohumeral joint center. In the predictive Meskers’ method, a linear regression is used to predict the glenohumeral joint center based on specific points on the scapula, the acromioclavicular joint and the processus coracoideus (Meskers et al. 1997). This method was elaborated by digitizing 36 sets of cadaver scapulae and adjacent humeri. The functional methods are based on the movement of the humerus relatively to the scapula or the thorax; the same algorithms as the one used for the hip joint center can be applied. The same result, as for the hip joint center, was also found regarding the low ROM when performed during the calibration trial; Lempereur et al. showed that high amplitude of movement should be performed (>60 ) in order to improve reliability when functional methods are used for the localization of the glenohumeral joint center (Lempereur et al. 2011). Further information on this topic can be found in the chapter on “▶ Upper Extremity Models for Clinical Movement Analysis.”

Validation of the Joint Center Localization Methods Several authors have attempted to assess the accuracy of both predictive and functional methods by localizing the joint center obtained by medical imaging as a gold standard. The technique consists of obtaining the joint center in 3D, while being expressed in the local coordinate system of the adjacent segment. The latter is built based on the external markers placed on the skin. The joint centers calculated through predictive and functional methods would also be expressed in the same local coordinate system of the adjacent segment. Thus, when all calculated joint centers are expressed in the same coordinate system, distances from each joint center to the gold standard could be calculated, and a comparison between methods could be performed.

X-Rays and Stereophotogrammetry The method developed by Bell in 1989 was based on the use of digitized AP radiographs with localization of specific bony landmarks on the radiograph as well

12

A. Assi et al.

as digitization of the center of a circle that matches the size of the femoral head (Bell et al. 1989). In 1990, Bell et al. were the first to use pairs of orthogonal radiographs (Bell et al. 1990); by knowing the exact distances between X-ray sources and film cassette locations, it was possible to estimate in 3D the location of the bony landmarks and the pelvic skin markers. The accuracy of HJC localization methods were thus assessed for methods such as the functional method described by Cappozzo in 1984 but also the predictive methods described by Tylkowski and Andriacchi (Tylkowski et al. 1982; Andriacchi and Strickland 1983; Cappozzo 1984). In another study, Leardini et al. assessed the validity of functional and predictive methods in calculating the HJC on 11 healthy adults (Leardini et al. 1999a). The average root mean square (RMS) distance to the gold standard was 25–30 mm for predictive methods and 13 mm for functional method. The technique of stereoradiography has the major disadvantage of being irradiant to the patient.

Magnetic Resonance Imaging In a study lead by Harrington et al., MRI was used as the gold standard in obtaining the hip joint center for a population of healthy adults, healthy children, and children with cerebral palsy (Harrington et al. 2007). The validation of existing predictive methods was assessed in addition to a new method presented by the authors. In a study performed by Lempereur et al., the authors used MRI acquisition to validate several functional methods for the localization of the glenohumeral JC (Lempereur et al. 2010). This technique required the coverage of the scapula with 120 reflective markers in order to perform the matching between the surface of the scapula obtained by motion analysis capture and MRI reconstruction. This technique is time-consuming since it requires manual segmentation. The major disadvantage of the MRI technique is the time of acquisition and the time of image processing.

3D Ultrasound The ultrasound technique was widely used in order to validate the JC localization methods since it is not irradiant and easier to perform compared to the MRI technique. However, the US method requires a calibration process in order to obtain 3D reconstructions of the JC. In a study performed by Peters et al., the authors described the required calibration process in order to obtain 3D US reconstructions (Peters et al. 2010). The repeatability of the technique was assessed as well as the accuracy of the localization of a reference object within a water basin. The accuracy was about 4  2 mm. After the validation of this technique, the same authors performed different studies on the validation of both predictive and functional HJC localization techniques in adults (Sangeux et al. 2011) and both typically

Next-Generation Models Using Optimized Joint Center Location

13

developing children and children with cerebral palsy (Peters et al. 2012). In the study on adults, it was shown that the functional method and more precisely the geometric sphere fitting method was the most precise in localizing the HJC (mean absolute distance error of about 15 mm) followed by the Harrington predictive method. In the study on TD and CP children, the Harrington method was the closest method to the 3D US technique (14  8 mm), whereas the functional techniques performed much worse (22–33 mm). It should be noted that the functional calibration trials of the hip had been assisted by an external operator. The 3D ultrasound technique has also been used for the localization of the glenohumeral joint (Lempereur et al. 2013).

Low-Dose Biplanar X-Rays More recently, the low-dose biplanar X-ray technique (Dubousset et al. 2005; Humbert et al. 2009; Chaibi et al. 2012) was applied in order to validate JC localization techniques. The EOS system was used as an image-based reference. The localization of external markers was reliable within 0.15 mm for trained operators, and the mean accuracy for HJC localization was 2.9  1.3 mm (Pillet et al. 2014), even less than the values obtained by the 3D US method. The EOS system allows the acquisition in the same image of external markers, skeletal segments, and joint centers. Thus, a joint center can be located directly in the local coordinate system of the adjacent segment, based on the location of the external markers (Fig. 4). The EOS system was used to compare the accuracy of several predictive and functional techniques in localizing the HJC in healthy adults (Sangeux et al. 2014). Different scenarios were applied when functional methods were assessed: different algorithms, different ranges of motion of the hips (30 ), and selfperformance or assisted performance. The best results were obtained for the comfortable ROM when they were self-performed by the subjects. The best method was the functional geometrical sphere fitting method which localized the hips 1.1 cm from the EOS reference. It was shown that the worst results were obtained for functional methods when the ROM was reduced. In the latter case, the best method was the Harrington predictive method which localizes the HJC at 1.7 cm from the EOS reference. In a more recent study, the EOS system was used to evaluate the accuracy of both predictive and functional methods in TD and CP children (Assi et al. 2016). Contrarily to the findings in adults, the functional methods performed much worse (>60 mm) compared to the predictive methods, where the Harrington method showed the best results (18  9 mm). The authors explained the differences in results between adults and children as being due to the shorter length of the thigh segment in children, which could increase the noise when the algorithms of functional methods are applied to locate the CoR. It was also shown that children with CP performed significantly lower ROM of hip movements during calibration compared to TD children. However, average ROM in both groups was >30 , and the ROM was not a confounding factor on the errors on the HJC calculated by the functional methods.

14

A. Assi et al.

Fig. 4 Frontal and lateral X-rays of the lower limbs obtained by low-dose biplanar X-rays, with the external markers fixed on the skin as well as the 3D reconstruction of the femur and the hip joint center, expressed in the local coordinate system of the pelvis

In a study performed by Lempereur et al., different functional methods as well as the 3D ultrasound technique were compared in localizing the glenohumeral joint center relatively to the one obtained by the 3D EOS reconstruction, considered as the reference (Lempereur et al. 2013). The 3D ultrasound technique placed the glenohumeral joint center at 14 mm from the EOS image-based reference, while functional methods varied from 15.4 mm (the helical axis method) to 34 mm using iterative methods (Halvorsen 2003).

Effect of Errors on JC Localization Errors on Kinematics and Kinetics Misplacement errors of joint centers can distort kinematics and kinetics of the hip and knee in the case of gait analysis, since the thigh local coordinate system is affected. The effects of hip JC misplacement on gait analysis were studied by Stagni et al. (2000). The latter found an error on the joint moment that can reach 22% of

Next-Generation Models Using Optimized Joint Center Location

15

flexion-extension and 15% on abduction-adduction with a delay of 25% of the flexion to extension timing of the stride duration. In a study performed by Kiernan et al., the authors assessed the clinical agreement of the Bell, Davis, and Orthotrak methods in localizing the HJC compared to the Harrington method as a gold standard (Kiernan et al. 2015). This was applied on 18 healthy children. Kinematics, kinetics, Gait Profile Score, and Gait Deviation Index were calculated. The authors found that errors, when the Davis or Orthotrak methods were used, are clinically meaningful especially on kinetics. The results on the glenohumeral joint were different from those obtained on the hip. Lempereur et al. showed that misplacement of the glenohumeral joint center will propagate to the kinematics of the shoulder, but errors do not exceed 4.8 on the elevation angle during shoulder flexion and 4.3 on the elevation plane during shoulder abduction (Lempereur et al. 2014). The authors related this difference of propagated errors between the movements of the arm and the thigh to the difference in mass between the two segments.

Errors on Musculoskeletal Simulations Inaccurate localization of JC can also influence the results obtained in musculoskeletal simulations. In a study performed by Scheys et al., moment-arm and muscletendon lengths were computed using three kinds of musculoskeletal models: a personalized model based on MRI data, an isotropic rescaled generic model, and an anisotropic rescaled generic model (Scheys et al. 2008). Different hip joint center techniques were used in each of the models. These simulations were applied on the gait of an asymptomatic adult. The generic model simulations showed large offsets of moment arm and muscle-tendon lengths when compared to the personalized model for most of the major muscles of the lower limbs.

Future Directions Joint center localization is essential in order to obtain anatomically accurate kinematics and kinetics. Joint center localization techniques could be either predictive or functional. While predictive techniques are based on regression equations that use anthropometric measurements, the functional techniques require the performance by the subject of ranges of motion in the joint of interest in order to calculate the center of rotation between the two adjacent segments of the joint. Several authors have validated these techniques in children and adults by comparing the location of the joint center obtained by these methods to the joint center obtained by 3D medical imaging. The most frequently used medical imaging system in the validation processes were stereoradiography, CT scan, MRI, 3D ultrasound, and more recently low-dose biplanar X-rays. It was shown that the functional methods were more accurate in locating the joint center compared to the predictive ones in the adult population, which was not the

16

A. Assi et al.

case in children. This could be due to the shorter segment in children which renders the markers in movement closer to the joint center, thus increasing the noise during the calculation of the joint center. Moreover, the amount of ROM performed during functional calibration should be within certain limits: a low ROM could not be sufficient for the calibration process and a high ROM could induce more soft tissue artifacts. The errors on the localization of the joint center have been shown to directly affect both kinematic and kinetic calculation. They also affect the computation of muscle lever arms when running musculoskeletal simulations.

Correction of 3D Positioning of the JC Several authors have shown the deviation of the joint center calculated by either predictive or functional methods compared to the exact joint center obtained by 3D medical imaging in each direction: anterior-posterior, medial-lateral, and superiorinferior. A first solution could be the correction of this location prior to the calculation of kinematics or kinetics or computation of musculoskeletal simulation (Sangeux et al. 2014; Assi et al. 2016).

Registration Techniques for the Use of Exact Joint Center Location It was shown in the validation methods that both predictive and functional methods localize the joint center at 11–18 mm from its exact location. In an ideal setting, when a medical imaging tool is present in the same laboratory along with the motion capture equipment, the exact location of the joint center should be used in the calculation process of kinematics and kinetics, even for musculoskeletal simulations. Markers would be placed on the patient, and an image acquisition, such as the EOS biplanar X-rays, will be performed in order to obtain the exact 3D location of the joint center in the local coordinate system of the adjacent segment (i.e., location of the hip joint center expressed in the local coordinate system of the pelvis, glenohumeral joint center expressed in the local coordinate system of the scapula or thorax). These 3D coordinates would be used after motion analysis acquisition for either kinematic/kinetic calculations or musculoskeletal model simulations.

Estimation from External Information Another solution could be to optimize regression equations of joint center localization techniques. Since the validation methods showed different results depending on the population type, new regression equations could be tailored to each population (i.e., children and adults). The new biplanar low-dose X-ray technique could allow the acquisition in the same image of the 3D reconstruction of both the external markers and the joint center. Thus, a large cohort of subjects/patients of different age

Next-Generation Models Using Optimized Joint Center Location

17

Fig. 5 Estimation of the hip joint center using the 3D reconstruction of the skin and the skeleton and based on morphological and barycentermetric predictors (Nerot et al. 2016)

intervals could allow the attainment of age-specific regression equations based on anthropometric measurements. The possibility to get both the external envelope and the internal skeleton (Nérot et al. 2015a, b) opens the way for a large-scale analysis and improvement on the regression equations for joint center localization by combining morphological and barycentermetric predictors (Fig. 5).

References Andriacchi T, Strickland A (1985) Gait analysis as a tool to assess joint kinetics. In: Berme N, Engin A, Correia Da Silva K, (eds). Biomechanics of Normal and Pathological Human Articulating Joints. Martinus Nijhoff, Dordrecht: NATO SI Series. pp. 83–102. Assi A, Sauret C, Massaad A, Bakouny Z, Pillet H, Skalli W, et al (2016) Validation of hip joint center localization methods during gait analysis using 3D EOS imaging in typically developing and cerebral palsy children. Gait Posture [Internet] 42:30–5. Available

18

A. Assi et al.

from http://dx.doi.org/10.1016/j.gaitpost.2016.04.028%5Cn, http://linkinghub.elsevier.com/ retrieve/pii/S0966636216300455%5Cn, http://dx.doi.org/10.1016/j.gaitpost.2015.06.089 Bell AL, Brand RA, Pedersen DR (1989) Prediction of hip joint centre location from external landmarks. Hum Mov Sci [Internet] 8(1):3–16. Available from: http://www.sciencedirect.com/ science/article/pii/0167945789900201. [cited 2015 Oct 27] Bell AL, Pedersen DR, Brand RA (1990) A comparison of the accuracy of several hip center. J Biomech 23:6–8 Camomilla V, Cereatti A, Vannozzi G, Cappozzo A (2006) An optimized protocol for hip joint centre determination using the functional method. J Biomech [Internet] 39(6):1096–1106. Available from: http://linkinghub.elsevier.com/retrieve/pii/S0021929005001004 Cappozzo A (1984) Gait analysis methodology. Hum Mov Sci 3:27–50 Cappozzo A, Catani F, Della Croce U, Leardini A (1995) Position and orientation in space of bones during movement: anatomical frame definition and determination. Clin Biomech [Internet] 10(4):171–178. Available from: http://www.sciencedirect.com/science/article/pii/ 026800339591394T Cappozzo A, Della Croce U, Leardini A, Chiari L (2005) Human movement analysis using stereophotogrammetry. Part 1: theoretical background. Gait Posture [Internet] 21(2):186–196. Available from: http://www.sciencedirect.com/science/article/pii/S0966636204000256. [cited 2015 Nov 4] Chaibi Y, Cresson T, Aubert B, Hausselle J, Neyret P, Hauger O et al (2012) Fast 3D reconstruction of the lower limb using a parametric model and statistical inferences and clinical measurements calculation from biplanar X-rays. Comput Methods Biomech Biomed Eng 15(5):457–466 Davis RB, Ounpuu S, Tyburski D, Gage JR (1991) A gait analysis data collection and reduction technique. Hum Mov Sci 10(5):575–587 Dubousset J, Charpak G, Dorion I, Skalli W, Lavaste F, Deguise J et al (2005) A new 2D and 3D imaging approach to musculoskeletal physiology and pathology with low-dose radiation and the standing position: the EOS system. Bull Acad Natl Med [Internet]. 189(2):287–297. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16114859 Ehrig RM, Taylor WR, Duda GN, Heller MO (2006) A survey of formal methods for determining the centre of rotation of ball joints. J Biomech [Internet] 39(15):2798–2809. Available from: http://linkinghub.elsevier.com/retrieve/pii/S002192900500446X Ehrig RM, Taylor WR, Duda GN, Heller MO (2007) A survey of formal methods for determining functional joint axes. J Biomech 40(10):2150–2157 Gage JR (1993) Gait analysis. An essential tool in the treatment of cerebral palsy. Clin Orthop Relat Res [Internet] (288):126–134. Available from: http://www.ncbi.nlm.nih.gov/pubmed/8458125 Gamage SSHU, Lasenby J (2002) New least squares solutions for estimating the average centre of rotation and the axis of rotation. J Biomech 35(1):87–93 Halvorsen K (2003) Bias compensated least squares estimate of the center of rotation. J Biomech [Internet] 36(7):999–1008. Available from: http://www.sciencedirect.com/science/article/pii/ S0021929003000708. [cited 2016 Jun 21] Harrington ME, Zavatsky AB, Lawson SEM, Yuan Z, Theologis TN (2007) Prediction of the hip joint centre in adults, children, and patients with cerebral palsy based on magnetic resonance imaging. J Biomech [Internet] 40(3):595–602. 
Available from: http://linkinghub.elsevier.com/ retrieve/pii/S0021929006000583 Humbert L, De Guise JA, Aubert B, Godbout B, Skalli W (2009) 3D reconstruction of the spine from biplanar X-rays using parametric models based on transversal and longitudinal inferences. Med Eng Phys 31(6):681–687 Jenkyn TR, Nicol AC (2007) A multi-segment kinematic model of the foot with a novel definition of forefoot motion for use in clinical gait analysis during walking. J Biomech 40(14):3271–3278 Kiernan D, Malone A, O’Brien T, Simms CK (2015) The clinical impact of hip joint centre regression equation error on kinematics and kinetics during paediatric gait. Gait Posture [Internet] 41(1):175–179. Available from: http://www.sciencedirect.com/science/article/pii/ S0966636214007255

Next-Generation Models Using Optimized Joint Center Location

19

Leardini A, Cappozzo A, Catani F, Toksvig-Larsen S, Petitto A, Sforza V et al (1999a) Validation of a functional method for the estimation of hip joint centre location. J Biomech 32(1):99–103 Leardini A, O’Connor JJ, Catani F, Giannini S (1999b) Kinematics of the human ankle complex in passive flexion; a single degree of freedom system. J Biomech 32(2):111–118 Lempereur M, Leboeuf F, Brochard S, Rousset J, Burdin V, Rémy-Néris O (2010) In vivo estimation of the glenohumeral joint centre by functional methods: accuracy and repeatability assessment. J Biomech 43(2):370–374 Lempereur M, Brochard S, Rémy-Néris O (2011) Repeatability assessment of functional methods to estimate the glenohumeral joint centre. Comput Methods Biomech Biomed Eng 5842:1–6 Lempereur M, Kostur L, Leboucher J, Brochard S, Rémy-Néris O (2013) 3D freehand ultrasound to estimate the glenohumeral rotation centre. Comput Methods Biomech Biomed Eng [Internet] 16(Suppl 1):214–215. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23923914 Lempereur M, Leboeuf F, Brochard S, Rémy-Néris O (2014) Effects of glenohumeral joint centre mislocation on shoulder kinematics and kinetics. Comput Methods Biomech Biomed Eng [Internet] 17(Suppl 1):130–131. Available from: http://www.ncbi.nlm.nih.gov/pubmed/25074199 Meskers CGM, Van Der Helm FCT, Rozendaal LA, Rozing PM (1997) In vivo estimation of the glenohumeral joint rotation center from scapular bony landmarks by linear regression. J Biomech 31(1):93–96 Nérot A, Choisne J, Amabile C, Travert C, Pillet H, Wang X, et al (2015a) A 3D reconstruction method of the body envelope from biplanar X-rays: evaluation of its accuracy and reliability. J Biomech [Internet] 48(16):4322–4326. Available from: http://dx.doi.org/10.1016/j. jbiomech.2015.10.044 Nérot A, Wang X, Pillet H, Skalli W (2015b) Estimation of hip joint center from the external body shape: a preliminary study. Comput Methods Biomech Biomed Eng [Internet] 5842:1–2. Available from: http://www.tandfonline.com/doi/full/10.1080/10255842.2015.1069603 Peters A, Baker R, Sangeux M (2010) Validation of 3-D freehand ultrasound for the determination of the hip joint centre. Gait Posture [Internet] 31(4):530–2. Available from: http://linkinghub. elsevier.com/retrieve/pii/S0966636210000299 Peters A, Baker R, Morris ME, Sangeux M (2012) A comparison of hip joint centre localisation techniques with 3-DUS for clinical gait analysis in children with cerebral palsy. Gait Posture [Internet] 36(2):282–286. Available from: http://linkinghub.elsevier.com/retrieve/pii/ S0966636212000999 Piazza SJ, Okita N, Cavanagh PR (2001) Accuracy of the functional method of hip joint center location: effects of limited motion and varied implementation. J Biomech 34(7):967–973 Piazza SJ, Erdemir A, Okita N, Cavanagh PR (2004) Assessment of the functional method of hip joint center location subject to reduced range of hip motion. J Biomech 37:349–356 Pillet H, Sangeux M, Hausselle J, El Rachkidi R, Skalli W (2014) A reference method for the evaluation of femoral head joint center location technique based on external markers. Gait Posture [Internet] 39(1):655–658. Available from: http://linkinghub.elsevier.com/retrieve/pii/ S096663621300578X Pratt V (1987) Direct least-squares fitting of algebraic surfaces. Comput Graph (ACM) 21:145–152 Ramsey DK, Wretenberg PF (1999) Biomechanics of the knee: methodological considerations in the in vivo kinematic analysis of the tibiofemoral and patellofemoral joint. 
Clin Biomech 14 (9):595–611 Sangeux M, Peters A, Baker R (2011) Hip joint centre localization: evaluation on normal subjects in the context of gait analysis. Gait Posture [Internet] 34(3):324–328. Available from: http://dx.doi. org/10.1016/j.gaitpost.2011.05.019 Sangeux M, Pillet H, Skalli W (2014) Which method of hip joint centre localisation should be used in gait analysis? Gait Posture [Internet] 40(1):20–25. Available from: http://linkinghub.elsevier. com/retrieve/pii/S0966636214000642 Scheys L, Spaepen A, Suetens P, Jonkers I (2008) Calculated moment-arm and muscle-tendon lengths during gait differ substantially using MR based versus rescaled generic lower-limb musculoskeletal models. Gait Posture 28(4):640–648

20

A. Assi et al.

Schwartz MH, Rozumalski A (2005) A new method for estimating joint parameters from motion data. J Biomech [Internet] 38(1):107–116. Available from: http://linkinghub.elsevier.com/ retrieve/pii/S002192900400137X Stagni R, Leardini A, Cappozzo A, Grazia Benedetti M, Cappello A (2000) Effects of hip joint centre mislocation on gait analysis results. J Biomech 33(11):1479–1487 Stebbins J, Harrington M, Thompson N, Zavatsky A, Theologis T (2006) Repeatability of a model for measuring multi-segment foot kinematics in children. Gait Posture. 23(4):401–410 Tylkowski C, Simon S, Mansour J (1982) Internal rotation gait in spastic cerebral palsy. In: Nelson JP (ed) Proceedings of the 10th Open Scientific Meeting of the Hip Society. C. V. Mosby, St Louis, pp 89–125 Woltring H, Huiskes R, de Lange A, Veldpaus F (1985) Finite centroid and helical axis estimation from noisy landmark measurements in the study of human joint kinematics. J Biomech 18 (5):379–389 Wu G, Siegler S, Allard P, Kirtley C, Leardini A, Rosenbaum D et al (2002) ISB recommendation on definitions of joint coordinate system of various joints for the reporting of human joint motion – Part I: ankle, hip, and spine. J Biomech [Internet] 35(4):543–548. Available from: http://www.sciencedirect.com/science/article/pii/S0021929001002226 Wu G, Van Der Helm FCT, Veeger HEJ, Makhsous M, Van Roy P, Anglin C et al (2005) ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion – Part II: shoulder, elbow, wrist and hand. J Biomech 38(5):981–992

Kinematic Foot Models for Instrumented Gait Analysis Alberto Leardini and Paolo Caravaggi

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Validation and Application of Foot Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Validation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Musculoskeletal Multi-Segment Foot Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kinetic Analysis Including Foot Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 6 11 11 12 13 15 16 17 18

Abstract

In many clinical and biomechanical contexts of human motion analysis the model assumption of the foot as a single rigid segment is no longer acceptable. This has given rise to a large number of proposals for multi-segment foot models. The relevant experimental and analytical techniques differ for many aspects: the number of foot segments; the bony landmarks involved; the type of marker clusters; the definition of the anatomical frames; and the convention for the calculation of joint rotations. Different definitions of neutral reference posture have also been adopted, along with their utilization to offset kinematic data. Following previous partial review papers, the present chapter aims at introducing the current methodological studies for in vivo analysis of multi-segment foot kinematics. The survey has found more than 30 different techniques; however, only a limited number of these have reported convincing validation activities and A. Leardini (*) • P. Caravaggi Movement Analysis Laboratory and Functional-Clinical Evaluation of Prostheses, Istituto Ortopedico Rizzoli, Bologna, Italy e-mail: [email protected]; [email protected]; [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_28-1

1

2

A. Leardini and P. Caravaggi

have been exploited in a clinical context. A number of papers have also compared the experimental performance of different multi-segment foot models and highlighted the main advantages and disadvantages of each of them. Important biomechanical applications of musculoskeletal models for reliable estimation of biomechanical parameters are also discussed. In addition, we report on the feasibility and limitations of kinetic analysis applied to multi-segment foot models from ground reaction force data. The chapter ends with recommendations both for the selection of a most suitable technique from those already available, as well as for the design of an original one suitable to address the needs of the specific application.

Keywords

Foot joint mobility • 3D joint motion • Multi-segment kinematics • Ankle complex • Chopart joint • Lisfranc joint • Metatarsophalangeal joint • Foot arches • Stereophotogrammetry • Marker clusters • Skin motion artifact

Introduction In standard clinical gait analysis body segments are tracked in three-dimensions (3D) by a stereophotogrammetric system, while their relative positions are calculated to assess patterns of joint rotations during the execution of motor activities. Full-body gait analysis requires several passive reflective markers to be fixated to the trunk, pelvis, thigh, shank, and foot. Kinematics of segments are assessed with respect to the laboratory co-ordinate system, i.e., the absolute motion, and with respect to any adjoining segments for calculation of relative joint rotations. Frequently, motion of the trunk and pelvis is reported in the laboratory reference frame, in case with respect to the line of progression, together with hip, knee, and ankle joint motion. The moment of the external forces can also be calculated at these joints as the product of the joint center distance and the ground reaction force recorded by the force plate. Together with the spatiotemporal parameters (e.g., walking speed and stride length), these are the standard kinematic parameters necessary to assess, and characterize, most of the pathological conditions investigated in gait analysis. The use of skinbased reflective markers to track body segments comprised of single bones (e.g., the femur or the humerus) can result in a fairly accurate representation of their real motion. From a kinematic prospective, these segments can reasonably be assumed to move as rigid bodies, thus the position of at least three nonaligned points is required for their motion to be tracked in the 3D space. In gait analysis, these points are mostly palpable bony landmarks, the temporal position of which is tracked via reflective markers attached to the skin. With skin markers, however, the rigid body assumption is violated and accuracy is lower when multiple small bones are connected in a reduced volume (Nester et al. 2010). On the other hand, prospects to continuously track different segments are of particular relevance in the evaluation of pathologies regarding the shank and foot. Although the tibia and fibula bones present very small relative motion and thus can be considered reasonably as a single segment for

Kinematic Foot Models for Instrumented Gait Analysis

3

kinematic analysis (Arndt et al. 2007), the foot is made up of 26 bones and several joints connecting them. Therefore, standard kinematic protocols based on three markers only appear inadequate in describing the complex foot biomechanics. In addition, foot bones are rather small and some of them, e.g., the talus, have no clear palpable landmarks, thus making it very difficult for those to be tracked in 3D space. The importance of multi-segment foot models (MFMs) rather than singlesegment foot tracking has been largely discussed in the literature. Benedetti et al. (2011) demonstrated the value of 3D motion analysis of the ankle joint in the clinical context. De Ridder et al. (2015) compared results from a MFM (“Ghent” in Table 1, by De Mits et al. 2012) and a single-segment foot model and showed the value of distal factors in chronic ankle instability, in particular the deviation in kinematics at the midfoot, which simply cannot be detected with a rigid foot model. Pothrat et al. (2015) reported significant differences and even opposite results for the same variables when the multi-segment Oxford Foot Model (OFM, see Table 1) and the Plugin-Gait (modeling the foot as a single segment) were used to characterize normal and flat feet, concluding that the type of foot model strongly affects the measured ankle joint kinematics. Dixon et al. (2012) performed a similar study, i.e., OFM versus Plug-in-Gait albeit on kinetic data, and revealed that the latter overestimated ankle power by about 40% with respect to OFM and neglected the important midfoot loading. The authors of these papers shared the same recommendation of using caution when foot and ankle kinematics are measured with a single-segment foot model. Interestingly, the value of multi-segment foot kinematic analysis has been praised also in studies related to the analysis of more proximal lower limb joints (Arnold et al. 2014b). In 2003, a special technical session of the Gait and Clinical Movement Analysis Society (GCMAS) agreed on and recommended that shank, rearfoot, forefoot, and hallux are clinically meaningful foot segments to be tracked. These segments are in fact to be found in most of the multi-segment foot techniques reported in the literature. Many basic foot biomechanic studies and clinical investigations employing various MFMs can be found in the literature. A number of relevant review papers have also been published, which represent valuable sources for an overview of foot modeling in kinematic analysis. Rankine et al. (2008) reported first a systematic analysis of 25 papers on foot kinematic modeling, thoroughly classified in terms of number of bony segments and joint rotations. All major technical and exploitation related issues were discussed systematically. Later, Deschamps et al. (2011) reported and assessed many of these techniques in relation to their exploitation in the clinical context. It was shown that whereas many foot joint rotations can be tracked in a consistent and repeatable way, some measures are still critical, and several of these techniques have yet to be used to address clinical problems. According to Bishop et al. (2012) this is the consequence of poorly described or flawed methodologies, preventing the readers from obtaining the same algorithms and programs to replicate the analysis. 
A minimum of five reporting standards were proposed in this paper; this aimed at guaranteeing full access to the most relevant modeling concepts and at providing a common platform for sharing and comparing foot kinematic data and as to improve their interpretation and usability. The association between foot posture

4

A. Leardini and P. Caravaggi

Table 1 Papers on multi-segment foot techniques and models (i.e., methodological studies). The column Number of segments counts all foot and shank segments. The model name is indicated when it was recognized somehow in the following literature or cited frequently in that way. Some of these studies were taken from a previous review paper (Rankine et al. 2008). Papers that reported further assessments of the technique/model are cited in the last column. Models and marker sets designed for bone pin analysis are excluded from this Table

Year 1991

Number Number Model name of segments of subjects (best known as) 3 3

1990 1996 1996

3 3 5

5 14 1

Cornwall and McPoil

1999

3

43

Milwaukee Foot Model Cornwall I

Woodburn et al. Rattanaprasert et al. Leardini et al.

1999

3

10

Woodburn I

1999

5

10

Rattanaprasert

1999

6

9

Leardini Foot Model I

Wu et al. Hunt et al. Carson et al.

2000 2001 2001

4 4 5

10 18 1

Arampatzis et al. MacWilliams et al. Hwang et al. Davis et al.

2002

7

6

2003

10

18

2004 2006

10 3

5 1

Pohl et al. Kitaoka et al.

2006 2006

12 4

3 20

Authors Scott and Winter Kepple et al. Moseley et al. Kidder et al.

Following papers, with developments and technical assessments

Myers et al. (2004), Long et al. (2010) Cornwall and McPoil (1999b), Cornwall and McPoil (2002)

Hetsroni et al. (2011)

Atkinson et al. (2010) Oxford Foot Model (OFM)

Stebbins et al. (2006), Curtis et al. (2009), Levinger et al. (2010), Wright et al. (2011), van Hoeve et al. (2015), Carty et al. (2015), Lucareli et al. (2016), Milner and Brindle (2016), Halstead et al. (2016)

Kinfoot

Shriners Hospital for Children Greenville Foot Model (SHCG)

Maurer et al. (2013) Saraswat et al. (2013)

(continued)

Kinematic Foot Models for Instrumented Gait Analysis

5

Table 1 (continued)

Authors Rao et al. Simon et al.

Year 2006 2006

Number of segments 4 11

Tome et al. Jenkyn and Nicol Leardini et al.

2006 2007

5 6

Number Model name of subjects (best known as) 10 10 Heidelberg Foot Measurement Method l 14 12

2007

5

10

Wolf et al. Sawacha et al. Cobb et al. Hyslop et al. Oosterwaal et al.

2008a 2009 2009 2010 2011

4 4 4 6 26

6 10 11 9 25

Bruening et al. De Mits et al.

2012a 4 2012 6

10 10

Saraswat et al. Bishop et al. Nester et al.

2012 2013 2014

4 4 6

15 18 100

Seo et al. Souza et al.

2014 2014

5 3

20 10

Following papers, with developments and technical assessments Kalkum et al. (2016)

Jenkyn et al. 2010

Rizzoli Foot Model (RFM)

Caravaggi et al. (2011), Deschamps et al. (2012a, b), Arnold et al. (2013), Portinaro et al. (2014), Van den Herrewegen et al. (2014)

GlasgowMaastricht foot model

Oosterwaal et al. (2016) Bruening et al. (2012b)

Ghent Foot Model Saraswat

Saraswat et al. (2013)

Salford Foot Model

and lower limb kinematics has been the objective of another interesting review analysis of twelve papers (Buldt et al. 2013). Evidence was found for increased frontal plane motion of the rearfoot during walking in individuals with pes planus. The latest review thus far by Novak et al. (2014) has highlighted the strengths and weaknesses of the most widely used and known MFMs, including an insight on their kinetic analyses. While joint rotations have been thoroughly addressed in the literature, joint translations have been studied and discussed very rarely: generally these are within 2 mm (Bruening et al. 2012a) in any anatomical direction. Because this is in the order of magnitude of skin motion artifact, this topic would not be further discussed in this chapter. For foot joint kinematics reconstruction, also the so-called “global

6

A. Leardini and P. Caravaggi

Table 2 Multi-segment foot models most used in clinical context. For each model (first column) the relevant clinical papers are reported (second column) Model Milwaukee Foot Model (1996)

Oxford Foot Model (2001)

Heidelberg Foot Measurement Method (2006) Rao et al. (2006) Rizzoli Foot Model (2007 and 2014)

Clinical papers Khazzam et al. (2007), Ness et al. (2008), Canseco et al. (2008), Marks et al. (2009), Brodsky et al. (2009), Canseco et al. (2009), Graff et al. (2010), Canseco et al. (2012), Krzak et al. (2015) Theologis et al. (2003), Woodburn et al. (2004), Turner et al. (2006), Turner and Woodburn (2008), Alonso-Vázquez et al. (2009), Wang et al. (2010), Deschamps et al. (2010), Stebbins et al. (2010), Bartonet al. (2011a, b, c), Hösl et al. (2014), Merker et al. (2015) Houck et al. (2009), Twomey et al. (2010), Dubbeldam et al. (2013) Nawoczenski et al. (2008), Neville et al. (2009), Rao et al. (2009) Chang et al. (2008), Deschamps et al. (2013), Portinaro et al. (2014), Chang et al. (2014), Deschamps et al. (2016), Arnold et al. (2014a, b), Lin et al. (2013), Hsu et al. (2014), Kelly et al. (2014), Deschamps et al. (2016)

optimization” has been used recently (Arnold et al. 2013; Bishop et al. 2016). This basically entails with an iterative search of the best estimation of foot segment position and orientation, all together also called “pose”. This procedure starts from skin marker trajectories, but the optimal poses must be compatible also with predetermined kinematic models for all the joints, i.e. global, this according to an original technique for the lower limbs (Lu and O’Connor 1999). The present chapter aims at introducing the current full series of methodological studies on this topic, in order to provide the basic knowledge for either the selection or the design of the most appropriate technique, according to the specific populations and hypotheses of the foot kinematic study to be performed.

State of the Art An extensive survey of the currently available multi-segment foot techniques and models is reported in Table 1. Several differences can be found between multisegment foot techniques in the following factors: – – – – – – –

Foot segments Bony landmarks Type of marker clusters Definition of the anatomical frames Joint convention – including 2D versus 3D measurements Neutral reference posture Offsets

Kinematic Foot Models for Instrumented Gait Analysis

7

The major difference between MFMs is found in the number and selection of foot segments (Fig. 1). While tibia, rearfoot, and forefoot are tracked by most techniques, the hallux – or the first metatarso-phalangeal joint – is seldom tracked, and the midfoot is tracked only by few models (MacWilliams et al. 2003; Leardini et al. 2007; Rouhani et al. 2011; Portinaro et al. 2014). Medial and lateral forefoot subdivisions have also been proposed (MacWilliams et al. 2003; Hwang et al. 2004; Buczek et al. 2006; Rouhani et al. 2011). The current models available include up to 12 segments (Table 1); even a 26 segment foot model has been proposed (Oosterwaal et al. 2011, 2016), but its application is limited to advanced musculoskeletal modeling studies. The number and selection of foot segments to be tracked, somehow the resolution of the model, is usually defined according to the field of application, the clinical interest, but also to the number, quality, and location of available cameras of the stereophotogrammetric system. While kinematic analysis of foot segments has been devised mostly for barefoot gait analysis, a number of techniques were explicitly designed for the analysis of shod feet (Wolf et al. 2008b; Cobb et al. 2009; Shultz et al. 2011b; Bishop et al. 2013). Moreover, the effect of foot and ankle orthoses has been investigated by established models (Lin et al. 2013; Leardini et al. 2014). The overall results, in terms of patterns of foot joint kinematics, can be confusing and difficult to interpret because of the differences mentioned above. Also, the varying populations analyzed, as highlighted in Table 1, in terms also of physical status, size, age, gender, etc., make it difficult to compare data across different studies. The process to include and track a segment within a MFM for kinematic analysis requires a profound knowledge of foot biomechanics and of the limits and accuracy of the measuring system. For example, the actual joint motion to be recorded should be much larger than the accuracy of the stereophotogrammetric instrumentation used for the analysis and than any other source of error (particularly, the skin motion artifact). In addition to the known large rotations occurring at the tibiotalar and metatarso-phalangeal joints, and between metatarsus and calcaneus, in vivo and in vitro studies have demonstrated that significant and consistent rotations are experienced in normal feet also at the Chopart (talo-calcaneo-navicular and calcaneo-cuboid joints) and Lisfranc joints (tarso-metatarsal joints). These studies have confirmed that midfoot motion during gait is significant and its assessment should be included in relevant foot kinematic studies. The subtalar joint is also subjected to relatively large motion; however, this is very difficult to track in vivo with skin-based markers. Both skin-based and plate-mounted marker clusters (Leardini et al. 1999; Carson et al. 2001; Houck et al. 2006; Hyslop et al. 2010; Nester et al. 2014; Raychoudhury et al. 2014; Souza et al. 2014; Buldt et al. 2015) on relevant foot and shank bony landmarks have been used to track foot segments. The differences between skinmarkers and plate-mounted markers in measured joint motion were found to be small (Nester et al. 2007a). The markers used for motion analysis are usually passive, i.e., reflecting IR (infrared) light emitted by LEDS embedded in the motion cameras, or active, i.e., emitting IR light. 
While the latter usually provide a more accurate 3D location, they also require a wired external power which can result in uncomfortable

8

A. Leardini and P. Caravaggi

Fig. 1 Diagrammatic representation of the foot segment subdivisions (different grey tones) for the main MFMs

setups also restraining the movement of the subject. The number of markers used in MFM can be as high as 35, as in Oosterwaal et al. (2011, 2016) and Raychoudhury et al. (2014). A compromise must always be found between the required degrees of freedom of the model, which is related also to the number of segments tracked in 2D or 3D, and the number, quality, and location of the available cameras. These are arranged usually to collect motion data for other anatomical districts and motor tasks in the same laboratory, and therefore compromising layouts must be found, as explicitly discussed for one widely used MFM (Leardini et al. 2007). As mentioned above, at least three markers need to be fixated to each segment for a complete 3D representation of its motion. This setup is technically suitable for establishing a local reference frame on each segment and for calculation of triplanar joint rotations using the Euler or the Joint-Coordinate-System convention (Grood and Suntay 1983) (see typical results in Fig. 2). Anatomical landmarks are necessary to establish anatomical based reference frames. However, the paucity of bony landmarks and the small size of several foot bones limit the application of the three-marker tracking for foot segments kinematics. While most techniques for the kinematic analysis of foot segments use the 3D approach, i.e., three independent rotations about three different axes, 2D projection angles can also be used to measure relative rotations of a joint, with respect to anatomical planes. In the latter, line segments determined by the position of two markers are projected at each time sample onto an anatomical or other relevant planes, for the planar rotation to be calculated during motion (Simon et al. 2006; Leardini et al. 2007; Portinaro et al. 2014). 2D planar angles have been largely used to track motion of metatarsal bones, as well as for motion representations of the arches of the foot, particularly the medial

Kinematic Foot Models for Instrumented Gait Analysis

9

Plantarflexion Dorsiflexion angle [deg]

Shank-Calcaneus

Calcaneus-Metatarsus

0

0

–10

–10 –20

–20

–30 –30

y

–40

–40 –50

x sagittal

1

20

40

60

100

80

y

–50 –60

x sagittal

1

20

40

60

80

100

Eversion Invesion angle [deg]

40 20

30

10

20

0

10 0

–10 z frontal

1 Abduction Adduction angle [deg]

x

–20 20

40

60

80

–10

x z frontal

–20 1

100

30

40

20

30

20

40

60

80

100

20 10 10 0

y

–10

0

z

y

–10 z transverse

transverse

–20

1

20

load mid-stance response

40

60

late-stance

% gait cycle

80

swing

100

–20

1

20

load mid-stance response

40

60

late-stance

80

100

swing

% gait cycle

Fig. 2 Typical mean ( one standard deviation) temporal profiles of foot joint rotations over the full gait cycle from a control population of normal subjects. In the left and right columns, respectively: motion of the calcaneus in the shank reference frame and of the metatarsals in the calcaneus reference frame. From top to bottom rows: rotations in the sagittal, frontal, and transverse anatomical planes

longitudinal arch, and the varus/valgus inclination of the calcaneus. With this approach, however, very erroneous and misleading values can be obtained in extreme conditions, particularly in case of large ranges of joint motion and in case of large deviations between the line segment and the projection plane. Another important question is whether to use a reference neutral position for the foot and ankle joints. Most frequently, a double leg standing posture is recorded to provide reference orientations of the foot and lower limb segments. The neutral orientation can be used as offset and subtracted from the corresponding temporal profile of joint rotation. The so-called “subtalar neutral” is also sought (Rao et al.

10

A. Leardini and P. Caravaggi

2009) to establish the correct initial alignment of the foot and ankle. Plaster molds have also been exploited to control the foot resting position (Saraswat et al. 2012, 2013), ensuring foot placement reproducibility and segment neutral orientation. This procedure is intended to compensate for differing anatomical frame definitions and foot static deformities, in order to establish a common “zero reference level” for inter-subject comparisons. The use of a neutral posture has the advantage of removing the bias associated to the anatomical frame definitions, thus allowing to focus the analysis and all relevant measurements on the “dynamic” pattern of the joint rotations. Unfortunately, it also removes any joint misalignments due to bone and/or joint deformity, which are frequently included in the clinical evidence of each patient and therefore should not be removed from the analysis. The choice of offsetting joint rotations by using a neutral posture is thus related to the specific study and its hypotheses and should take into consideration, for example, if there is any ongoing treatment to correct a foot deformity. Regardless of its application to offset the kinematic data, the inter-segmental orientations with the subject in the neutral posture represent extremely valuable information that should always be analyzed and assessed, in relation to the corresponding temporal profiles of joint rotations. In order to help final users to identify which MFM is more reliable, repeatable, and/or best fitting the aims of their investigation, few studies have been published which compare the performance of the most popular MFM. Mahaffey et al. (2013) have used intra-class correlation coefficients to analyze the OFM, the Rizzoli Foot Model (RFM), and the Kinfoot (MacWilliams et al. 2003) in 17 children on two testing sessions. Although some variability has been found between segments, multisegment foot kinematics were shown to be quite repeatable even in pediatric feet. A standard error of measurement greater than 5 was found in 26%, 15%, and 44% of the kinematic parameters, respectively, for the OFM, RFM, and the Kinfoot model. The latter showed the lowest repeatability and the highest errors. The OFM demonstrated moderate repeatability and reasonable errors in all segments except for the hindfoot in the transverse plane. The RFM resulted in moderate repeatability and reasonable test-retest error similar to that of the OFM, but with original additional data also on midfoot kinematics. In another paper by Powell et al. (2013), the OFM and RFM were assessed in the context of foot function and alignment as possible predisposition factors for overuse and traumatic injury in athletes. Both models helped detect significant differences in frontal plane motion between high- and low-arched footed athletes. However, the RFM was suggested to be the more appropriate because it allows to track also midfoot motion. While it was not the main scope of the study, a comparison between the Shriners Hospital for Children Greenville Foot Model (Davis et al. 2006) and the OFM can be found also in Maurer et al. (2013). The former model was shown to be more effective in quantifying the presence and severity of midfoot break deformity in the sagittal plane and in monitoring the progression over time. Di Marco et al. (2015a, b) performed the most comprehensive comparative analysis to date of the OFM, RFM, the Sawacha et al. (2009), and Saraswat et al. (2012) models. 
The best coefficient of multiple correlation between-sessions of the kinematic parameters during ground and treadmill walking was observed for the RFM (range 0.83–0.95).

Kinematic Foot Models for Instrumented Gait Analysis

11

Perhaps an overabundance of multi-segment foot techniques and models has been proposed to date. Some of these have been made available to the motion analysis community also via simple-to-use software codes. New users are free to choose the most appropriate model/technique for their needs according to the experimental conditions. In particular, the visibility and traceability of the relevant markers must be considered, both in relation to their dimension and location, together with its applicability on the clinical population under investigation, and to the motor activities to be analyzed. Moreover, foot and leg deformities should be carefully assessed before starting the data collection campaign. The advantages and disadvantages of existing techniques should be considered and analyzed before developing and validating a novel MFM suitable to the aims of the investigation.

Validation and Application of Foot Models Validation Studies New motion analysis procedures always require proper validation, but this is particularly challenging for the kinematic analysis of foot segments via skin-markers. Usually, MFMs are only assessed for repeatability of measurements (see Table 1) (Mahaffey et al. 2013; Di Marco et al. 2015a, b). Videofluoroscopy has been employed to estimate the error in the measurements due to the skin motion artifacts (Leardini et al. 2005; Wrbaskić and Dowling 2007; Shultz et al. 2011a). Skin motion artifacts were shown to be as large as 16 mm in very strenuous foot conditions. The largest errors were measured in the hindfoot and midfoot clusters at toe-off, likely because of the large deformations experienced by the foot bones and skin in this phase of stance. Still, the skin-to-bone relative motion at the foot was found to be smaller than that of typical markers on the shank and thigh (Leardini et al. 2005), thus it has been deemed sufficiently reliable for foot bone tracking. However, the most convincing evidence of skeletal motion is from in-vitro and in-vivo bone pins measurements. In vitro, robotic gait simulators are used to replicate the biomechanical conditions of the stance phase of walking on foot cadaver specimens (Whittaker et al. 2011; Peeters et al. 2013), and kinematics of foot bones can be accurately tracked via bone pins instrumented with markers. This data helped verify a promising consistency in foot joint kinematic patterns, for most of the foot joints, between skin-markers and bone pin measurements. Moreover, it has been possible to detect motion in a number of joints that are difficult to analyze in-vivo. In-vitro kinematic data should always be critically evaluated in relation to the fidelity of the replication of the real in-vivo conditions. Validation of MFMs has been performed also by tracking real bone motion invivo (Nester et al. 2007a, b, 2010; Arndt et al. 2007; Lundgren et al. 2008; Wolf et al. 2008a; Okita et al. 2009). This required bone pins to be instrumented with marker clusters and fixated to a number of foot segments in volunteers under a small dose of local anesthesia. In this condition, the motion pattern of the main foot joints during walking and running can be established very accurately. It has been shown that the


motion patterns with and without the inserted pins compare well, indicating that the subjects had little motion restriction due to such invasive intervention. Motion of the major joints was revealed to be very complex, and that of small joints, such as the talo-navicular, to be larger than expected – about 10° in the three anatomical planes – and also larger than that of the talo-calcaneal joint. Motion larger than 3°, and therefore non-negligible, was also measured between the tibia and fibula. These studies also showed the kinematic differences between multi-bone segments, as measured by external skin clusters, and single bone pins. These experiments are limited by the small number of subjects and are hardly replicable for technical and ethical reasons. The relevant data published so far must serve as a reference for other investigations on normal and pathological feet.

Musculoskeletal Multi-Segment Foot Modeling

MFMs can also be used to develop and validate complex musculoskeletal computer models for forward and inverse dynamic analysis. Typically, medical imaging is used to define geometrical models of the anatomical structures, while in vivo recorded kinematics and ground reaction forces provide the data to perform inverse dynamics. This allows measurement of bone segment kinematics and estimates of the loading conditions at the joints, muscle-tendon units, and ligaments. These models are particularly valuable for an insight into pathological conditions, understanding disease mechanisms, and simulating the effects of possible treatments, whether surgical, pharmacological, or physical. Saraswat et al. (2010) proposed a generic musculoskeletal model of an adult foot, including the intrinsic muscles and ligaments of the foot and ankle, configured and scaled by skin marker trajectories and an optimization routine. The predicted muscle activation patterns were assessed against corresponding EMG measurements from the literature. It was shown that small marker placement errors may result in large differences in model outcomes. Another large investigation (Oosterwaal et al. 2011, 2016) has proposed a more complex musculoskeletal model comprising 26 segments and 26 idealized joints, either cylindrical, universal, or spherical, for a total of 39 degrees of freedom. The model geometry can be customized using CT and MRI data, and dynamic simulations can be performed by using bone kinematic data from the 34 skin markers. Both forward and inverse dynamic modeling were claimed to be exploitable, with the former integrating a multibody approach with FE analysis and the latter describing the interactions of the musculoskeletal structures. Computer models designed to estimate the mechanics of single structures in the foot, which can hardly be measured non-invasively, have also been developed and reported. One of these models has allowed estimation of the effect of walking speed on the tension in the plantar aponeurosis (Caravaggi et al. 2010). A more recent study has exploited the modified OFM to develop a method for configuring personalized foot models for patients suffering from juvenile idiopathic arthritis (Prinold et al. 2016). This has highlighted the criticality of the patient-specific


definition of the ankle joint axes and the location of the Achilles tendon insertions. These models have great potential for the analysis of loading conditions in healthy and pathological feet, and are beneficial for predicting the effects, and thus improving the efficacy, of surgical treatments and foot and ankle orthoses. However: (a) the relevant data sets are hardly available or difficult to create, with real personalization nearly impossible; (b) the mechanical parameters of soft tissues (ligaments, tendons, retinacula, etc.) are difficult to attain and not fully available in the literature; (c) the external (so-called "boundary") conditions depend on many factors, not all of which are measurable; and (d) the collection of relevant marker trajectories is demanding and requires a camera setup with special arrangements. Confidence in interpretation diminishes because of the many unknowns and the relevant assumptions, optimization criteria, conventions, calculations, etc. The final concern is about validation, which is difficult to obtain for these complex computational models.

Kinetic Analysis Including Foot Models

The moment and power at the ankle joint have been reported extensively in standard gait analysis with traditional single-segment foot models. However, MFMs also offer specific insight into muscle performance, which would elucidate muscle function, particularly in physiological and pathological conditions. This assessment must take advantage of foot joint kinetics, where ground reaction force and joint center location are combined to obtain joint moment, work, and power. A number of studies (Dixon et al. 2012; MacWilliams et al. 2003) have demonstrated that the contribution of the midfoot to the overall power is important for forward propulsion during gait; power generation at the rearfoot would instead be overestimated by a single-segment foot model. Detection of abnormal kinetic patterns, even prior to significant worsening of the prognosis, may help with the formulation of early specific interventions aimed at reducing the progression of deformity and disability (DiLiberto et al. 2015). In kinetic analysis, in addition to meaningful references associated with the segments of interest, the complete ground reaction forces, comprising normal and shear components, must be measured. The inertial properties of segments and the location of joint centers must also be determined. The latter is particularly critical at the Chopart and Lisfranc joints, but also at the metatarso-phalangeal joints, since these encompass a number of anatomical articulations. Usually, additional virtual points are defined on these joint lines; however, these do not precisely represent the exact position of the anatomical elements. Several attempts have been made to determine an optimal technique for measuring the reaction forces under each segment, i.e., the subarea ground reaction. In order to estimate these regional forces, a miniature force platform, requiring the superposition of many targeted trials to create a full analysis (Scott and Winter 1991), and combined pressure and force plates, with assumptions of proportionality between the two (Giacomozzi and Macellari 1997; MacWilliams et al. 2003; Giacomozzi et al. 2006), have been proposed.
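The proportionality assumption mentioned above can be sketched as follows: the vertical force from the force plate is apportioned to each foot sub-area according to that area's share of the summed plantar pressure. This is a hedged illustration only; the segment names and masks are hypothetical (in practice they may be derived from marker positions projected onto the plate), and shear components cannot be resolved by this simple scheme.

```python
import numpy as np

def subarea_vertical_forces(fz_total, pressure_map, masks):
    """Split the vertical GRF among foot sub-areas in proportion to
    the plantar pressure summed under each area.

    fz_total     : vertical force from the force plate [N]
    pressure_map : (H, W) pressure-plate frame
    masks        : dict name -> boolean (H, W) array marking the cells
                   assigned to that segment
    """
    p_sum = pressure_map.sum()
    return {name: float(fz_total * pressure_map[mask].sum() / p_sum)
            for name, mask in masks.items()}

# toy frame: hindfoot in rows 0-9 (higher pressure), forefoot in rows 10-19
pressure = np.zeros((20, 10))
pressure[:10] = 2.0
pressure[10:] = 1.0
rows = np.repeat(np.arange(20)[:, None], 10, axis=1)
masks = {"hindfoot": rows < 10, "forefoot": rows >= 10}
print(subarea_vertical_forces(600.0, pressure, masks))
# {'hindfoot': 400.0, 'forefoot': 200.0}
```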


Early kinetic measures of the foot (Scott and Winter 1991; MacWilliams et al. 2003) have been limited by intricate assumptions and equipment restrictions. The former was a 2D eight-joint model and relied on the superposition of multiple trials interacting with a small custom-built force sensor. The latter paper proposed a first 3D model with eight segments but involved many assumptions on joint motion, pressure, and force data for a final estimation of joint moments, which neglected the mediolateral component of the force. A theoretical extension of the MacWilliams model, addressing kinetic-based calculations, was proposed a little later (Buczek et al. 2006). An extensive investigation undertaken in a clinical and research setting (Bruening et al. 2012a, b), involving respectively 17 and 10 healthy pediatric subjects, explicitly aimed at assessing the feasibility and relevance of a proposed technique for kinetic analysis applied to a MFM. A three-segment kinetic model was first characterized and assessed. In the second paper, the kinetic parameters, i.e., joint moments and powers during level walking, were reported. Three submodels of the shank and foot complex were created, and two adjacent force platforms were employed for the calculation of rotation, net moment, and total power at the ankle, midtarsal, and first metatarso-phalangeal joints. Unfortunately, the protocol required visual targeting of the force platforms, whose confounding contribution, together with that of the inertial parameters, was assessed separately. The study confirmed that not only sagittal plane motion but also generated peak power are generally overestimated at the ankle (35% on average) when using a single-segment rather than a multi-segment foot model. This "split force platform approach" can be an alternative to ad-hoc hardware, but it is problematic when applied to pathological populations because of the special targeting of the force plates that it requires. DiLiberto et al. (2015) have reported, for the first time, multi-joint foot kinetics in subjects with diabetes mellitus and peripheral neuropathy and in a healthy adult control group, by using an electromagnetic sensor motion capture system. The model consisted of the tibia, rearfoot, and forefoot, with ideal hinge joints connecting the segments. It was shown that (i) the positive peak power and work of the forefoot with respect to the rearfoot were smaller in the patient group than in the controls, (ii) the negative peak powers of both the forefoot relative to the rearfoot and the rearfoot relative to the tibia were larger, and (iii) a greater proportion of negative work was present at both these joints. While the value of joint kinematics from MFMs has been repeatedly demonstrated, the relevant joint kinetics are still controversial because motion occurs at multiple articulations between bony segments (Nester et al. 2007a), particularly for the critical estimation of the position and orientation of the axes of rotation, which would significantly affect kinetic measurements. There is no doubt that kinetic measurements can increase the knowledge of foot and ankle function, as well as influence the evolution of MFMs. Though some preliminary attempts in this direction might have been too complex (Wang et al. 2016), the currently available simple representations discussed here seem a good compromise between comprehensiveness of foot kinetics and comprehension of the results for clinical use. These techniques, however, are hardly exploitable by the larger scientific community, due to modeling and implementation issues.
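The core computation underlying these kinetic analyses can be illustrated with a quasi-static sketch: the moment of the ground reaction about an estimated joint center, dotted with the inter-segmental angular velocity, yields joint power. Inertial terms of the small distal segments are deliberately neglected here (the cited studies assessed their contribution separately), and all variable values are invented for illustration.

```python
import numpy as np

def joint_moment_and_power(grf, cop, t_free, joint_center, omega):
    """Quasi-static net moment and power at a foot joint, one frame.

    grf          : (3,) ground reaction force [N]
    cop          : (3,) centre of pressure [m]
    t_free       : (3,) free (vertical) moment from the platform [N*m]
    joint_center : (3,) joint centre in the same lab frame [m]
    omega        : (3,) angular velocity of the distal relative to the
                   proximal segment [rad/s]
    """
    moment = np.cross(cop - joint_center, grf) + t_free
    power = float(np.dot(moment, omega))   # [W]
    return moment, power

# toy late-stance frame at the ankle (y ~ medio-lateral axis here)
grf = np.array([0.0, -150.0, 800.0])   # shear + vertical components
cop = np.array([0.12, 0.0, 0.0])       # CoP anterior to the joint
jc = np.array([0.0, 0.0, 0.08])        # ankle centre 8 cm above plate
omega = np.array([0.0, -2.0, 0.0])     # plantarflexion velocity
m, p = joint_moment_and_power(grf, cop, np.zeros(3), jc, omega)
print(p)   # 192.0 W of generation in this toy push-off frame
```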


Summary

Stereophotogrammetry has made enormous progress in the last few years, with cheaper motion analysis systems achieving high performance in marker tracking. Today, multi-segment foot kinematic analysis is easier than it was years ago. A large number of multi-segment foot techniques for in vivo gait analysis based on stereophotogrammetric systems have been published over the last three decades. The large variety of currently available marker sets and protocols is of great value for anyone interested in the analysis of foot motion in normal or pathological conditions, following treatments, in the evaluation of sport performance, etc. Following this long survey of the current techniques for kinematic analysis of foot segments, the reader interested in the application of an existing MFM, or in the design of a new one suited to the aims of his/her specific investigation, will find valuable recommendations for either option below. In the case of selection of an existing MFM, the researcher must be careful in deciding on the most appropriate model from the literature; this must be done according to the number of segments and degrees of freedom, the technique used for the calculation of joint rotations, the position and visibility of the markers, etc. Regarding the latter, the number of cameras and their positions should be checked against the motor tasks analyzed. Using a camera setup closer to the acquisition volume, and better optics, will normally result in better quality measurements. However, the cameras are often arranged for full-body gait analysis, and the necessary frequent changes in the camera configuration can be a tedious operation requiring repeated calibrations. The consistency with existing single-segment foot and lower limb models should also be discussed with the motion analysis team. First of all, the user must establish which segments should be tracked for the study, thus excluding unnecessarily complex marker sets. Another important decision regards the reference neutral orientation and whether this shall be used as an offset to compensate for anatomical frame definitions and foot deformities. As already stated above, the decision should be taken according to the scope of the study and the population analyzed. Relevant literature supporting the model should also be sought, in particular in relation to the repeatability of measurements and the validation against gold standards. Calculation and analysis of foot joint rotations can be supported by commercial software or freeware. While this allows users to save time and resources in writing ad-hoc analysis programs, the consistency of the calculations should always be assessed against corresponding data published by the authors. Unfortunately, many papers do not provide complete instructions on the exact marker mounting, marker trajectory smoothing, anatomical frame definition, joint conventions, etc. (Bishop et al. 2012); even small changes to the original definitions may affect the final measurements. The involvement of experienced operators is therefore recommended, and a program of continuous training activities should be implemented. Software tools which have been developed and/or verified by the original model designers are preferable. In case it is required to design a new specific model, a consistent and careful identification of the specific single or combined bony segments to be tracked, according to the aims of the study, must be performed. From a theoretical


perspective, an extremely large number of models can be devised and formalized simply by mixing and matching the 26 foot bones. As long as a reference system is clearly defined and a sufficient number of markers can be fixated to each segment, the model can be considered original and reproducible. However, many of the papers reported here have failed to properly describe the relevant details of the model, including the origin and orientation of the reference frames. Moreover, some of these allegedly novel MFMs are merely a subset or small variations of previously existing models, whereby fewer segments are tracked and/or only minor modifications to the location of the markers have been implemented. In this respect, we recommend that, for a new in vivo multi-segment technique for kinematic analysis of the foot to be considered a novel model, the following criteria should be followed:

(1) The rationale for proposing a new segmentation of the foot for kinematic analysis should be explicit and clear, e.g., for biomechanical or clinical purposes. The authors should be able to address the question: why is none of the existing MFMs appropriate to address the aims of the study?

(2) The description of the new model should be exhaustive with regard to the following information: foot segments; relevant anatomical landmarks; position and orientation of the local reference co-ordinate system; and number and location of the skin markers for motion tracking.

(3) A validation study should be performed – possibly by a different research group – to demonstrate that the repeatability of the kinematic outcome is comparable to or better than that of the current most common and widely used foot models.

(4) At least one clinical or biomechanical application of the model should be reported to demonstrate the value and efficacy of the new model.

If the proposed criteria were strictly followed, only a few of the foot segmentation techniques found in the literature and reported here could be considered appropriate MFMs. In fact, even following these criteria would not necessarily guarantee that a new model becomes popular in everyday gait analysis and widely used. Many other factors influence the "popularity" and diffusion of gait analysis protocols in the scientific and clinical contexts, such as how well they are understood (in terms of basic concepts, application, data processing, and reporting), their usability and utility, the availability of relevant processing software, the support provided by the inventors, the distribution and actions of motion capture system vendors, etc. The scientific community of human motion analysis would, however, welcome any such robust contribution to the discipline.

Future Directions

While foot kinematics can now be described with multiple-degrees-of-freedom foot models, a number of issues still remain to be addressed in the future, particularly for in vivo studies. The complex procedures for validation of the models and the


unavoidable bias from soft tissue artifacts are known to affect these measures (Leardini et al. 2005). Nevertheless, single or plate-mounted skin markers remain necessary, as the alternatives have several limitations in routine clinical analyses. While videofluoroscopy and bone pins provide more accurate measures of foot joint motion, these are too invasive for patients. Inertial sensors (Rouhani et al. 2012, 2014) and marker-less dynamic 3D scanning (Van den Herrewegen et al. 2014) are definitely less invasive, but their accuracy and anatomically based analysis still fall short. Due to the variety of motion analysis systems and protocols for data acquisition, a normative database for foot kinematic data should be available in every single gait analysis facility (Deschamps et al. 2012a). Moreover, kinematic analysis of foot joints is highly sensitive to errors in marker placement due to the small size of foot bones in comparison to other body segments. Even a small deviation from the recommended marker location can result in a large error in the position and orientation of the reference co-ordinate system. In order to decrease the error rate and improve protocol repeatability, only experienced operators with extensive knowledge of foot anatomy and practice in multi-segment foot analysis should be in charge of mounting the markers (Caravaggi et al. 2011; Deschamps et al. 2012a). To ensure increased and coherent use of these techniques, standardization of the references should be sought, in terms of anatomical landmark and frame definitions. In this respect, another fundamental step forward would be to establish a common terminology, which can avoid confusion and inconsistency. The utilization of these models is still critical in the presence of shoes or foot orthotics (Lin et al. 2013; Bishop et al. 2013; Leardini et al. 2014; Bishop et al. 2015; Halstead et al. 2016), hence the current paucity of relevant literature. All these issues can perhaps account for the limited number of relevant clinical applications (Deschamps et al. 2011).
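The sensitivity to marker placement discussed above is easily demonstrated numerically. In the sketch below, an anatomical frame is constructed from three landmarks spaced a few centimeters apart, as on a small foot segment; displacing one marker by just 5 mm misorients the frame by roughly 11°. The landmark coordinates are hypothetical and chosen purely for illustration.

```python
import numpy as np

def anatomical_frame(p1, p2, p3):
    """Right-handed orthonormal frame from three landmark markers."""
    x = (p2 - p1) / np.linalg.norm(p2 - p1)
    z = np.cross(x, p3 - p1)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.column_stack([x, y, z])

# hypothetical landmarks on a small segment, a few cm apart [mm]
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([40.0, 0.0, 0.0])
p3 = np.array([20.0, 25.0, 0.0])
R_ref = anatomical_frame(p1, p2, p3)
R_err = anatomical_frame(p1, p2, p3 + np.array([0.0, 0.0, 5.0]))  # 5 mm off

# misorientation angle between the two frames (geodesic distance)
cos_theta = (np.trace(R_ref.T @ R_err) - 1.0) / 2.0
theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(round(float(theta), 1))   # ~11.3 degrees from a 5 mm displacement
```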

Cross-References

▶ 3D Dynamic Pose Estimation Using Reflective Markers or Electromagnetic Sensors
▶ 3D Kinematics of Human Motion
▶ Ankle Foot Orthoses and Their Influence on Gait
▶ Conventional Gait Model: Success and Limitations
▶ EMG Activity in Gait: The Influence of Motor Disorders
▶ Foot and Ankle Motion in Cerebral Palsy
▶ Functional Effects of Ankle Sprain
▶ Functional Effects of Foot Orthoses
▶ Functional Effects of Shoes
▶ Integration of Foot Pressure and Foot Kinematics Measurements for Medical Applications
▶ Interpreting Joint Moments and Powers in Gait
▶ Interpreting Spatiotemporal Parameters, Symmetry and Variability in Clinical Gait Analysis


▶ Rigid Body Models of the Musculoskeletal System
▶ The Effects of Ankle Joint Replacement on Gait
▶ Three-Dimensional Human Kinematics Estimation Using Magneto and Inertial Measurement Units
▶ Variations of Marker-Sets and Models for Standard Gait Analysis

References

Alonso-Vázquez A, Villarroya MA, Franco MA, Asín J, Calvo B (2009) Kinematic assessment of paediatric forefoot varus. Gait Posture 29(2):214–219 Arampatzis A, Brüggemann GP, Klapsing GM (2002) A three-dimensional shank-foot model to determine the foot motion during landings. Med Sci Sports Exerc 34(1):130–138 Arndt A, Wolf P, Liu A, Nester C, Stacoff A, Jones R, Lundgren P, Lundberg A (2007) Intrinsic foot kinematics measured in vivo during the stance phase of slow running. J Biomech 40(12):2672–2678 Arnold JB, Mackintosh S, Jones S, Thewlis D (2013) Repeatability of stance phase kinematics from a multi-segment foot model in people aged 50 years and older. Gait Posture 38(2):349–351 Arnold JB, Mackintosh S, Jones S, Thewlis D (2014a) Differences in foot kinematics between young and older adults during walking. Gait Posture 39(2):689–694 Arnold J, Mackintosh S, Jones S, Thewlis D (2014b) Altered dynamic foot kinematics in people with medial knee osteoarthritis during walking: a cross-sectional study. Knee 21(6):1101–1106 Atkinson HD, Daniels TR, Klejman S, Pinsker E, Houck JR, Singer S (2010) Pre- and postoperative gait analysis following conversion of tibiotalocalcaneal fusion to total ankle arthroplasty. Foot Ankle Int 31(10):927–932 Barton CJ, Levinger P, Crossley KM, Webster KE, Menz HB (2011a) Relationships between the Foot Posture Index and foot kinematics during gait in individuals with and without patellofemoral pain syndrome. J Foot Ankle Res 4:10 Barton CJ, Levinger P, Webster KE, Menz HB (2011b) Walking kinematics in individuals with patellofemoral pain syndrome: a case–control study. Gait Posture 33(2):286–289 Barton CJ, Menz HB, Levinger P, Webster KE, Crossley KM (2011c) Greater peak rearfoot eversion predicts foot orthoses efficacy in individuals with patellofemoral pain syndrome. Br J Sports Med 5(9):697–701 Benedetti MG, Manca M, Ferraresi G, Boschi M, Leardini A (2011) A new protocol for 3D assessment of foot during gait: application on patients with equinovarus foot. Clin Biomech 26(10):1033–1038 Bishop C, Paul G, Thewlis D (2012) Recommendations for the reporting of foot and ankle models. J Biomech 45(13):2185–2194 Bishop C, Paul G, Thewlis D (2013) The reliability, accuracy and minimal detectable difference of a multi-segment kinematic model of the foot-shoe complex. Gait Posture 37(4):552–557 Bishop C, Arnold JB, Fraysse F, Thewlis D (2015) A method to investigate the effect of shoe-hole size on surface marker movement when describing in-shoe joint kinematics using a multi-segment foot model. Gait Posture 41(1):295–299 Bishop C, Arnold JB, May T (2016) Effects of taping and orthoses on foot biomechanics in adults with flat arched feet. Med Sci Sports Exerc 48(4):689–696 Brodsky JW, Charlick DA, Coleman SC, Pollo FE, Royer CT (2009) Hindfoot motion following reconstruction for posterior tibial tendon dysfunction. Foot Ankle Int 30(7):613–618 Bruening DA, Cooney KM, Buczek FL (2012a) Analysis of a kinetic multi-segment foot model. Part I: model repeatability and kinematic validity. Gait Posture 35(4):529–534 Bruening DA, Cooney KM, Buczek FL (2012b) Analysis of a kinetic multi-segment foot model. Part II: kinetics and clinical implications. Gait Posture 35(4):535–540


Buczek FL, Walker MR, Rainbow MJ, Cooney KM, Sanders JO (2006) Impact of mediolateral segmentation on a multi-segment foot model. Gait Posture 23(4):519–522 Buldt AK, Murley GS, Butterworth P, Levinger P, Menz HB, Landorf KB (2013) The relationship between foot posture and lower limb kinematics during walking: a systematic review. Gait Posture 38(3):363–372 Buldt AK, Levinger P, Murley GS, Menz HB, Nester CJ, Landorf KB (2015) Foot posture is associated with kinematics of the foot during gait: a comparison of normal, planus and cavus feet. Gait Posture 42(1):42–48 Canseco K, Long J, Marks R, Khazzam M, Harris G (2008) Quantitative characterization of gait kinematics in patients with hallux rigidus using the Milwaukee foot model. J Orthop Res 26 (4):419–427 Canseco K, Long J, Marks R, Khazzam M, Harris G (2009) Quantitative motion analysis in patients with hallux rigidus before and after cheilectomy. J Orthop Res 7(1):128–134 Canseco K, Long J, Smedberg T, Tarima S, Marks RM, Harris GF (2012) Multisegmental foot and ankle motion analysis after hallux valgus surgery. Foot Ankle Int 33(2):141–147 Caravaggi P, Pataky T, Gunther M et al (2010) Dynamics of longitudinal arch support in relation to walking speed: contribution of the plantar aponeurosis. J Anat 217:254–261 Caravaggi P, Benedetti MG, Berti L, Leardini A (2011) Repeatability of a multi-segment foot protocol in adult subjects. Gait Posture 33(1):133–135 Carson MC, Harrington ME, Thompson N, O’Connor JJ, Theologis TN (2001) Kinematic analysis of a multi-segment foot model for research and clinical applications: a repeatability analysis. J Biomech 34(10):1299–1307 Carty CP, Walsh HP, Gillett JG (2015) Sensitivity of the Oxford foot model to marker misplacement: a systematic single-case investigation. Gait Posture 42(3):398–401 Chang R, Van Emmerik R, Hamill J (2008) Quantifying rearfoot-forefoot coordination in human walking. J Biomech 41(14):3101–3105 Chang R, Rodrigues PA, Van Emmerik RE, Hamill J (2014) Multi-segment foot kinematics and ground reaction forces during gait of individuals with plantar fasciitis. J Biomech 47 (11):2571–2577 Cobb SC, Tis LL, Johnson JT, Wang YT, Geil MD, McCarty FA (2009) The effect of low-mobile foot posture on multi-segment medial foot model gait kinematics. Gait Posture 30(3): 334–339 Cornwall MW, McPoil TG (1999a) Effect of ankle dorsiflexion range of motion on rearfoot motion during walking. J Am Podiatr Med Assoc 89(6):272–277 Cornwall MW, McPoil TG (1999b) Three-dimensional movement of the foot during the stance phase of walking. J Am Podiatr Med Assoc 89(2):56–66 Cornwall MW, McPoil TG (2002) Motion of the calcaneus, navicular, and first metatarsal during the stance phase of walking. J Am Podiatr Med Assoc 92(2):67–76 Curtis DJ, Bencke J, Stebbins JA, Stansfield B (2009) Intra-rater repeatability of the Oxford foot model in healthy children in different stages of the foot roll over process during gait. Gait Posture 30(1):118–121 Davis RB, Eugene G, Jameson E, Davids JR, Christopher LM, Benjamin M, Rogozinski B, Anderson JP (2008) The design, development, and initial evaluation of a multi-segment foot model for routine clinical gait analysis. In: Harris GF, Smith PA, Marks RM (eds) Foot and ankle motion analysis: clinical treatment and technology. CRC Press, Taylor and Francis Group, pp 425–444 De Mits S, Segers V, Woodburn J, Elewaut D, De Clercq D, Roosen P (2012) A clinically applicable six-segmented foot model. 
J Orthop Res 30(4):655–661 De Ridder R, Willems T, Vanrenterghem J, Robinson MA, Palmans T, Roosen P (2015) Multisegment foot landing kinematics in subjects with chronic ankle instability. Clin Biomech 30 (6):585–592 Deschamps K, Birch I, Desloovere K, Matricali GA (2010) The impact of hallux valgus on foot kinematics: a cross-sectional, comparative study. Gait Posture 32(1):102–106


Deschamps K, Staes F, Roosen P, Nobels F, Desloovere K, Bruyninckx H, Matricali GA (2011) Body of evidence supporting the clinical use of 3D multi-segment foot models: a systematic review. Gait Posture 33(3):338–349 Deschamps K, Staes F, Bruyninckx H, Busschots E, Jaspers E, Atre A, Desloovere K (2012a) Repeatability in the assessment of multi-segment foot kinematics. Gait Posture 35(2):255–260 Deschamps K, Staes F, Bruyninckx H, Busschots E, Matricali GA, Spaepen P, Meyer C, Desloovere K (2012b) Repeatability of a 3D multi-segment foot model protocol in presence of foot deformities. Gait Posture 36(3):635–638 Deschamps K, Matricali GA, Roosen P, Nobels F, Tits J, Desloovere K, Bruyninckx H, Flour M, Deleu PA, Verhoeven W, Staes F (2013) Comparison of foot segmental mobility and coupling during gait between patients with diabetes mellitus with and without neuropathy and adults without diabetes. Clin Biomech 28(7):813–819 Deschamps K, Dingenen B, Pans F, Van Bavel I, Matricali GA, Staes F (2016) Effect of taping on foot kinematics in persons with chronic ankle instability. J Sci Med Sport 19(7):541–546 Di Marco R, Rossi S, Racic V, Cappa P, Mazzà C (2015a) A comparison between four foot model protocols: the effect of walking on a treadmill. Conference paper 2015: XXV congress of the international society of biomechanics, Glasgow Di Marco R, Rossi S, Racic V, Cappa P, Mazzà C (2015b) Concurrent reliability assessment of four foot models for gait analysis. Conference paper 2016: XVI congress of the Società Italiana di Analisi del Movimento in Clinica, Padova DiLiberto FE, Tome J, Baumhauer JF, Quinn JR, Houck J, Nawoczenski DA (2015) Multi-joint foot kinetics during walking in people with diabetes mellitus and peripheral neuropathy. J Biomech 48(13):3679–3684 Dixon PC, Böhm H, Döderlein L (2012) Ankle and midfoot kinetics during normal gait: a multisegment approach. J Biomech 45(6):1011–1016 Dubbeldam R, Nester C, Nene A, Hermens H, Buurke J (2013) Kinematic coupling relationships exist between non-adjacent segments of the foot and ankle of healthy subjects. Gait Posture 37:159–164 Giacomozzi C, Macellari V (1997) Piezo-dynamometric platform for a more complete analysis of foot-to-floor interaction. IEEE Trans Rehabil Eng 5(4):322–330 Giacomozzi C, Benedetti MG, Leardini A, Macellari V, Giannini S (2006) Gait analysis with an integrated system for functional assessment of talocalcaneal coalition. J Am Podiatr Med Assoc 96(2):107–115 Graff A, Hassani S, Krzak J, Long J, Caudill A, Flanagan A, Eastwood D, Kuo KN, Harris G, Smith P (2010) Long-term outcome evaluation in young adults following clubfoot surgical release. J Pediatr Orthop 30(4):379–385 Grood ES, Suntay WJ (1983) A joint coordinate system for the clinical description of threedimensional motions: application to the knee. J Biomech Eng 105(2):136–144 Halstead J, Keenan AM, Chapman GJ, Redmond AC (2016) The feasibility of a modified shoe for multi-segment foot motion analysis: a preliminary study. J Foot Ankle Res 9:7 Hetsroni I, Nyska M, Ben-Sira D, Arnson Y, Buksbaum C, Aliev E, Mann G, Massarwe S, Rozenfeld G, Ayalon M (2011) Analysis of foot and ankle kinematics after operative reduction of high-grade intra-articular fractures of the calcaneus. J Trauma 70(5):1234–1240 Hösl M, Böhm H, Multerer C, Döderlein L (2014) Does excessive flatfoot deformity affect function? A comparison between symptomatic and asymptomatic flatfeet using the Oxford foot model. 
Gait Posture 39(1):23–28 Houck JR, Tome JM, Nawozensky DA (2006) Subtalar neutral position as an offset for a kinematic model of the foot during walking. Gait Analy 28:29–37 Houck JR, Neville C, Tome J, Flemister AS (2009) Foot kinematics during a bilateral heel rise test in participants with stage II posterior tibial tendon dysfunction. J Orthop Sports Phys Ther 39 (8):593–603 Houdijk H, Doets HC, van Middelkoop M, DirkjanVeeger HE (2008) Joint stiffness of the ankle during walking after successful mobile-bearing total ankle replacement. Gait Posture 27(1):115–119


Hsu WH, Lewis CL, Monaghan GM, Saltzman E, Hamill J, Holt KG (2014) Orthoses posted in both the forefoot and rearfoot reduce moments and angular impulses on lower extremity joints during walking. J Biomech 47(11):2618–2625 Hunt AE, Smith RM, Torode M (2001) Extrinsic muscle activity, foot motion and ankle joint moments during the stance phase of walking. Foot Ankle Int 22(1):31–41 Hwang SJ, Choi HS, Kim YH (2004) Motion analysis based on a multi-segment foot model in normal walking. Conf Proc IEEE Eng Med Biol Soc 7:5104–5106 Hyslop E, Woodburn J, McInnes IB, Semple R, Newcombe L, Hendry G, Rafferty D, De Mits S, Turner DE (2010) A reliability study of biomechanical foot function in psoriatic arthritis based on a novel multi-segmented foot model. Gait Posture 32(4):619–626 Jenkyn TR, Nicol AC (2007) A multi-segment kinematic model of the foot with a novel definition of forefoot motion for use in clinical gait analysis during walking. J Biomech 40(14):3271–3278 Jenkyn TR, Shultz R, Giffin JR, Birmingham TB (2010) A comparison of subtalar joint motion during anticipated medial cutting turns and level walking using a multi-segment foot model. Gait Posture 31(2):153–158 Kalkum E, van Drongelen S, Mussler J, Wolf SI, Kuni B (2016) A marker placement laser device for improving repeatability in 3D-foot motion analysis. Gait Posture 44:227–230 Kelly LA, Cresswell AG, Racinais S, Whiteley R, Lichtwark G (2014) Intrinsic foot muscles have the capacity to control deformation of the longitudinal arch. J R Soc Interface 11(93):20131188 Kepple TM, Stanhope SJ, Lohmann KN, Roman NL (1990) A video-based technique for measuring ankle-subtalar motion during stance. J Biomed Eng 12(4):273–280 Khazzam M, Long JT, Marks RM, Harris GF (2007) Kinematic changes of the foot and ankle in patients with systemic rheumatoid arthritis and forefoot deformity. J Orthop Res 25 (3):319–329 Kidder SM, Abuzzahab FS Jr, Harris GF, Johnson JE (1996) A system for the analysis of foot and ankle kinematics during gait. IEEE Trans Rehabil Eng 4(1):25–32 Kitaoka HB, Crevoisier XM, Hansen D, Katajarvi B, Harbst K, Kaufman KR (2006) Foot and ankle kinematics and ground reaction forces during ambulation. Foot Ankle Int 27(10):808–813 Krzak JJ, Corcos DM, Damiano DL, Graf A, Hedeker D, Smith PA, Harris GF (2015) Kinematic foot types in youth with equinovarus secondary to hemiplegia. Gait Posture 41(2):402–408 Leardini A, Benedetti MG, Catani F, Simoncini L, Giannini S (1999) An anatomically based protocol for the description of foot segment kinematics during gait. Clin Biomech 14 (8):528–536 Leardini A, Chiari L, Della Croce U, Cappozzo A (2005) Human movement analysis using stereophotogrammetry. Part 3. Soft tissue artifact assessment and compensation. Gait Posture 21(2):212–225 Leardini A, Benedetti MG, Berti L, Bettinelli D, Nativo R, Giannini S (2007) Rear-foot, mid-foot and fore-foot motion during the stance phase of gait. Gait Posture 25(3):453–462 Leardini A, Aquila A, Caravaggi P, Ferraresi C, Giannini S (2014) Multi-segment foot mobility in a hinged ankle-foot orthosis: the effect of rotation axis position. Gait Posture 40(1):274–277 Levinger P, Murley GS, Barton CJ, Cotchett MP, McSweene SR, Menz HB (2010) A comparison of foot kinematics in people with normal- and flat-arched feet using the Oxford foot model. Gait Posture 32:519–523 Lin SC, Chen CP, Tang SF, Wong AM, Hsieh JH, Chen WP (2013) Changes in windlass effect in response to different shoe and insole designs during walking. 
Gait Posture 37(2):235–241 Long JT, Eastwood DC, Graf AR, Smith PA, Harris GF (2010) Repeatability and sources of variability in multi-center assessment of segmental foot kinematics in normal adults. Gait Posture 31(1):32–36 Lu TW, O’Connor JJ (1999) Bone position estimation from skin marker co-ordinates using global optimisation with joint constraints. J Biomech 32(2):129–134 Lucareli PR, Contani LB, Lima B, Rabelo ND, Ferreira CL, Lima FP, Correa JC, Politti F (2016) Repeatability of a 3D multi-segment foot model during anterior and lateral step down tests. Gait Posture 43:9–16


Lundgren P, Nester C, Liu A, Arndt A, Jones R, Stacoff A, Wolf P, Lundberg A (2008) Invasive in vivo measurement of rear-, mid- and forefoot motion during walking. Gait Posture 28(1):93–100 MacWilliams BA, Cowley M, Nicholson DE (2003) Foot kinematics and kinetics during adolescent gait. Gait Posture 17(3):214–224 Mahaffey R, Morrison SC, Drechsler WI, Cramp MCJ (2013) Evaluation of multi-segmental kinematic modelling in the paediatric foot using three concurrent foot models. J Foot Ankle Res 6(1):43 Marks RM, Long JT, Ness ME, Khazzam M, Harris GF (2009) Surgical reconstruction of posterior tibial tendon dysfunction: prospective comparison of flexor digitorumlongus substitution combined with lateral column lengthening or medial displacement calcaneal osteotomy. Gait Posture 29(1):17–22 Maurer JD, Ward V, Mayson TA, Davies KR, Alvarez CM, Beauchamp RD, Black AH (2013) A kinematic description of dynamic midfoot break in children using a multi-segment foot model. Gait Posture 38(2):287–292 Merker J, Hartmann M, Kreuzpointner F, Schwirtz A, Haas JP (2015) Pathophysiology of juvenile idiopathic arthritis induced pesplanovalgus in static and walking condition: a functional view using 3D gait analysis. J Pediatr Rheumatol Online 13:21 Milner CE, Brindle RA (2016) Reliability and minimal detectable difference in multi-segment foot kinematics during shod walking and running. Gait Posture 43:192–197 Moseley L, Hunt A, Grant R (1996) Three-dimensional kinematics of the rearfoot during the stance phase of walking in normal young adult males. Clin Biomech 11(1):39–45 Myers KA, Wang M, Marks RM, Harris GF (2004) Validation of a multi-segment foot and ankle kinematic model for pediatric gait. IEEE Trans Neural Syst Rehabil Eng 12(1):122–130 Nawoczenski DA, Ketz J, Baumhauer JF (2008) Dynamic kinematic and plantar pressure changes following cheilectomy for hallux rigidus: a mid-term followup. Foot Ankle Int 29(3):265–272 Ness ME, Long J, Marks R, Harris G (2008) Foot and ankle kinematics in patients with posterior tibial tendon dysfunction. Gait Posture 27(2):331–339 Nester C, Jones RK, Liu A, Howard D, Lundberg A, Arndt A, Lundgren P, Stacoff A, Wolf P (2007a) Foot kinematics during walking measured using bone and surface mounted markers. J Biomech 40(15):3412–3423 Nester CJ, Liu AM, Ward E, Howard D, Cocheba J, Derrick T, Patterson P (2007b) In vitro study of foot kinematics using a dynamic walking cadaver model. J Biomech 40(9):1927–1937 Nester CJ, Liu AM, Ward E, Howard D, Cocheba J, Derrick T (2010) Error in the description of foot kinematics due to violation of rigid body assumptions. J Biomech 43(4):666–672 Nester CJ, Jarvis HL, Jones RK, Bowden PD, Liu A (2014) Movement of the human foot in 100 pain free individuals aged 18–45: implications for understanding normal foot function. J Foot Ankle Res 7(1):51 Neville C, Flemister AS, Houck JR (2009) Effects of the AirLift PTTD brace on foot kinematics in subjects with stage II posterior tibial tendon dysfunction. J Orthop Sports Phys Ther 39(3):201–209 Novak AC, Mayich DJ, Perry SD, Daniels TR, Brodsky JW (2014) Gait analysis for foot and ankle surgeons – topical review, part 2: approaches to multi-segment modeling of the foot. Foot Ankle Int 35(2):178–191 Okita N, Meyers SA, Challis JH, Sharkey NA (2009) An objective evaluation of a segmented foot model. 
Gait Posture 30(1):27–34 Oosterwaal M, Telfer S, Torholm S, Carbes S, van Rhijn LW, Macduff R, Meijer K, Woodburn J (2011) Generation of subject-specific, dynamic, multi-segment ankle and foot models to improve orthotic design: a feasibility study. BMC Musculoskelet Disord 12:256 Oosterwaal M, Carbes S, Telfer S, Woodburn J, Tørholm S, Al-Munajjed A, van Rhijn L, Meijer K (2016) The Glasgow-Maastricht foot model, evaluation of a 26 segment kinematic model of the foot. J Foot Ankle Res 9:19 Peeters K, Natsakis T, Burg J, Spaepen P, Jonkers I, Dereymaeker G, Vander Sloten J (2013) An in vitro approach to the evaluation of foot-ankle kinematics: performance evaluation of a custom-built gait simulator. Proc Inst Mech Eng H 227(9):955–967


Pohl MB, Messenger N, Buckley JG (2006) Changes in foot and lower limb coupling due to systematic variations in step width. Clin Biomech 21(2):175–183 Portinaro N, Leardini A, Panou A, Monzani V, Caravaggi P (2014) Modifying the Rizzoli foot model to improve the diagnosis of pes-planus: application to kinematics of feet in teenagers. J Foot Ankle Res 7(1):754 Pothrat C, Authier G, Viehweger E, Berton E, Rao G (2015) One- and multi-segment foot models lead to opposite results on ankle joint kinematics during gait: implications for clinical assessment. Clin Biomech 30(5):493–499 Powell DW, Williams DS, Butler RJ (2013) A comparison of two multi-segment foot models in high-and low-arched athletes. J Am Podiatr Med Assoc 103(2):99–105 Prinold JA, Mazzà C, Di Marco R, Hannah I, Malattia C, Magni-Manzoni S, Petrarca M, Ronchetti AB, Tanturri de Horatio L, van Dijkhuizen EH, Wesarg S, Viceconti M, MD-PAEDIGREE Consortium Ann (2016) A patient-specific foot model for the estimate of ankle joint forces in patients with juvenile idiopathic arthritis. Ann Biomed Eng 44(1):247–257 Rankine L, Long J, Canseco K, Harris GF (2008) Multisegmental foot modeling: a review. Crit Rev Biomed Eng 36(2–3):127–181 Rao S, Saltzman C, Yack HY (2006) Segmental foot mobility in individuals with and without diabetes and neuropathy. Clin Biomech 22:464–471 Rao S, Baumhauer JF, Tome J, Nawoczenski DA (2009) Comparison of in vivo segmental foot motion during walking and step descent in patients with midfoot arthritis and matched asymptomatic control subjects. J Biomech 42(8):1054–1060 Rattanaprasert U, Smith R, Sullivan M, Gilleard W (1999) Three-dimensional kinematics of the forefoot, rearfoot, and leg without the function of tibialis posterior in comparison with normals during stance phase of walking. Clin Biomech 14(1):14–23 Raychoudhury S, Hu D, Ren L (2014) Three-dimensional kinematics of the human metatarsophalangeal joint during level walking. Front Bioeng Biotechnol 2:73 Rouhani H, Favre J, Crevoisier X, Jolles BM, Aminian K (2011) Segmentation of foot and ankle complex based on kinematic criteria. Comput Methods Biomech Biomed Engin 4(9):773–781 Rouhani H, Favre J, Aminian K, Crevoisier X (2012) Multi-segment foot kinematics after total ankle replacement and ankle arthrodesis during relatively long-distance gait. Gait Posture 36 (3):561–566 Rouhani H, Favre J, Crevoisier X, Aminian K (2014) A wearable system for multi-segment foot kinetics measurement. J Biomech 47(7):1704–1711 Saraswat P, Andersen MS, MacWilliams BA (2010) A musculoskeletal foot model for clinical gait analysis. J Biomech 43:1645–1652 Saraswat P, MacWilliams BA, Davis RB (2012) A multi-segment foot model based on anatomically registered technical coordinate systems: method repeatability in pediatric feet. Gait Posture 35 (4):547–555 Saraswat P, MacWilliams BA, Davis RB, D’Astous JL (2013) A multi-segment foot model based on anatomically registered technical coordinate systems: method repeatability and sensitivity in pediatric planovalgus feet. Gait Posture 37(1):121–125 Sawacha Z, Cristoferi G, Guarneri G, Corazza S, Donà G, Denti P, Facchinetti A, Avogaro A, Cobelli C (2009) Characterizing multi-segment foot kinematics during gait in diabetic foot patients. J Neuroeng Rehabil 6:37 Scott SH, Winter DA (1991) Talocrural and talocalcaneal joint kinematics and kinetics during the stance phase of walking. 
J Biomech 24(8):743–752 Seo SG, Lee DY, Moon HJ, Kim SJ, Kim J, Lee KM, Chung CY, Choi IH (2014) Repeatability of a multi-segment foot model with a 15-marker set in healthy adults. J Foot Ankle Res 7:24 Shultz R, Kedgley AE, Jenkyn TR (2011a) Quantifying skin motion artifact error of the hindfoot and forefoot marker clusters with the optical tracking of a multi-segment foot model using single-plane fluoroscopy. Gait Posture 34(1):44–48 Shultz R, Birmingham TB, Jenkyn TR (2011b) Differences in neutral foot positions when measured barefoot compared to in shoes with varying stiffnesses. Med Eng Phys 33(10):1309–1313


Simon J, Doederlein L, McIntosh AS, Metaxiotis D, Bock HG, Wolf SI (2006) The Heidelberg foot measurement method: development, description and assessment. Gait Posture 23(4):411–424 Souza TR, Fonseca HL, Vaz AC, Antero JS, Marinho CS, Fonseca ST (2014) Between-day reliability of a cluster-based method for multi-segment kinematic analysis of the foot-ankle complex. J Am Podiatr Med Assoc 104(6):601–609 Stebbins J, Harrington M, Thompson N, Zavatsky A, Theologis T (2006) Repeatability of a model for measuring multi-segment foot kinematics in children. Gait Posture 23(4):401–410 Stebbins J, Harrington M, Thompson N, Zavatsky A, Theologis T (2010) Gait compensations caused by foot deformity in cerebral palsy. Gait Posture 32(2):226–230 Theologis TN, Harrington ME, Thompson N, Benson MK (2003) Dynamic foot movement in children treated for congenital talipesequinovarus. J Bone Joint Surg Br 85(4):572–577 Tome J, Nawoczenski DA, Flemister A, Houck J (2006) Comparison of foot kinematics between subjects with posterior tibialis tendon dysfunction and healthy controls. J Orthop Sports Phys Ther 36(9):635–644 Turner DE, Helliwell PS, Emery P, Woodburn J (2006) The impact of rheumatoid arthritis on foot function in the early stages of disease: a clinical case series. BMC Musculoskelet Disord 7:102 Turner DE, Woodburn J (2008) Characterising the clinical and biomechanical features of severely deformed feet in rheumatoid arthritis. Gait Posture 28(4):574–580 Twomey D, McIntosh AS, Simon J, Lowe K, Wolf SI (2010) Kinematic differences between normal and low arched feet in children using the Heidelberg foot measurement method. Gait Posture 32 (1):1–5 Van den Herrewegen I, Cuppens K, Broeckx M, Barisch-Fritz B, Vander Sloten J, Leardini A, Peeraer L (2014) Dynamic 3D scanning as a markerless method to calculate multi-segment foot kinematics during stance phase: methodology and first application. J Biomech 47 (11):2531–2539 vanHoeve S, de Vos J, Weijers P, Verbruggen J, Willems P, Poeze M, Meijer K (2015) Repeatability of the Oxford foot model for kinematic gait analysis of the foot and ankle. Clin Res Foot Ankle 3:171 Wang R, Thur CK, Gutierrez-Farewik EM, Wretenberg P, Broström E (2010) One year follow-up after operative ankle fractures: a prospective gait analysis study with a multi-segment foot model. Gait Posture 31(2):234–240 Wang Y, Wong DW, Zhang M (2016) Computational models of the foot and ankle for pathomechanics and clinical applications: a review. Ann Biomed Eng 44(1):213–221 Whittaker EC, Aubin PM, Ledoux WR (2011) Foot bone kinematics as measured in a cadaveric robotic gait simulator. Gait Posture 33(4):645–650 Wolf P, Stacoff A, Liu A, Nester C, Arndt A, Lundberg A, Stuessi E (2008a) Functional units of the human foot. Gait Posture 28(3):434–441 Wolf S, Simon J, Patikas D, Schuster W, Armbrust P, Döderlein L (2008b) Foot motion in children shoes: a comparison of barefoot walking with shod walking in conventional and flexible shoes. Gait Posture 27(1):51–59 Woodburn J, Turner DE, Helliwell PS, Barker S (1999) A preliminary study determining the feasibility of electromagnetic tracking for kinematics at the ankle joint complex. Rheumatology (Oxford) 38(12):1260–1268 Woodburn J, Nelson KM, Siegel KL, Kepple TM, Gerber LH (2004) Multisegment foot motion during gait: proof of concept in rheumatoid arthritis. J Rheumatol 31(10):1918–1927 Wrbaskić N, Dowling JJ (2007) An investigation into the deformable characteristics of the human foot using fluoroscopic imaging. 
Clin Biomech 22(2):230–238 Wright CJ, Arnold BL, Coffey TG, Pidcoe PE (2011) Repeatability of the modified Oxford foot model during gait in healthy adults. Gait Posture 33(1):108–112 Wu WL, Su FC, Cheng YM, Huang PJ, Chou YL, Chou CK (2000) Gait analysis after ankle arthrodesis. Gait Posture 11(1):54–61

Trunk and Spine Models for Instrumented Gait Analysis

Robert Needham, Aoife Healy, and Nachiappan Chockalingam

Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Optoelectronic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Soft Tissue Artifact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Two-Dimensional Modelling of the Spine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Three-Dimensional Modelling of the Spine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Kinematic Modeling of the Pelvis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Data Analysis Techniques and Clinical Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Abstract

There are several types of motion capture systems which can measure trunk and spine movement as part of gait analysis. These range from wearable sensors to optoelectronic systems. This chapter focuses on models used within optoelectronic systems and covers both two- and three-dimensional models. While providing an outline of the current thorax and pelvis models, this chapter highlights novel concepts in terms of three-dimensional clusters. The latest data analysis techniques using vector coding are also outlined, which will facilitate comprehensive reporting of movement data.

Keywords

Spine models • Gait Analysis • Thorax model

R. Needham (*) • A. Healy • N. Chockalingam Life Sciences & Education, Staffordshire University, Stoke On Trent, UK e-mail: [email protected]; [email protected]; [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_29-1


Introduction

Imaging modalities such as X-rays and computed tomography scans are considered by clinicians as a criterion reference for the assessment of spine posture. While it is possible to gain a three-dimensional representation of the spine from conventional scans, two-dimensional imaging is considered a suitable assessment technique for clinical practice. As neither technique can be applied in a dynamic situation such as gait, due to a restriction in the capture volume, these assessment methods are not suitable for routine clinical use. Also, repeated exposure to radiation means these techniques could be harmful to a patient. In addition, planning and assessing treatment outcomes based on postural observations and movement assessments of static standing posture disregards the coordination between body segments, movement variability, and the motor control strategies chosen to perform activities of daily living. Due to strong evidence suggesting that structural alterations of the spine may not necessarily lead to a condition such as low back pain, recent clinical guidelines strongly recommend against using imaging techniques (Chou et al. 2007). It is proposed that, unless a serious condition is suspected, clinicians should focus on assessing and understanding the severity of the functional deficits that individuals experience when performing activities of daily living (Chou et al. 2007). An alternative noninvasive method to assess the function of the spine and back is therefore highly desired; one that can accurately and reliably track movement, allows for the physiological loading that is experienced during dynamic movements, and has an established practicality to support clinical management strategies. Several types of motion capture systems exist which can measure trunk and spine movement over time. These range from wearable sensors (electromagnetic tracking systems, electro-goniometers, inertial sensors, and ultrasound transmitters) to optoelectronic systems (please refer to the section "Medical Application: Assessment of Kinematics" for further information). While there are known limitations for all types of motion capture systems (Don et al. 2012), advances in optoelectronic technology and improved calibration protocols offer a measurement accuracy of less than 1 mm (Levine et al. 2012). Optoelectronic marker-based systems are considered to be the gold standard for movement analysis (Cappozzo et al. 2005) and are often used as a criterion measure to validate wearable sensors. Marker-based systems may not be suitable for standard clinical assessments, due to the time required for participant preparation (marker application) and for data collection, processing, and analysis (Lee 2002). Nevertheless, when integrated with other movement analysis technologies (i.e., force plates, electromyography), marker-based systems can detail the complexity of human gait (please refer to the section "Methods and Models: Data Analysis" for further information). Thus, the research that is conducted in gait analysis laboratories that employ marker-based systems can inform clinical practice strategies and aid in the development, design,


and implementation of rehabilitation interventions (McGinley et al. 2009; Baker 2006). The design and validation of kinematic models for the assessment of pelvis and lower limb movement are evident in the research literature and demonstrate progression toward an advanced understanding of human movement (Manal et al. 2000; Cappozzo et al. 1995, 1997). On the other hand, a kinematic model/technique to quantitatively measure movement of the trunk, and regions within the trunk, has yet to be agreed upon by the scientific community. The trunk has been termed the "passenger unit" (Perry 1992; Perry and Burnfield 2010) that moves in response to actions of the lower limbs (Crosbie et al. 1997a; Thorstensson et al. 1984). Nonetheless, since the trunk, head, and upper limbs can account for two-thirds of an individual's body mass, lower limb movement patterns associated with pathological gait can induce compensatory movements in the trunk region that can have a major influence on gait dynamics. There is empirical evidence to suggest that the coordinated interaction between the spine, pelvis, and lower limbs is essential to maintain balance and to achieve a smooth and efficient gait (MacWilliams et al. 2013; Cappozzo 1983; Van Emmerik et al. 2005).

State of the Art

Recently, there have been several advances in the technologies and techniques involved in motion capture, resulting in sophisticated and accurate gait analysis. However, most of these have concerned lower limb analysis, and currently there is no specific trunk and spine model which is widely adopted for the analysis of posture and movement during gait. While there are several reported models, there is little agreement between them, and their reproducibility is questionable because of a lack of reported detail. A recent paper by Needham et al. (2016b) reviews the state of the art and outlines the requirements for a good marker set. Another paper describes a marker set which has been shown to be repeatable and valid (Needham et al. 2016a). In terms of the thorax model, this research provides evidence that we should consider moving away from the conventional thorax model. Improvements in motion capture technology should allow for more complex spine models, which in turn will provide further insight into intersegmental movement. One should also consider alternative modeling approaches, such as the use of 3D clusters.

Optoelectronic Systems

Optoelectronic systems capture human movement using markers that are placed on anatomical landmarks. In biomechanics, markers are usually attached to the skin and can be referred to as either passive or active. Passive markers are covered in


retroreflective material that is tracked by infrared cameras. Active markers (light-emitting diodes) produce an infrared signal. This chapter will focus on the application of passive markers.

Soft Tissue Artifact

The primary concern when using marker-based systems is soft tissue artifact (STA), whereby skin-mounted markers can displace away from the identified bony landmarks during dynamic movements, thus having an impact on segmental angle calculations. There are two types of error that can have an impact on marker position as a result of STA. The first is relative error, which describes the relative movement between the markers that define a rigid segment. The second is absolute error, which examines the movement of a marker against the anatomical landmark it is overlying (Richards 2008). A comparison between angle data from indwelling bone pins and skin-mounted markers has provided an estimate of the degree to which relative and absolute errors influence segment/joint kinematics of the lower extremities (Peters et al. 2010). Bone pins have been utilized to quantify 3D motion of the lumbar vertebrae during gait (Rozumalski et al. 2008; MacWilliams et al. 2013). While simultaneous measurements with skin-mounted markers were not considered, these results can be used for reference and comparative purposes (Needham et al. 2016a). Advancements in mathematical modeling techniques could eventually compensate for the degree of STA experienced during movement and provide an appropriate estimation of bone pose and joint/segmental kinematics (Lu and O'Connor 1999; Cappello et al. 2005; Cappozzo et al. 1997). For detailed information on methods to assess and compensate for STA, readers are directed elsewhere (Leardini et al. 2005).
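Of the two error types, only the relative error can be quantified from the marker data alone, as the fluctuation of inter-marker distances within a nominally rigid cluster; assessing absolute error requires an imaging gold standard. A minimal sketch, with hypothetical array shapes:

```python
import numpy as np

def relative_sta_error(markers):
    """Relative STA error of a nominally rigid marker cluster.

    markers : (F, M, 3) array -- F frames, M markers on one segment.
    Returns, for every marker pair, the peak-to-peak range of the
    inter-marker distance over the trial; a perfectly rigid cluster
    would return zeros.
    """
    F, M, _ = markers.shape
    ranges = {}
    for i in range(M):
        for j in range(i + 1, M):
            d = np.linalg.norm(markers[:, i] - markers[:, j], axis=1)
            ranges[(i, j)] = float(d.max() - d.min())
    return ranges
```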

Two-Dimensional Modelling of the Spine

Attaching markers to several spinous processes of the vertebrae provides a measure of trunk inclination in the sagittal and frontal planes. Various marker configurations have been developed using this approach (Frigo et al. 2003; Chockalingam et al. 2003; Syczewska et al. 1999). The configuration used will depend on either the research question or the model application in a clinical setting. For instance, if there is a requirement to assess the trunk region as a rigid segment, a line between the seventh cervical spinous process (C7) and the sacrum could be used (please refer to the section "Methods and Models: Rigid Body Modeling" for further information). As stated previously, this provides movement data in two dimensions. A representation of axial rotation of the entire trunk could therefore be defined by projecting a line between the left and right acromion processes (outlining the shoulder girdle) and a line between the left and right posterior-superior iliac spines (PSIS) onto a horizontal plane (Frigo et al. 2003).


Fig. 1. Segment angle to global reference frame: (a) proposed by Frigo et al. (2003); (b) proposed by Frigo et al. (2003) and Heyrman et al. (2013); (c) proposed by Heyrman et al. (2014). C7, seventh cervical vertebra; T2, second thoracic vertebra; T6, sixth thoracic vertebra; T10, tenth thoracic vertebra; L1, first lumbar vertebra; L3, third lumbar vertebra; L5, fifth lumbar vertebra

A direct line between two adjacent markers attached to the spinous processes can be projected onto the principal planes with respect to a global reference frame, i.e., the vertical or the horizontal, providing a segmental angle measure (Fig. 1a). This approach has been used to study segmental movements of the spine during gait. Attaching markers to the spinous processes along the length of the spine, Syczewska et al. (1999) reported that during gait, spine segments display lateral flexion toward the opposite side at initial contact in comparison to segments above the level of the seventh thoracic vertebra (T7). With further evidence demonstrating intersegmental movement of the thoracic spine (Needham et al. 2016a; Crosbie et al. 1997b), these findings question the practicality of a thorax segment that is defined below the level of the sixth thoracic vertebra (T6). This information, when combined with knowledge of the functional workings of the spine, i.e., the facet joints, would support the development of a defined three-dimensional thorax model. Planar projections between adjacent segments offer an assessment of intersegmental movement. This approach provides an understanding of the compensatory movements in the trunk region that result from a spinal condition such as scoliosis, or from a related lower limb condition that can influence spine movement, e.g., leg length discrepancy. If the appropriate spinous processes are used, planar projections can assess static and dynamic spine posture. For example, a planar projection using three markers can provide the lordosis angle in the sagittal plane (Frigo et al. 2003) (Fig. 1b). Four markers that define two segments that are not


Fig. 2. Marker placements for the thorax model according to Baker (2013) (a) and Wu et al. (2005) (ISB) (b). Anatomical landmarks: SJN, deepest point of incisura jugularis; SXS, xiphoid process; C7, seventh cervical vertebra; T2, second thoracic vertebra; T8, eighth thoracic vertebra; T10, tenth thoracic vertebra

adjacent to each other can be used to represent sagittal plane angle of kyphosis (Heyrman et al. 2014) (Fig. 1c).
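For illustration, the planar-projection measures above reduce to elementary trigonometry on marker coordinates. Below is a minimal Python/NumPy sketch under an assumed laboratory convention (X anterior, Y vertical, Z medio-lateral); the axis convention and marker coordinates are hypothetical, and published models may define the angles differently:

```python
import numpy as np

def sagittal_inclination(upper, lower):
    """Angle (deg) of the line lower->upper from the laboratory vertical,
    projected onto the sagittal (X-Y) plane. X anterior, Y up."""
    v = np.asarray(upper, float) - np.asarray(lower, float)
    return np.degrees(np.arctan2(v[0], v[1]))  # anterior lean is positive

def three_marker_angle(top, apex, bottom):
    """Sagittal-plane angle (deg) at 'apex' between the two adjoining
    lines, as used for lordosis-type measures from three markers."""
    a = np.asarray(top, float)[:2] - np.asarray(apex, float)[:2]
    b = np.asarray(bottom, float)[:2] - np.asarray(apex, float)[:2]
    cosang = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Hypothetical marker positions (mm): e.g., C7, a mid-spine marker, sacrum.
print(sagittal_inclination(upper=[30.0, 500.0, 0.0], lower=[0.0, 0.0, 0.0]))
print(three_marker_angle([20, 400, 0], [45, 200, 0], [0, 0, 0]))
```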

Three-Dimensional Modelling of the Spine

It is the arrangement of markers in accordance with the kinematic model recommendations that defines body segments. A minimum of three non-collinear markers is required to define the position and orientation of a body segment in three-dimensional (3D) space. As mentioned previously, two-dimensional movement can be analyzed using a direct line between two markers (M1 and M2). To define axial rotation of a segment, a third non-collinear marker (M3) is required that does not lie on the line directly between M1 and M2 (Fig. 2a). Readers are directed to the paper by Wu et al. (2005) for a detailed example of defining a thorax segment and its respective coordinate system. Markers fixed to the manubrium, the spinous process of the second thoracic vertebra (T2), and the midpoint between the PSIS form a non-collinear marker set that can represent a 3D trunk model (Baker 2013). Landmarks on the shoulder girdle can also be incorporated into a defined 3D trunk model (Rab et al. 2002). However, markers that define the center link of the trunk and are attached to the thorax segment are favored because of the movement artifact imposed by the upper limb movement associated with using acromion landmarks (Nguyen and Baker 2004).
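As a minimal illustration of the three-marker rule, the sketch below builds a right-handed segment frame from three non-collinear markers; the assignment of marker roles to axes is illustrative, not a published convention:

```python
import numpy as np

def segment_frame(m1, m2, m3):
    """Build a right-handed rotation matrix (columns = X, Y, Z axes)
    for a segment from three non-collinear markers."""
    m1, m2, m3 = (np.asarray(m, float) for m in (m1, m2, m3))
    y = m1 - m2
    y /= np.linalg.norm(y)                 # longitudinal axis
    z = np.cross(m3 - m2, y)
    z /= np.linalg.norm(z)                 # perpendicular to the marker plane
    x = np.cross(y, z)                     # completes the right-handed triad
    return np.column_stack((x, y, z))

R = segment_frame([0, 100, 0], [0, 0, 0], [50, 50, 0])
print(R)                                   # 3x3 rotation matrix
print(np.allclose(R.T @ R, np.eye(3)))     # True: orthonormal
```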


The thorax and abdomen are the two regions associated with the trunk. While substantial evidence on thorax movement is documented in the research literature, there is a lack of agreement on which kinematic model best defines the thorax segment. The conventional gait model typically defines a 3D thorax segment using anatomical landmarks on the proximal and distal ends of the sternum along with the spinous processes of the spine. This approach is based on the International Society of Biomechanics (ISB) guidelines (Wu et al. 2005) (Fig. 2b). The Plug-in-Gait thorax model (Vicon, OMG, UK) uses a similar approach to the ISB thorax model, although the tenth thoracic vertebra (T10) defines the posterior-distal aspect of the thorax instead of the eighth thoracic vertebra (T8). Since the position of C7 can be influenced by movement of the head and neck (Armand et al. 2014) and represents the cervical region of the spine, the spinous process of the second thoracic vertebra (T2) should be used as the posterior-superior landmark of a thorax segment. Alignment with the distal apex of each scapula (defined as MAI) has been suggested as a suitable landmark to define the posterior-distal aspect of a thorax segment (Leardini et al. 2011). Since the MAI landmark is approximately at the level of the ninth thoracic vertebra (T9) (Haneline et al. 2008), and similar thorax movements during gait have been noted when either T8 or T10 was included as part of the model (Armand et al. 2014), the MAI could be a viable landmark. Differences in scapula position and spinal posture between individuals nevertheless need to be considered before the MAI landmark is used as part of thorax pose estimation.

So far, thorax models have been defined based on four markers, two on the sternum and two on the spine. Defining a thorax segment that requires a marker to be placed on the distal end of the sternum could raise practical concerns for female participants. A minimal marker set using T2, T10, and the manubrium can define a 3D thorax segment (Baker 2013) (Fig. 2a), and consistent thorax movements have been demonstrated using this approach (Armand et al. 2014).

Unlike the thorax region, there are no anatomical landmarks on the anterior side of the abdomen to support a 3D lumbar segment. To define axial rotation in the transverse plane in the lumbar region, additional markers on either side of the spinous processes can be applied (Crosbie et al. 1997a; Seay et al. 2008). These lateral markers could be subject to greater soft tissue artifact than those attached to the spinous processes. Elastikon elastic tape wrapped around the lumbar region is a possible means of minimizing the influence of soft tissue artifact (Seay et al. 2008; Mason et al. 2016). Alternatively, a 3D cluster could be used. This cluster consists of at least three markers in a non-collinear configuration, normally attached to a rigid or semirigid base that can be applied over relevant anatomical landmarks on the back (Fig. 3a, b). When implemented in the same gait laboratory, consistent and reliable measurements of lumbar spine movement have been reported using the 3D cluster technique (Schache et al. 2002; Needham et al. 2016a, b; Taylor et al. 1996; Thurston 1982). When applied to track thorax motion (Mason et al. 2016; Seay et al. 2011), the 3D cluster has recently been shown to produce similar patterns of movement and range of motion in comparison to the conventional thorax model (Fig. 3a, b).


Fig. 3. (a) Anatomical landmarks used in the IOR thorax model (T2, second thoracic vertebra; T8, eighth thoracic vertebra; SJN, deepest point of incisura jugularis; SXS, xiphoid process) and 3D cluster on T1 (first thoracic vertebra); (b) 3D cluster structural dimensions. Unless stated otherwise, measurements are in millimeters

Kinematic Modeling of the Pelvis

According to the International Society of Biomechanics (ISB) guidelines (Wu et al. 2002), the coordinate system of a pelvis segment is defined by skin-mounted markers placed on the anterior and posterior landmarks of the ilium. This marker configuration is commonly employed in clinical gait analysis (Baker 2013). It has been shown that markers attached to the anterior superior iliac spines (ASISs) are influenced more by STA during gait than markers attached to the posterior-superior iliac spines (Hara et al. 2014). The effect of STA on pelvis kinematics would be more of an issue for individuals with a higher body mass index and excessive soft tissue in the abdominal region. In addition, the arms can occlude the optoelectronic cameras' view of the ASIS markers during walking. In light of these concerns over the use of ASIS landmarks in pelvis kinematic modelling, alternative techniques have been proposed. These include the use of individual markers placed on the posterior side of the ilium and sacrum (Frigo et al. 1998) or the attachment of a rigid cluster of markers onto the sacrum region (Borhani et al. 2013). Advancements in biomechanical software have also presented an opportunity to use virtual markers to support the creation of a segment and to aid the tracking of segmental movement (Kisho Fukuchi et al. 2010; McClelland et al. 2010). As a number of techniques are available, an appreciation of the differences


between kinematic modeling approaches is essential when interpreting relative movement between the trunk region and a pelvis segment (Needham et al. 2016b).

Data Analysis Techniques and Clinical Outcomes

Normally, trunk and spine movement will be analyzed alongside an established pelvis and lower limb marker set. Once a reliable model has been used to collect the data, there are several reported methods for data analysis and reporting. Depending on the type of clinical condition and the intended use of a dataset, an appropriate method can be adopted. It might range from measuring and reporting simple lateral flexion or forward bending to the assessment of rotation in specific clinical conditions or in sporting situations/physical activity. Movement estimated from a 3D thorax model is often reported relative to a pelvis segment or to a laboratory reference, although the former is the most common approach used by the scientific community. Kinematic waveforms and range of motion values are interpreted by comparison against a reported normative database. Since the 3D thorax model (Wu et al. 2005) is associated with the conventional gait model, a substantial database on relative movement between the thorax and pelvis is available. This includes subgroup classifications allowing for comparisons by age, by gender, and in some instances by clinical condition (e.g., cerebral palsy and low back pain (LBP)). In contrast to the 3D thorax model, there are scarce data on the alternative models/methods outlined previously in this chapter.

Reporting discrete kinematic values such as range of motion and peak angles alone can be misleading when unaccompanied by kinematic waveforms. During gait, for instance, while no difference in lumbar spine range of motion has been noted between individuals with chronic LBP and those considered healthy, differences in the kinematic waveforms were evident between groups, owing to variability in stride-to-stride movement (Vogt et al. 2001). A similar observation was reported in a study that examined lumbar kinematic variability during gait in participants with chronic LBP before and after a 12-week isolated lumbar extension exercise intervention (Steele et al. 2015). Subtle differences in range of motion and in the timing of rotations in different directions of two defined segments influence the reporting of relative movement. To support the interpretation of a kinematic waveform that details relative movement, it is strongly advised that global waveforms (defined as movement of a segment relative to a laboratory reference) of each respective segment are also documented. Examples of this type of interpretation are provided in a recent systematic review (Needham et al. 2016b). Alternatively, one should consider novel data analysis techniques that have the capability to quantify the coordination pattern between two segments and to assess the variability of this interaction over time (Hamill et al. 2012). A modified vector coding technique is one such approach, and while it is beyond


the scope of this chapter to explain the calculations and coordination classification system of the vector coding technique, there are published papers that outline this data analysis approach (Needham et al. 2014, 2015).
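To give a flavor of the calculation, the following hedged sketch computes the frame-to-frame coupling angle at the heart of vector coding; it follows the general idea rather than the exact implementation in the cited papers:

```python
import numpy as np

def coupling_angle(proximal, distal):
    """Coupling angle (deg, 0-360) between consecutive points of an
    angle-angle plot, e.g., lumbar spine vs. pelvis angles over a gait
    cycle. Inputs are 1D arrays of segment angles (deg)."""
    dp = np.diff(np.asarray(proximal, float))   # proximal on the x-axis
    dd = np.diff(np.asarray(distal, float))     # distal on the y-axis
    gamma = np.degrees(np.arctan2(dd, dp))
    return np.mod(gamma, 360.0)

# Synthetic example: pelvis and thorax rotations across one gait cycle.
t = np.linspace(0, 2 * np.pi, 101)
pelvis = 5 * np.sin(t)
thorax = 5 * np.sin(t + np.pi / 6)
print(coupling_angle(pelvis, thorax)[:5])
```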

Future Directions

There has been a great deal of recent activity, with various papers reporting several types of marker sets for spine, trunk, and pelvis analysis. While these datasets provide good material for debate, there has to be consensus not only on the model and methodology but also on the reporting procedures if any of these approaches are to be adopted for clinical use. In addition, these papers have to provide clear detail on kinematic data reporting: for example, whether the data are intersegmental or represent the range of movement of a specific segment in a specific plane, which will help in designing effective clinical management. Intersegmental data on the thorax are still scarce, and further work needs to be completed to understand this movement in normative gait and in various clinical conditions. In addition, the new data analysis techniques outlined within this chapter could be employed to help understand these conditions in detail. Furthermore, these types of data will fuel innovation in the development of orthotics and prosthetics.

References

Armand S, Sangeux M, Baker R (2014) Optimal markers' placement on the thorax for clinical gait analysis. Gait Posture 39(1):147–153
Baker R (2006) Gait analysis methods in rehabilitation. J Neuroeng Rehabil 3(1):4
Baker R (2013) Measuring walking: a handbook of clinical gait analysis. Mac Keith Press, Cambridge
Borhani M, McGregor AH, Bull AMJ (2013) An alternative technical marker set for the pelvis is more repeatable than the standard pelvic marker set. Gait Posture 38(4):1032–1037
Cappello A et al (2005) Soft tissue artifact compensation in knee kinematics by double anatomical landmark calibration: performance of a novel method during selected motor tasks. IEEE Trans Biomed Eng 52(6):992–998
Cappozzo A (1983) The forces and couples in the human trunk during level walking. J Biomech 16(4):265–277
Cappozzo A et al (1995) Position and orientation in space of bones during movement: anatomical frame definition and determination. Clin Biomech (Bristol, Avon) 10(4):171–178
Cappozzo A et al (1997) Surface-marker cluster design criteria for 3-D bone movement reconstruction. IEEE Trans Biomed Eng 44(12):1165–1174
Cappozzo A et al (2005) Human movement analysis using stereophotogrammetry. Part 1: theoretical background. Gait Posture 21(2):186–196
Chockalingam N et al (2003) A comparison of three kinematic systems for assessing spinal range of movement. Int J Ther Rehabil 10(9):402–407
Chou R et al (2007) Diagnosis and treatment of low back pain: a joint clinical practice guideline from the American College of Physicians and the American Pain Society. Ann Intern Med 147(7):478–491
Crosbie J, Vachalathiti R, Smith R (1997a) Age, gender and speed effects on spinal kinematics during walking. Gait Posture 5(1):13–20
Crosbie J, Vachalathiti R, Smith R (1997b) Patterns of spinal motion during walking. Gait Posture 5(1):6–12
Don R et al (2012) Instrumental measures of spinal function: is it worth? A state-of-the-art from a clinical perspective. Eur J Phys Rehabil Med 48(2):255–273
Frigo C et al (1998) Functionally oriented and clinically feasible quantitative gait analysis method. Med Biol Eng Comput 36(2):179–185
Frigo C et al (2003) The upper body segmental movements during walking by young females. Clin Biomech (Bristol, Avon) 18(5):419–425
Hamill J, Palmer C, Van Emmerik REA (2012) Coordinative variability and overuse injury. SMARTT 4(1):45
Haneline MT et al (2008) Determining spinal level using the inferior angle of the scapula as a reference landmark: a retrospective analysis of 50 radiographs. J Can Chiropr Assoc 52(1):24–29
Hara R et al (2014) Quantification of pelvic soft tissue artifact in multiple static positions. Gait Posture 39(2):712–717
Heyrman L, Feys H, Molenaers G, Jaspers E, Van de Walle P, Monari D, Aertbeliën E, Desloovere K (2013) Reliability of head and trunk kinematics during gait in children with spastic diplegia. Gait Posture 37(3):424–429
Heyrman L et al (2014) Altered trunk movements during gait in children with spastic diplegia: compensatory or underlying trunk control deficit? Res Dev Disabil 35(9):2044–2052
Kisho Fukuchi R et al (2010) Evaluation of alternative technical markers for the pelvic coordinate system. J Biomech 43(3):592–594
Leardini A et al (2005) Human movement analysis using stereophotogrammetry. Part 3: soft tissue artifact assessment and compensation. Gait Posture 21(2):212–225
Leardini A et al (2011) Multi-segment trunk kinematics during locomotion and elementary exercises. Clin Biomech (Bristol, Avon) 26(6):562–571
Lee R (2002) Measurement of movements of the lumbar spine. Physiother Theory Pract 18(4):159–164
Levine D, Richards J, Whittle MW (2012) Whittle's gait analysis, 5th edn. Churchill Livingstone, London
Lu TW, O'Connor JJ (1999) Bone position estimation from skin marker co-ordinates using global optimisation with joint constraints. J Biomech 32(2):129–134
MacWilliams BA et al (2013) Assessment of three-dimensional lumbar spine vertebral motion during gait with use of indwelling bone pins. J Bone Joint Surg Am 95(23):e1841–e1848
Manal K et al (2000) Comparison of surface mounted markers and attachment methods in estimating tibial rotations during walking: an in vivo study. Gait Posture 11(1):38–45
Mason DL et al (2016) Reproducibility of kinematic measures of the thoracic spine, lumbar spine and pelvis during fast running. Gait Posture 43:96–100
McClelland JA et al (2010) Alternative modelling procedures for pelvic marker occlusion during motion analysis. Gait Posture 31(4):415–419
McGinley JL et al (2009) The reliability of three-dimensional kinematic gait measurements: a systematic review. Gait Posture 29(3):360–369
Needham R, Naemi R, Chockalingam N (2014) Quantifying lumbar–pelvis coordination during gait using a modified vector coding technique. J Biomech 47(5):1020–1026
Needham RA, Naemi R, Chockalingam N (2015) A new coordination pattern classification to assess gait kinematics when utilising a modified vector coding technique. J Biomech 48(12):3506–3511
Needham R, Naemi R, Healy A, Chockalingam N (2016a) Multi-segment kinematic model to assess three-dimensional movement of the spine and back during gait. Prosthet Orthot Int 40(5):624–635
Needham R, Stebbins J, Chockalingam N (2016b) Three-dimensional kinematics of the lumbar spine during gait using marker-based systems: a systematic review. J Med Eng Technol 40(4):172–185
Nguyen TC, Baker R (2004) Two methods of calculating thorax kinematics in children with myelomeningocele. Clin Biomech 19(10):1060–1065
Perry J (1992) Gait analysis: normal and pathological function. SLACK Incorporated, Thorofare
Perry J, Burnfield J (2010) Gait analysis: normal and pathological function, 2nd edn. SLACK Incorporated, Thorofare
Peters A et al (2010) Quantification of soft tissue artifact in lower limb human motion analysis: a systematic review. Gait Posture 31(1):1–8
Rab G, Petuskey K, Bagley A (2002) A method for determination of upper extremity kinematics. Gait Posture 15(2):113–119
Richards J (2008) Biomechanics in clinic and research. Churchill Livingstone, London
Rozumalski A et al (2008) The in vivo three-dimensional motion of the human lumbar spine during gait. Gait Posture 28(3):378–384
Schache AG et al (2002) Intra-subject repeatability of the three dimensional angular kinematics within the lumbo-pelvic-hip complex during running. Gait Posture 15(2):136–145
Seay J, Selbie WS, Hamill J (2008) In vivo lumbo-sacral forces and moments during constant speed running at different stride lengths. J Sports Sci 26(14):1519–1529
Seay JF, Van Emmerik REA, Hamill J (2011) Influence of low back pain status on pelvis-trunk coordination during walking and running. Spine 36(16):E1070–E1079
Steele J et al (2015) A randomized controlled trial of the effects of isolated lumbar extension exercise on lumbar kinematic pattern variability during gait in chronic low back pain. PM R 8(2):105–114
Syczewska M, Oberg T, Karlsson D (1999) Segmental movements of the spine during treadmill walking with normal speed. Clin Biomech (Bristol, Avon) 14(6):384–388
Taylor NF, Evans OM, Goldie PA (1996) Angular movements of the lumbar spine and pelvis can be reliably measured after 4 minutes of treadmill walking. Clin Biomech (Bristol, Avon) 11(8):484–486
Thorstensson A et al (1984) Trunk movements in human locomotion. Acta Physiol Scand 121(1):9–22
Thurston AJ (1982) Repeatability studies of a television/computer system for measuring spinal and pelvic movements. J Biomed Eng 4(2):129–132
Van Emmerik REA et al (2005) Age-related changes in upper body adaptation to walking speed in human locomotion. Gait Posture 22(3):233–239
Vogt L et al (2001) Influences of nonspecific low back pain on three-dimensional lumbar spine kinematics in locomotion. Spine 26(17):1910–1919
Wu G et al (2002) ISB recommendation on definitions of joint coordinate system of various joints for the reporting of human joint motion – part I: ankle, hip, and spine. J Biomech 35(4):543–548
Wu G et al (2005) ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion – part II: shoulder, elbow, wrist and hand. J Biomech 38(5):981–992

Upper Extremity Models for Clinical Movement Analysis

Andrea Giovanni Cutti, Ilaria Parel, and Andrea Kotanxis

Abstract

The quantitative analysis of upper-extremity motion is a challenging task. A single, universally accepted methodology does not exist, but it is possible to define a standardized way to report a measurement protocol and to formulate recommendations on the most important aspects. The aim of this chapter is to provide such guidelines, addressing common issues such as joint modeling, scapula tracking, soft-tissue artifact compensation, and the summary of results.

Keywords

Upper-extremity • Shoulder • Scapula • Elbow • Functional axis • Motion analysis • Kinematics • Optoelectronic system • Inertial sensors • Protocol • Measurement methods • Scapulo-humeral rhythm • Marker-set • Coordinate system • Anatomical landmarks • Functional frames • Cluster • Skin artifacts

Contents State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guidelines for Upper-Limb Motion Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Framework for the Definition of Standardized Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joints/Segments of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mechanical Model of Joints/Degrees of Freedom of Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition of Segment or Joint CSs and Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 3 3 5 5 7

A.G. Cutti (*) Applied Research, INAIL Prosthetic Center, Vigorso di Budrio, BO, Italy e-mail: [email protected] I. Parel Unit of Shoulder and Elbow Surgery, Cervesi Hospital, Cattolica, RN, Italy e-mail: [email protected] A. Kotanxis Leon Root Motion Analysis Laboratory, Hospital for Special Surgery, New York, NY, USA e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_30-1


Marker/Sensor Set-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Activities to Be Measured . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Extraction of Summarizing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Upper-Extremity Models for Inertial and Magnetic Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10 17 17 18 20 21 21

State of the Art

The upper limb is composed of six body segments (thorax, clavicle, scapula, humerus, forearm, and hand) and three main joints (shoulder, elbow, and wrist). Upper-extremity models attempt to reconstruct the segments' attitude (position and orientation) and the joints' kinematics during arm movements. This chapter will focus on the shoulder and elbow, and occasionally discuss the wrist and hand. Quantitative motion analysis of the upper extremity is not new, as described by Anglin and Wyss (2000), and received a great impulse in the late 1980s–early 1990s, in particular from the work produced in The Netherlands and the UK using the "palpator" and electromagnetic sensors (Anglin and Wyss 2000; Johnson and Anderson 1990; Van der Helm et al. 1992; Van der Helm and Pronk 1995; Pronk and van der Helm 1991). Given the variety of tasks performed by the upper extremity and the overall complexity of the system, researchers adopted a broad range of instrumentation and methods to address their specific needs. Over the last 10 years, however, instrumentation has tended to stabilize, with a predominance of electromagnetic and stereophotogrammetric systems, thanks to the availability of well-established commercial products ("3D Dynamic Pose Estimation Using Reflective Markers or Electromagnetic Sensors"; "3D Dynamic Probabilistic Pose Estimation From Data Collected Using Cameras and Reflective Markers"). The situation for the models used to extract biomechanically meaningful parameters from the instrumentation ("protocols" in short) is partially different. As opposed to clinical gait analysis, in which the "Plug-in Gait" protocol (or its slight derivations) has reached enough popularity to be named "conventional," this is not completely the case for the upper extremity. Establishing a standardized protocol is a delicate matter and requires the completion of a detailed process. The application of motion analysis cannot be reduced to "where do I apply the markers?" types of question; it starts from the clinical problem, passes through the definition of a mathematical model of the body and the construction of coordinate systems (CSs), and only then do markers start playing a role in how the CSs are tracked. In the remainder of the chapter, we will explain these steps in detail, by recalling the outline proposed by Kontaxis et al. (2009). This will give the framework to summarize the available recommendations. Whenever possible, we will


refer to the proposals formulated by the International Society of Biomechanics (ISB) or by the International Shoulder Group (ISG), one of ISB's technical groups. Following this outline, sections "Definition of Segment or Joint CSs and Angles" and "Marker/Sensor Set-Up" will report on models easily applicable with any electromagnetic or stereophotogrammetric system. However, these might be difficult or cumbersome to apply to the (now popular) inertial and magnetic measurement systems (IMMS or IMUs). IMMS are wearable systems containing accelerometers, gyroscopes, and possibly magnetometers and barometers that, through sensor fusion techniques, provide information about the orientation of the sensors themselves ("Three-dimensional Human Kinematics Estimation Using Magneto-Inertial Measurement Units"). The ISB/ISG standard for the CSs is hardly applicable to these sensors, and a recent meeting during the ISB Conference in Glasgow (2015) clearly showed the interest of the scientific community in formulating recommendations for this emerging technology. A few protocols have been proposed so far on this topic, but at present the most clinically validated is the ISEO protocol described in Cutti et al. (2008). To introduce the reader to these new topics, section "Upper-Extremity Models for Inertial and Magnetic Sensor" will provide a brief overview of ISEO (Parel et al. 2016).

Guidelines for Upper-Limb Motion Analysis

A Framework for the Definition of Standardized Protocols

Kontaxis et al. (2009) proposed a framework to support the definition of standardized protocols, their description, and the formulation of general recommendations. The framework is not only useful for developers; it also represents a useful checklist for the young biomechanist or the clinician who wants to verify what is implemented in the commercial system available in the laboratory and make sure that the system is really suitable for the (clinical) needs. The steps for a proper definition of a standardized protocol are summarized in Fig. 1. The framework is composed of two nested flowcharts. The first flowchart (left panel) defines what is meant by a motion analysis study and the role of a motion analysis protocol. The second flowchart (central panel) specifies the steps required to build a motion analysis protocol appropriate to the study design. In the right-hand panel, a summary of basic recommendations and current standards is reported, together with a list of open issues. The first flowchart is important because it recalls the basics of any scientific approach: the first step is a clear definition of the research question, formulated in terms of hypotheses that are simple and objective. This is a challenging task, because clinical problems can be multifactorial, and curiosity can push us to formulate too many questions and ultimately ask too much of the patients/subjects. Good hypotheses will naturally indicate the parameters to measure, and this will be the goal of our protocol. The protocol will be applied to subjects and patients to test the hypotheses


Fig. 1 Flowchart for the definition of a motion analysis protocol, based on Kontaxis et al. 2009

and possibly confirm them. The "protocol" is therefore a tool that should be simple and specific enough to provide the data to answer our questions. Following this line of reasoning, it might be objected that each clinical question deserves its own special protocol. While we think that, if a single marker solves the problem, that is the way to pursue it, the researcher should keep in mind that replication of experiments and confirmation of results by independent centers is the foundation of science. This is especially true in the motion analysis field, in which


the number of subjects typically recruited is incomparably smaller than in the pharmaceutical field: no standardization means limited comparability, no common databases, and difficult meta-analysis. Moreover, patients and clinicians called to make clinical decisions based on the reports provided by motion analysis laboratories should be confident in the quality of the data supplied. Therefore, we strongly believe that a basic protocol for the upper extremity should be sought. The second flowchart examines the construction of a motion analysis protocol, and next to it, suggestions are made. These points will be commented on separately in sections "Joints/Segments of Interest," "Mechanical Model of Joints/Degrees of Freedom of Segments," "Definition of Segment or Joint CSs and Angles," "Marker/Sensor Set-Up," "Activities to Be Measured," and "Extraction of Summarizing Parameters."

Joints/Segments of Interest

It is important to define whether segments and/or joints are of interest. As described in Kontaxis et al., segment kinematics is defined as the attitude (position and orientation) of one bony segment with respect to either a global CS or a nonadjacent bone. Joint kinematics is defined as the description of the relative attitude of two adjacent bony segments. Typical segments of interest are the thorax, clavicle, scapula, humerus, forearm, and carpus of the hand. Axial rotations of the clavicle are almost impossible to track noninvasively (Ludewig et al. 2004; Marchese and Johnson 2000; Teece et al. 2008). For this reason, it was proposed to consider a fictitious segment linking the sternum and the glenohumeral head and to refer to it as the "shoulder girdle" (Garofalo et al. 2009). Typical joints of interest are the sternoclavicular/girdle, scapulothoracic, glenohumeral, elbow, and wrist joints. In the literature, it is common to describe elbow joint kinematics and to use a mixed segment/joint description of the shoulder; both scapula and humerus are usually referred to the thorax to describe the so-called scapulohumeral rhythm (SHR), which is the coordinated movement between scapula and humerus while the arm is elevated. The same holds for the clavicle/girdle-humeral rhythm, with both segments referred to the thorax.

Mechanical Model of Joints/Degrees of Freedom of Segments

For the segments and joints of interest, it is essential to define the degrees of freedom (DoFs), i.e., how they can move. A simple and commonly applied open-chain model is reported in Fig. 2, focusing on joint kinematics. Segments are depicted as bars connected by hinges, each representing a rotational degree of freedom. Only rotational DoFs are assumed between segments, because noninvasive motion analysis does not allow a reliable description of translations within joints, given the relative


Fig. 2 A simple biomechanical model of the upper-extremity

movement of the sensors positioned on the skin with respect to the underlying bones (the so-called skin artifact or soft-tissue artifact). The order in which the hinges are connected and named is very important. Different sequences will produce different descriptions of the joint/segment orientation for the same movement, as reported in Karduna et al. (2000). The description provided in this step will need to be consistent with the Euler decomposition chosen in the next step (section "Definition of Segment or Joint CSs and Angles"). None of the sequences is wrong or correct per se, but some sequences are closer to the typical "clinical language" or easier to interpret, and for this reason they are most frequently used or recommended. Specifically, the sequence in Fig. 2 is the one recommended in Kontaxis et al. (2009) for shoulder movements occurring (mostly) in the sagittal plane. For frontal plane movements it changes to abduction–adduction, flexion–extension, and internal–external rotation. The sequence preferred by the ISB is different: plane of elevation, elevation, and internal–external rotation. Kontaxis and coworkers made a different suggestion compared to the ISB to avoid the gimbal-lock position (undefined decomposition) when the humerus is vertical (down or up). This comes at the expense of a decomposition that depends on the predominant plane of movement, a dependence that is not present in the ISB recommended sequence. This topic is further discussed in the section "Definition of Segment or Joint CSs and Angles." Interestingly, motion analysis data can be used as input to complex upper-extremity models, such as those implemented in OpenSim. OpenSim is a freely


available and open-source software (Delp et al. 2007). Users can develop models of musculoskeletal structures and create dynamic simulations of a wide variety of movements. In recent years, the OpenSim community has published and tested several upper-extremity models, and recently a new OpenSim shoulder model has been developed with an improved description of the scapulothoracic and glenohumeral joints (Seth et al. 2016). Preferential axes of rotation of joints must be clearly incorporated and marked. While the scapulothoracic and glenohumeral joints are commonly modeled with three rotational DoFs, this simplification is not supported by the literature for the elbow. The elbow can be reasonably described as a double-hinge joint with nonintersecting axes of rotation. The first hinge describes the rotation of the ulna and radius with respect to the humerus (flexion–extension), and the second describes the movement of the radius around the ulna (pronation–supination). There is sometimes confusion about the so-called carrying angle, and it is very important to distinguish between the anatomical and the biomechanical definition. The first depicts a purely exterior situation, i.e., it states that the carrying angle is the angle between the long axis of the humerus and the ulna when the elbow is completely extended and supinated, and that it decreases with elbow flexion; in the reference position, the carrying angle ranges between 10° and 15° (Goto et al. 2004). The second definition states that the carrying angle measures the relative orientation of the flexion–extension and pronation–supination axes of the joint, and therefore it is an (almost) constant value which is subject specific (Stokdijk et al. 2000; Cutti et al. 2006a, b, 2008). Under this definition, the carrying angle is not affected by the amount of elbow flexion–extension or pronation–supination. This second definition should always be preferred in biomechanical studies focusing on the elbow joint. The same holds for the wrist, for which an axial rotation of the hand relative to the forearm is not reported.
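Under the biomechanical definition, the carrying angle can be computed directly from the two functional axes once they are expressed in a common frame. The sketch below uses one plausible convention (deviation from orthogonality); the example vectors are made up, and published methods may define sign and offset differently:

```python
import numpy as np

def carrying_angle(fe_axis, ps_axis):
    """Carrying angle (deg) under the biomechanical definition: the
    deviation from orthogonality between the elbow flexion-extension
    axis and the forearm pronation-supination axis, both expressed in
    the same frame. Conventions differ across papers; the 90-deg offset
    here is one common choice, not *the* published formula."""
    a = np.asarray(fe_axis, float); a /= np.linalg.norm(a)
    b = np.asarray(ps_axis, float); b /= np.linalg.norm(b)
    between = np.degrees(np.arccos(np.clip(abs(a @ b), 0.0, 1.0)))
    return 90.0 - between

# Hypothetical axes: FE nearly medio-lateral, PS nearly longitudinal.
print(round(carrying_angle([1.0, 0.05, 0.0], [0.17, 0.98, 0.0]), 1))  # ~12.7
```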

Definition of Segment or Joint CSs and Angles

Local CSs are required to define the attitude of one segment relative to another or to the laboratory CS. CSs must be defined consistently with the mechanical model. As described by Kontaxis et al. (2009), when dealing with segment kinematics, CSs should be representative of the anatomy of the bone. The ISB, in conjunction with the ISG, proposed a set of such CSs that are commonly referred to as "anatomical frames" (AFs). AFs are typically constructed using the positions of bony anatomical landmarks (ALs). A list of ALs and AFs is reported in Tables 1 and 2, respectively. The centre of the glenohumeral head (GH) is an internal landmark which cannot be directly identified by palpation. How to estimate it will be the topic of section "Activities to Be Measured." The CS reported for the shoulder girdle is not part of the ISB proposal but comes from Garofalo et al. (2009). It is important for the reader to know that the planes of an AF only approximate the frontal, transverse, and sagittal anatomical planes of the bone. Therefore, the AF


Table 1 Anatomical landmarks and functional axes used for the definition of the anatomical/functional CSs. The last column reports the segment/cluster with which each anatomical landmark/functional axis is typically associated. Please refer to section "Marker/Sensor Set-Up" for the definition of "cluster" and for the alternative use with the ulnar cluster

Abbreviation | Name | Segment of reference/cluster
IJ | Incisura jugularis | Thorax
PX | Xiphoid process | Thorax
C7 | 7th cervical vertebra | Thorax
T8 | 8th thoracic vertebra | Thorax
AA | Angulus acromialis | Scapula
TS | Trigonum spinae | Scapula
AI | Angulus inferior | Scapula
GH | Centre of glenohumeral head | Scapula
EL | Lateral epicondyle | Usual: humerus; alternative: H4 (see Table 2 and section "Marker/Sensor Set-Up")
EM | Medial epicondyle | Usual: humerus; alternative: H4 (see Table 2 and section "Marker/Sensor Set-Up")
RS | Radial styloid | Forearm
US | Ulnar styloid | Forearm
M3 | 3rd metacarpus | Hand
Vflex | Direction of the elbow flexion-extension axis (pointing laterally) | Usual: humerus; alternative: ulna
pflex | Pivot point of the elbow flexion-extension axis | Ulna
Vps | Direction of the forearm pronosupination axis (pointing proximally) | Forearm



“anatomical axes” are only rough approximations of the real axes of rotation of the joint(s) that the segment forms. As such, when anatomical axes are assumed to be the axes about which a joint rotates, the resultant joint kinematics can be substantially affected by kinematic cross-talk (Piazza and Cavanagh 2000; Cutti et al. 2006a, b), i.e., part of the angular variation of one DoF is read on another DoF. This is why the ISB standard (Wu et al. 2005) can be misleading when talking about joint CSs (JCSs) and using anatomical axes to calculate the rotations of adjacent bones. While the situation is relatively acceptable when joints are modeled as ball-and-sockets, the matter becomes critical for joints such as the elbow, in which the axes of rotation are not aligned with the anatomical axes of the humerus and forearm. Similarly, this is dangerous for the description of joints (girdle-thoracic or wrist) in which the kinematic model assumes a restricted number of DoFs, i.e., not the full three rotational DoFs, as in Fig. 2. To avoid the limitations above, the use of functional frames (FFs) is recommended for joints of interest. As defined in Kontaxis et al. (2009), a FF is a CS associated with a segment and is specifically intended to describe the kinematics of a joint formed by the segment. The FF is based on at least one functional axis of rotation of


Table 2 Definition of the anatomical and functional frames. Please refer to section "Marker/Sensor Set-Up" for details on how to track CSs over time

Thorax (THX):
Y_THX = ((IJ + C7)/2 − (PX + T8)/2) / ‖(IJ + C7)/2 − (PX + T8)/2‖ : longitudinal
X_THX = Y_THX ∧ (T8 − PX) / ‖Y_THX ∧ (T8 − PX)‖ : medio-lateral
Z_THX = X_THX ∧ Y_THX : antero-posterior
Origin: IJ

Shoulder girdle (GRD):
X_GRD = (GH − (IJ + C7)/2) / ‖GH − (IJ + C7)/2‖ : medio-lateral
Z_GRD = (X_GRD ∧ Y_THX) / ‖X_GRD ∧ Y_THX‖ : antero-posterior
Y_GRD = (Z_GRD ∧ X_GRD) / ‖Z_GRD ∧ X_GRD‖ : upward
Origin: IJ

Scapula (SC):
X_SC = (AA − TS) / ‖AA − TS‖ : medio-lateral
Z_SC = X_SC ∧ (AA − AI) / ‖X_SC ∧ (AA − AI)‖ : antero-posterior
Y_SC = (Z_SC ∧ X_SC) / ‖Z_SC ∧ X_SC‖ : longitudinal
Origin: AA

Proximal humerus (H1):
Y_H1 = (GH − E) / ‖GH − E‖ : longitudinal, with E = (EL + EM)/2
Z_H1 = Y_H1 ∧ (EM − EL) / ‖Y_H1 ∧ (EM − EL)‖ : antero-posterior
X_H1 = Y_H1 ∧ Z_H1 : medio-lateral
Origin: GH

Proximal humerus (H2):
Y_H2 = (GH − E) / ‖GH − E‖ : longitudinal
X_H2 = (Y_H1 ∧ Y_PS) / ‖Y_H1 ∧ Y_PS‖ : antero-posterior
Z_H2 = X_H2 ∧ Y_H2 : medio-lateral
Origin: GH

Distal humerus (H3):
X_H3 = V_FLEX / ‖V_FLEX‖ : medio-lateral
Z_H3 = X_H3 ∧ (GH − REF) / ‖X_H3 ∧ (GH − REF)‖ : antero-posterior
Y_H3 = (Z_H3 ∧ X_H3) / ‖Z_H3 ∧ X_H3‖ : longitudinal
REF = E (usual) or pflex (alternative when using the ulnar cluster)
Origin: REF

Forearm (F1):
Y_F1 = (E − US) / ‖E − US‖ : longitudinal
Z_F1 = (RS − US) ∧ Y_F1 / ‖(RS − US) ∧ Y_F1‖ : antero-posterior
X_F1 = Y_F1 ∧ Z_F1 : medio-lateral
Origin: E

Proximal forearm (F):
Y_F = V_PS / ‖V_PS‖ : longitudinal
Z_F = (RS − US) ∧ Y_F / ‖(RS − US) ∧ Y_F‖ : antero-posterior
X_F = (Y_F ∧ Z_F) / ‖Y_F ∧ Z_F‖ : medio-lateral, with S = (US + RS)/2
Origin: S

Hand (HN):
Y_HN = (S − M3) / ‖S − M3‖ : longitudinal
Z_HN = Y_HN ∧ (US − RS) / ‖Y_HN ∧ (US − RS)‖ : antero-posterior
X_HN = (Y_HN ∧ Z_HN) / ‖Y_HN ∧ Z_HN‖ : medio-lateral
Origin: M3
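As a worked example of the Table 2 definitions, the thorax anatomical frame reduces to a few normalized differences and cross products. The following sketch mirrors those formulas; the landmark coordinates are hypothetical:

```python
import numpy as np

def unit(v):
    v = np.asarray(v, float)
    return v / np.linalg.norm(v)

def thorax_frame(ij, px, c7, t8):
    """Thorax anatomical frame following the Table 2 definitions:
    Y: midpoint(IJ, C7) - midpoint(PX, T8), normalized (longitudinal);
    X: Y ^ (T8 - PX), normalized (medio-lateral);
    Z: X ^ Y (antero-posterior). Origin at IJ."""
    ij, px, c7, t8 = (np.asarray(p, float) for p in (ij, px, c7, t8))
    y = unit((ij + c7) / 2 - (px + t8) / 2)
    x = unit(np.cross(y, t8 - px))
    z = np.cross(x, y)
    return np.column_stack((x, y, z)), ij   # rotation matrix and origin

# Hypothetical landmark positions (mm) in the laboratory frame.
R, origin = thorax_frame(ij=[0, 0, 50], px=[0, -150, 60],
                         c7=[0, 30, -80], t8=[0, -130, -90])
print(R)
print(np.allclose(R.T @ R, np.eye(3)))      # True: orthonormal frame
```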


the joint, best expressed in an AF associated with the segment. A functional axis of a joint is the axis of rotation of the distal and proximal segments that form the joint, when these are actively or passively rotated relative to each other. For a pure hinge joint, the functional axis coincides with the rotational axis of the hinge. For an "almost hinge" joint (e.g., the elbow and the knee), the functional axis is taken as the mean axis of rotation, computed from the joint's instantaneous axes of rotation. For a ball-and-socket joint, there are no preferential axes of rotation; however, if the distal segment is rotated in a constant plane, the functional axis is defined as the axis perpendicular to the plane of rotation. Functional axes can be computed through a number of algorithms. One of the most commonly used is based on the estimation of instantaneous helical axes (Woltring 1990), which expresses the mean axis of rotation of the joint of interest in the AF of the corresponding segment of interest. Other algorithms exist which have also proved to be effective (Halvorsen et al. 1999; Gamage and Lasenby 2002). With the functional axis expressed in the AF of the segment of interest, the FF can then be constructed (1) by using a combination of several functional axes of rotation or (2) by using a combination of functional and anatomical axes. When computing the kinematics of the joint, the functional axis is assumed to be the axis around which joint rotations occur. This minimizes the kinematic cross-talk. Table 2 reports a recommended set of FFs to track the elbow: one for the distal humerus and one for the proximal forearm. It is therefore recommended to use two systems, associated with the humerus and forearm respectively (the use of H3 is further discussed in section "Activities to Be Measured"). The proximal humerus CS is intended to describe the glenohumeral joint or the humerus relative to the thorax, and the distal one to describe the elbow (Fig. 3). Similarly, a proximal forearm CS is used for the elbow and a distal forearm CS for the wrist. It is also important to notice that the CSs for the distal forearm and hand, as well as for the girdle and thorax, implement a constraint on the DoFs, which is consistent with the model of Fig. 2. Two coordinate systems are reported for the proximal humerus, H1 and H2. H1 is the most commonly applied, but H2 has advantages for tracking the humerus internal–external rotation because it is insensitive to the soft-tissue artifact when the elbow is kept in a constant position, as described in section "Marker/Sensor Set-Up." The typical Euler decompositions used to calculate joint angles from the relative orientation of CSs are reported in Table 3.
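As a rough sketch of the functional-axis idea, and only a simplified stand-in for the helical-axis estimators cited above (it is not Woltring's algorithm), one can extract the rotation axis of each frame-to-frame increment of the distal-in-proximal orientation and average it:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def mean_rotation_axis(rel_rots):
    """Average unit rotation axis of frame-to-frame increments.

    rel_rots: list of scipy Rotations, distal segment expressed in the
    proximal frame, one per time sample of a flexion-extension trial.
    """
    axes = []
    for r0, r1 in zip(rel_rots[:-1], rel_rots[1:]):
        inc = (r1 * r0.inv()).as_rotvec()        # incremental rotation vector
        ang = np.linalg.norm(inc)
        if ang > 1e-6:                           # skip near-stationary frames
            axes.append(inc / ang)
    m = np.mean(axes, axis=0)
    return m / np.linalg.norm(m)

# Synthetic trial: pure rotation about a fixed, slightly tilted axis.
true_axis = np.array([0.99, 0.12, 0.05]); true_axis /= np.linalg.norm(true_axis)
trial = [R.from_rotvec(a * true_axis) for a in np.linspace(0, 2.0, 50)]
print(mean_rotation_axis(trial))                 # ~true_axis
```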

Marker/Sensor Set-Up

The application of noninvasive methods to identify (van Sint Jan 2007) and track with sufficient accuracy the positions of the ALs and of the anatomical/functional CSs during a motion is critical and often a challenging task. Before discussing recommendations about marker/sensor placement, it is important to define the purpose of the technical frame (TF). Kontaxis et al. (2009) report the following definition: a TF is a CS associated with a body segment.


Fig. 3 Description of the anatomical and functional frames for the humerus. In particular, a proximal anatomical frame can be defined (H1 or H2) to describe the glenohumeral kinematics. H3 can be used to optimally describe the elbow kinematics (Reprint from Kontaxis et al. 2009)

Table 3 Sequence of Euler angles for each segment/joint kinematics of interest

Segment/Joint — Euler sequence (positive sign)
Thorax relative to global CS — XZ′Y″ (flexion, abduction, internal rotation)
Scapula relative to thorax — YZ′X″ (protraction, lateral rotation, posterior tilt)
Shoulder girdle relative to thorax — YZ′X″ (protraction, lateral rotation, dummy)
Humerus relative to thorax — mostly sagittal plane movements: XZ′Y″ (flexion, abduction, internal rotation); mostly frontal plane movements: ZX′Y″ (abduction, flexion, internal rotation)
Forearm relative to humerus — XZ′Y″ (flexion, carrying angle, pronation)
Hand relative to forearm — Y′Z′X′ (dummy, radial deviation, flexion)
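As a worked example of Table 3, joint angles follow from the relative orientation of the two frames and the sequence-specific Euler decomposition. Below is a minimal sketch using SciPy, where the intrinsic sequence string 'XZY' stands for the XZ′Y″ sequence; the input frames are made up:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def joint_angles(r_prox, r_dist, seq):
    """Euler angles (deg) of the distal frame relative to the proximal one.

    r_prox, r_dist: 3x3 rotation matrices (segment axes as columns,
    expressed in the laboratory frame).
    seq: intrinsic Euler sequence, e.g., 'XZY' for X Z' Y''.
    """
    rel = R.from_matrix(r_prox).inv() * R.from_matrix(r_dist)
    return rel.as_euler(seq, degrees=True)

# Made-up example: humerus flexed 30 deg and internally rotated 10 deg
# relative to the thorax, decomposed with the mostly-sagittal sequence.
thorax = np.eye(3)
humerus = R.from_euler('XZY', [30, 0, 10], degrees=True).as_matrix()
print(joint_angles(thorax, humerus, 'XZY'))   # -> [30, 0, 10]
```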

It normally has no repeatable reference to the morphology of the segment and as such has an arbitrary position and orientation with respect to the bone. The placement of the markers or sensors that define a TF is usually chosen to help with technical difficulties that can occur while tracking a motion (e.g., to minimize soft tissue artifact, enhance visibility and comfort, adapt to the muscular constitution of the subject, etc.). For optoelectronic systems, a TF is usually constructed by using the instantaneous positions of at least three nonaligned superficial markers positioned on the segment (altogether referred to as a "cluster"), based on an arbitrary geometric


rule. Clusters with a redundant number of reflective markers (four or more) are very common when visibility or marker occlusion is a concern. For other measurement systems, e.g., electromagnetic or inertial and magnetic, the TF is the embedded local CS of the sensor that is attached to the segment. Hereinafter, we will adopt the term "cluster" to refer both to clusters of markers and to a sensor with an embedded CS. There are no universally accepted recommendations for cluster placement, size, and shape, since these can vary significantly based on the segment to track and the requirements of the motion to be recorded. However, there are some common practices that have been adopted in current studies. These will be reviewed first; then, a more technical description of the options for the different segments will be provided. The use of clusters prepositioned on a rigid plastic plate (e.g., 3D printed) or on low-temperature thermoplastic material (e.g., Thermolyn Pedilon, Ottobock, D) is very popular, and it is equally applicable with electromagnetic/inertial sensors. Once the cluster is prepared, it is positioned on the body segment. Markers can also be glued on elastic bandages wrapped around the segment to form a cluster. The general goal is to ensure easy tracker placement on the subjects, without causing any movement restrictions. One of the basic practices is to shape the cluster with an asymmetric design and as big as possible (to maximize marker visibility and identification). More specific indications for clusters are provided in Cappozzo et al. (1997). The use of clusters is preferred to the placement of markers directly on ALs, since it has been shown to reduce skin artifacts (Cappozzo et al. 2005). However, the process of recognizing and correctly palpating the landmarks is critical, and practice between operators within the same laboratory is recommended to increase repeatability (McGinley et al. 2009). With the clusters in place, the ALs are calibrated with a pointer which has an embedded cluster, by applying what is known in gait analysis as the "CAST approach," which was described for the upper extremity by Johnson et al. and van der Helm et al. in the early 1990s (Johnson and Anderson 1990; Van der Helm et al. 1992). The pointer is built with a known geometry so that its tip has known coordinates in its embedded technical frame. The tip of the pointer is gently positioned on the AL while the clusters on the segment and on the pointer are both visible to the measurement system. It is easy, at that stage, to calculate the coordinates of the AL in the TF of the segment (Cappozzo et al. 2005). Once the coordinates of all the ALs are known in the segment TF, it is equally easy to build the AF and link it to the TF that is tracked by the measurement system over time. Sometimes, ALs are located within the body and are impossible to palpate (e.g., the GH centre). Estimation of the position of those landmarks can be based either on regression equations or on functional movements that are actively or passively executed by the subject under analysis. This is discussed in detail in section "Activities to Be Measured." The GH centre of rotation is commonly reported with respect to the anatomical CS of the scapula (Veeger et al. 1997; Nikooyan et al. 2011; Lempereur et al. 2010; Lempereur et al. 2013). Campbell et al. (2009) recommended estimating the GH as the midpoint between the GH that is tracked by the acromion/scapula tracker and the proximal humerus.
However, results were based on static measures


Fig. 4 Marker-set used at the INAIL Prosthesis Center, Italy. For the marker-set used at the Hospital for Special Surgery see Fig. 7

and noise and tracking issues were not considered. We are currently tracking GH with the acromion cluster. Pictures of the cluster set applied at the INAIL Prosthesis Centre are shown in Fig. 4 and of that applied at the Hospital for Special Surgery in Fig. 7. Considering the thorax, the cluster is usually positioned on the upper portion of the sternum (close to IJ), centered with respect to the midline of the thorax. An alternative popular tracking method for the thorax is to place four individual markers: two of them located between the manubrium and the sternum, and the other two on the spine between the landmarks of C7 and T8. Both solutions are equally possible, but the former might be easier to apply on females and on overweight subjects. Considering the scapula, different tracking methods are possible. One of the first methods applied to record scapula motion was the scapula locator (Johnson et al. 1993; Barnett et al. 1999), which has previously been defined as "the silver standard" (Cutti and Veeger 2009) for scapula tracking (Fig. 5). The scapula locator is a rigid, configurable structure that is designed to locate the three most palpable landmarks of the scapula (acromion tip, root of the scapula spine, inferior angle) and collect a sequence of static positions. A regression equation that describes scapula motion is built based on those discrete positions. More recently, noninvasive dynamic scapula tracking methods have been preferred, namely the scapula tracker, the acromion tracker, and the scapula spine tracker (Fig. 6). The scapula tracker is designed to follow the scapula motion by tracking the scapula spine and the acromion, by means of a base with a hinge joint that conforms to the scapula spine and a footpad that rests on the acromion (hinge joint and footpad are connected and are adjustable in length and height in order to fit different scapula sizes, as recommended by Karduna et al. 2001).
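The pointer-based calibration of anatomical landmarks described above amounts to a single change of coordinates per landmark. A minimal sketch follows, in which all variable names and numbers are hypothetical and the cluster pose is assumed to be returned by the measurement system:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def localize_landmark(r_cluster, p_cluster, tip_global):
    """Express an anatomical landmark in a segment's technical frame.

    r_cluster: 3x3 rotation of the cluster TF in the global frame.
    p_cluster: global position of the TF origin.
    tip_global: global position of the pointer tip resting on the landmark.
    Returns the (constant) landmark coordinates in the TF; afterwards the
    landmark can be reconstructed in any later frame as
    p_cluster_t + r_cluster_t @ local.
    """
    return np.asarray(r_cluster).T @ (np.asarray(tip_global, float)
                                      - np.asarray(p_cluster, float))

# Calibration frame (hypothetical numbers):
r0 = np.eye(3); p0 = np.array([100.0, 50.0, 0.0])
local = localize_landmark(r0, p0, tip_global=[130.0, 80.0, 10.0])

# Later motion frame: cluster translated and rotated; reconstruct the AL.
r1 = R.from_euler('z', 20, degrees=True).as_matrix()
p1 = np.array([120.0, 60.0, 5.0])
print(p1 + r1 @ local)   # landmark position in the global frame at time t
```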


Fig. 5 The scapula locator is adjustable and specially designed to locate the most palpable landmarks of scapula (Johnson et al. 1993; Barnett et al. 1999). Data can be recorded in static humeral positions and after palpating and locating the position of the scapula ALs. Here in combination with the MTx Xsens sensors (IMMS) when applying the ISEO protocol (Cutti et al. 2008)

The acromion tracker consists of a cluster placed on the skin over the acromial plate of the scapula. This is one of the most popular tracking methods because the design is simpler compared to the scapula tracker and it is easily applicable. It has been shown to record reliable scapula kinematics for humerus elevations up to 100 degrees (Van Andel et al. 2009; Prinold et al. 2011; Lempereur et al. 2014). In order to minimize soft tissue artifacts, the cluster should be positioned at the meeting point of the acromion and the scapular spine (Shaheen et al. 2011; Lempereur et al. 2014), and multiple calibration of the ALs should be adopted to cover the full range of motion (Brochard et al. 2011). The actual shape of the acromion cluster varies from laboratory to laboratory. A possible design, which can be easily replicated with a 3D printer, was developed at Centro Protesi INAIL by one of the authors (AGC) and is available for download through the ISG website. Finally, the spine tracker collects scapula kinematics by means of a cluster glued on a box 47 mm × 30 mm × 13 mm in size (the cluster can be replaced by an IMU embedded into the box itself). The box is positioned over the central third of the scapula spine, between AA and TS, aligned with the upper edge of the spine (Cutti et al. 2008; Parel et al. 2014). The humerus H1 and H3 CSs are typically tracked with a cluster on the distal part of the humerus itself. This position was shown by Hamming et al. (2012) to provide optimal tracking of the humeral long axis. However, the reader should be aware of the soft-tissue artifact affecting the humerus axial rotation. Cutti et al. (2005) reported that humeral clusters can significantly underestimate the internal/external

Upper Extremity Models for Clinical Movement Analysis

15

Fig. 6 Images depicting the scapula tracker (a) (model in use at the Hospital for Special Surgery, USA), the acromion tracker (b) (model in use at the INAIL Prosthetic Center, Italy), and the spine tracker (c) (applied with the ISEO protocol and MTx Xsens sensors)

humerothoracic rotation as much as 48% of the effective humeral axial rotation performed. Accurate estimation of humeral internal/external rotation can be of high importance depending on the clinical question, especially in the area of sports biomechanics. As an example, in overhead pitching, a sport where shoulder injuries occur in 57% of the athletes during a season, it is important to have accurate measurements of humeral axial rotation in order to fully understand the mechanics of throwing. To tackle this problem, some authors proposed alternative methods to

16

A.G. Cutti et al.

Fig. 7 The size of the ulnar cluster is small and has an oval shape (b), it is positioned in the proximal ulnar and distal from the olecranon (a)

compensate for the artifact, based on the humerus CS H2 (Table 2) possibly associated with specific algorithms (Cutti et al. 2006c) or through the adoption of specific marker-sets, e.g., using two markers on the ulna Rettig et al. (2009). A further alternative has been developed at the Hospital for Special Surgery (Kontaxis et al. 2014), who used an additional marker cluster on the ulna (together with the humeral cluster) in order to increase accuracy on the estimation of the internal/external rotation during pitching. The design of this cluster, referred herein as “ulnar cluster,” has an oval shape and is small in dimensions ( 0.97) ® between GAITRite and paper-andpencil method; high agreement in temporal measures (ICC > 0.95) ® between GAITRite and video-based method High correlations among all five kinematic parameters (Pearson correlation coefficient > 0.94). Systematic difference in step length (~3 cm) and gait speed (~0.1 m/s) High agreement between systems for all spatial measures (ICC > 0.9). Fair to moderate association for the SLS and DS%. Good inter-trial reliability among most gait parameters (ICC > 0.8) High degree of similarity between systems on all gait parameters (ICC > 0.92). No systematic bias was observed
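A simple way to picture the multiple (double) calibration mentioned above for acromion-cluster tracking is to interpolate the cluster's calibration error between two reference poses. The sketch below is a minimal illustration with hypothetical angle values and plain linear interpolation; it is not the algorithm of Brochard et al. (2011) or of any specific protocol:

```python
import numpy as np

# Double calibration (hypothetical values): scapular angles estimated by the
# acromion cluster and obtained by direct landmark palpation at two poses.
cal_elev = np.array([0.0, 120.0])                 # humerothoracic elevation (deg)
cluster_angles = np.array([[5.0, -2.0, 10.0],     # cluster estimate at 0 deg
                           [38.0, -9.0, 27.0]])   # cluster estimate at 120 deg
palpated_angles = np.array([[4.0, -1.0, 9.0],     # palpated reference at 0 deg
                            [44.0, -12.0, 33.0]]) # palpated reference at 120 deg
correction = palpated_angles - cluster_angles     # cluster error at each pose

def corrected_scapula(cluster_now, elevation_now):
    """Correct a dynamic cluster-based estimate by interpolating the
    calibration error linearly with humerothoracic elevation."""
    w = np.clip((elevation_now - cal_elev[0]) / (cal_elev[1] - cal_elev[0]), 0, 1)
    return cluster_now + (1 - w) * correction[0] + w * correction[1]

print(corrected_scapula(np.array([20.0, -5.0, 18.0]), elevation_now=60.0))
```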

(Nelson et al. 2002; Stover 2005; Chien et al. 2006), Alzheimer's disease (Wittwer et al. 2008), and stroke (Kuys et al. 2011; Lewek and Randall 2011; Wong et al. 2014; Cho et al. 2015). Detailed descriptions of the participant populations, validation techniques, and key observations from the aforementioned studies are listed in Table 2. Across all studies and populations, the base of support and foot angle measures were the least reliable (ICC range 0.2–0.8), possibly due to the spatial resolution of the system and the software algorithm used for data extraction (Menz et al. 2004), and thus should be treated with caution. Even though most of the aforementioned studies on the validity and reliability of low density pedobarograph systems were conducted with the GAITRite® system, it is assumed that other systems are similarly valid and reliable because the underlying technology is similar. The appropriateness of this assumption is not clear.
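Most of the studies summarized in Table 2 report reliability as an intraclass correlation coefficient (ICC). As a minimal illustration of how such a coefficient is obtained (hypothetical gait-speed data; the exact ICC form and software differ between studies), ICC(2,1) can be computed from a two-way ANOVA decomposition as follows:

```python
import numpy as np

# Test-retest gait speed (m/s) for 6 subjects measured in 2 sessions
# (hypothetical values, for illustration only).
scores = np.array([[1.21, 1.25],
                   [0.98, 1.02],
                   [1.45, 1.40],
                   [1.10, 1.12],
                   [0.85, 0.88],
                   [1.30, 1.33]])
n, k = scores.shape

grand = scores.mean()
row_means = scores.mean(axis=1)     # per-subject means
col_means = scores.mean(axis=0)     # per-session means

# Two-way ANOVA mean squares
ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between subjects
ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between sessions
ss_err = np.sum((scores - row_means[:, None] - col_means[None, :] + grand) ** 2)
ms_err = ss_err / ((n - 1) * (k - 1))

# ICC(2,1): two-way random effects, absolute agreement, single measurement
icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                            + k * (ms_cols - ms_err) / n)
print(f"ICC(2,1) = {icc:.3f}")
```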

Current State of Clinical and Research Application

The ease of use and automated analysis features have made low density pedobarograph walkway systems a standard method for gait assessment across various populations. Since the early validation studies, over 400 research studies have been conducted with low density pressure-sensing instrumented walkway systems. Gait assessment using these systems has been performed on various populations including, but not limited to, infants (Garciaguirre et al. 2007), children (Dusing and Thorpe 2007), young adults, middle-aged adults, and older adults (Verghese et al. 2009), as well as individuals with various pathological conditions (Down syndrome (Wu et al. 2007), cerebral palsy (Rinehart et al. 2006), attention-deficit/hyperactivity disorders (Papadopoulos et al. 2014), Tourette syndrome (Liu et al. 2014), traumatic brain injury (Katz-Leurer et al. 2008), leg amputation and prosthetic gait (Highsmith et al. 2010), diabetes (Paul et al. 2009), multiple sclerosis (Sosnoff et al. 2012), Parkinson's disease (Chien et al. 2006), stroke (Patterson et al. 2010), mild cognitive impairment (Verghese et al. 2007), Alzheimer's disease (Webster et al. 2006), and cerebellar ataxia (Schniepp et al. 2012)). Investigations have either quantified existing gait deficits in pathological populations, which may be used for detecting disease onset and progression, or evaluated the improvements of gait due to various therapeutic interventions. Pressure-sensing walkway systems have also been used as the reference comparison for the accuracy of gait parameters measured with other techniques, such as body-worn sensors (Hartmann et al. 2009; Kim et al. 2015; González et al. 2016) and markerless motion tracking (Clark et al. 2013).

Guidelines

Although the validity and reliability of the pressure-sensing walkway have been well documented, it is also important to implement a standard procedure for gait assessments to ensure that outcome measurements are reliable and comparable across different studies.


Table 2 Summary of the test-retest reliability studies using the pressure-sensing walkway

Author (year) | Subject/population | Goal | Gait parameters | Results
Menz et al. (2004) | Young (22–40 years old) and older (76–87 years old) healthy adults | Test the inter-trial reliability of the GAITRite® measurements over a two-week period | Gait speed, cadence, step length, base of support, and toe in/out angle | Good to excellent reliability for gait parameters (ICC > 0.82), with the exception of base of support and toe in/out angle in older subjects
Van Uden and Besser (2004) | Healthy adults (age range 19–59 years old) | Test the inter-trial reliability of the GAITRite® measurements over a one-week period | Gait speed, step length, stride length, step time, swing time, stance time, double support time, base of support, and toe in/out angle | Good to excellent reliability among spatial-temporal gait measurements (ICC > 0.89), except the base of support (ICC = 0.79)
Thorpe et al. (2005) | Typically developing children (age range 1–11 years old) | To determine the repeatability of the GAITRite® measurements in healthy children within a same-day period | Gait speed, cadence, step length, base of support, swing time, double support time, toe in/out angle | Moderate to good reliability (ICC > 0.6) for most of the spatial parameters. Poor to fair reliability for the base of support and toe in/out angle
Rao et al. (2005) | Adults with Huntington's disease (age range 35–55 years old) | To determine the reliability of the GAITRite® measurements in patients with Huntington's disease within a same-day period | Gait speed, stride time, stride length, cadence, and base of support | High reliability of all parameters (ICC > 0.8)
Stover (2005) | Adults with Parkinson's disease (age range 49–85 years old) | To determine the reliability of the GAITRite® measurements in patients with Parkinson's disease within a same-day period | Gait speed, cadence, base of support, step length, stride length, and single/double support percentage of gait cycle | Good reliability across all gait parameters (ICC > 0.8)
Sorsdahl et al. (2008) | Children with cerebral palsy (age range 3–13 years old) | To determine the reliability of the GAITRite® measurements in children with cerebral palsy within a same-day period | Cadence, step length, stride length, step width, single support time | Good reliability of most parameters (ICC > 0.7) except the step width
Wittwer et al. (2008) | Adults with Alzheimer's disease (age range 70–91 years old) | To determine the reliability of the GAITRite® measurements in patients with Alzheimer's disease over a week period | Gait speed, cadence, step length, stride length, swing time, stance time, base of support, and toe in/out angle | High test-retest reliability across all gait parameters (ICC > 0.86)
Kuys et al. (2011) | Adults admitted for rehabilitation following stroke (mean age of 64 years old) | To determine the reliability of the GAITRite® measurements in stroke survivors over a 2-day period | Gait speed, cadence, step time, step length, and stance phase duration | Good reliability across all gait parameters (ICC > 0.72)
Lewek and Randall (2011) | Adults with chronic hemiparesis resulting from stroke (mean age of 56 years old) | To determine the reliability of the GAITRite® symmetry measurements in post-stroke patients over a 10-day period | Gait speed, step length asymmetry, stance time asymmetry, swing time asymmetry | Excellent reliability for all symmetry measurements (ICC > 0.91)
Wong et al. (2014) | Adults admitted for rehabilitation following stroke (mean age of 68 years old) | To determine the intra- and inter-rater reliability of the GAITRite® measurements in stroke survivors within a same-day period | Gait speed, step time, step length, step width | High intra- and inter-rater reliability (ICC > 0.90) across all parameters except step width
Cho et al. (2015) | Adults admitted for rehabilitation following stroke (mean age of 52.5 years old) | To determine the reliability of the GAITRite® measurements during performance of single and dual task walking in post-stroke patients over a 2-day period | Gait speed, cadence, step length, stride length | Excellent reliability for all gait parameters in the single task condition (ICC > 0.98) but not in the dual task condition (ICC range 0.69–0.90)
Sosnoff et al. (2015) | Adults with multiple sclerosis (age range 18–64 years old) | To determine the reliability of gait parameters (measured by GAITRite®) in patients with multiple sclerosis over a 6-month period | Gait speed, step time, step length, cadence, base of support, double support percentage of gait cycle | High reliability for all gait parameters (ICC > 0.90), with the exception of base of support (ICC = 0.56)

In order to enhance the reproducibility of clinical gait measures and for better comparability of outcomes obtained with the GAITRite® system, in 2006 the European GAITRite® Network Group published the "Guidelines for clinical applications of spatio-temporal gait analysis in older adults" (Kressig and Beauchet 2006). Key guidelines from this report include:

1. Measurements should be performed in a reproducible, well-lit environment.
2. Data collection should exclude any auditory or visual interference for participants.
3. Participants should be allowed to walk in their own footwear, provided it is not slipper type and does not have a heel height exceeding 3 cm. For follow-up gait analysis, subjects should wear the same footwear as was worn at the baseline test.
4. Safety measures should be provided in case of an imminent fall.
5. Steady state gait should be tested at different gait speeds (e.g., slow, normal, fast), preferably in randomized order.
6. In order to achieve steady state walking, it is recommended to instruct participants to start walking at least 2 m prior to reaching the electronic walkway and to stop at least 2 m beyond it.
7. Assistive devices used by participants, if necessary, should be documented by type.
8. In order to evaluate stride-to-stride variability, a minimum of three consecutive gait cycles for both the left and right sides (i.e., a total of 6 footfalls) should be registered in a single walk-over trial.

Although such guidelines were specifically developed for gait assessments in older adults, it is reasonable to apply them to measurements of gait in other clinical populations.


Assessments of Other Dynamic Locomotion and Postural Tasks

Although steady state linear walking is the most commonly used paradigm for gait assessment, it is only one aspect of functional gait. Therefore, gait assessments in other dynamic locomotion tasks, such as gait initiation, gait termination, and turning, have also been investigated. Traditionally, such dynamic tasks were often evaluated with three-dimensional motion-tracking systems in a laboratory setting. Recently, pressure-sensing walkways have also been used to conduct research evaluating subtasks of gait. For instance, Wajda et al. (2015) examined step initiation in multiple sclerosis utilizing a Zeno™ walkway. Specifically, the time from stimulus onset to toe off of the swing foot (step initiation timing) was quantified. It was found that the initiation timing was positively associated with the physiological fall risk score in the sample. In a related investigation on planned gait termination in multiple sclerosis patients (Roeing et al.), the time needed for the estimated center of mass (COMe) to stabilize during the stopping phase of gait termination was extracted using a Zeno™ walkway and accompanying software. It was found that MS patients displayed elevated stabilization times for gait termination. In an investigation examining the effect of dual task on turning ability in stroke survivors and older adults (Hollands et al. 2014), participants walked on a GAITRite® walkway and made 90 degree turns under single and dual task conditions. It was found that both groups exhibited dual task decrements in turning ability (measured by longer time to turn, higher variability in time to turn, and increased single support time during turning). The recent introduction of the GAITRite® CIRFace system, which utilizes a series of wireless-based square pressure mats, offers a customizable walkway pattern that can be used to test the gait characteristics of turning, obstacle crossing/avoidance, stair ascending/descending, and other real-world tasks that have functional significance for quality of life. At the time of publication there was no published work available using the CIRFace system.

Postural balance, which is also a major index for the evaluation of functional mobility and risk of falling among aging and pathological populations (Maki et al. 1994), can also be tested using the pedobarograph technique. The linear and nonlinear measures of COP sway during quiet standing (sway area, sway amplitude and velocity in the anterior-posterior (AP)/medio-lateral (ML) directions, sample entropy, etc.) have been shown to be reflective of the mechanisms of balance control (Horak 1997; Cavanaugh et al. 2005). Recent studies have examined pedobarography systems as an alternative method to measure balance. Nomura et al. (2009) compared simultaneous postural sway measures from a Tekscan® pressure-mapping system and a force plate (the gold standard). It was concluded that the pressure mat may attenuate the sway amplitude slightly, but the overall correlation between systems was very high (concordance correlation coefficient > 0.93). Brenton-Rule et al. (2012) also tested the reliability of postural measures in older adults with rheumatoid arthritis using a Tekscan® system and found good to excellent reliability (ICC above 0.84) in all sway measures in the AP and ML directions. Recent work done by our group used the Zeno™ walkway to assess standing balance in individuals with multiple sclerosis and healthy controls, and compared the COP sway measures from the pressure-sensitive walkway with those from a force plate.
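To make the linear sway measures concrete, the following sketch (a synthetic COP trace, for illustration only; it is not the processing pipeline of any particular product) computes the AP/ML sway range and the mean sway velocity, the quantities compared in Fig. 3:

```python
import numpy as np

# Center-of-pressure (COP) trajectory from a 30 s quiet-standing trial
# (synthetic signal; cop_ap and cop_ml in mm, sampled at fs Hz).
fs = 120.0
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(0)
cop_ap = 5.0 * np.sin(2 * np.pi * 0.3 * t) + rng.normal(0, 1.0, t.size)
cop_ml = 3.0 * np.sin(2 * np.pi * 0.5 * t) + rng.normal(0, 1.0, t.size)

# Sway range in the AP and ML directions (mm)
range_ap = cop_ap.max() - cop_ap.min()
range_ml = cop_ml.max() - cop_ml.min()

# Mean sway velocity (mm/s): total path length divided by trial duration
path_length = np.sum(np.hypot(np.diff(cop_ap), np.diff(cop_ml)))
mean_velocity = path_length / (t[-1] - t[0])

print(f"AP range {range_ap:.1f} mm, ML range {range_ml:.1f} mm, "
      f"mean velocity {mean_velocity:.1f} mm/s")
```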


Fig. 3 Scatter plots of the sway range in the AP direction (Sway RangeAP, left panel, r = 0.8433) and the mean sway velocity (right panel, r = 0.9498) measured by the force plate and the pressure-sensing walkway (EO eyes open, EC eyes closed, Control healthy control participants, MS patients with multiple sclerosis)

In this study, subjects were required to complete 30 s standing balance trials (eyes open and closed) on a Zeno™ walkway and a Bertec force plate (Bertec Corporation, Columbus, OH). The investigation revealed that the force platform and the pressure-sensitive walkway have high agreement (Pearson's correlation coefficient > 0.80) in multiple sway measurements (Fig. 3). The latest development, the pressure sensor-embedded instrumented treadmill (FDM-T® by Zebris Medical GmbH), overcomes the length restrictions of other instrumented walkways and offers the possibility to investigate long-distance locomotion. Gait characteristics in long distance walking can provide more precise measures of stride-to-stride variability, as it has been suggested that at least 400 steps are needed to provide a reliable estimation of gait variability (Owings and Grabiner 2003). Moreover, long distance locomotion may also reflect the impact of fatigue on gait performance.
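As a small illustration of why long recordings help, the sketch below (synthetic stride times, for illustration only) computes the coefficient of variation of stride time, a commonly used gait-variability measure, from a 400-stride series:

```python
import numpy as np

# Stride times (s) from a long treadmill walk (synthetic values);
# long series are needed because variability estimates stabilise slowly.
rng = np.random.default_rng(1)
stride_times = rng.normal(loc=1.05, scale=0.03, size=400)

mean_stride = stride_times.mean()
sd_stride = stride_times.std(ddof=1)
cv_percent = 100.0 * sd_stride / mean_stride   # coefficient of variation

print(f"mean {mean_stride:.3f} s, SD {sd_stride:.3f} s, CV {cv_percent:.1f}%")
```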

Limitations

Although pressure-sensing walkway systems have been widely used for gait assessment in research and clinical settings, there is at least one major limitation that should be taken into consideration. Their use is restricted to a given physical location, so they cannot monitor a patient's gait during daily activities outside the lab or clinic nor provide gait assessments on a continuous basis. Longitudinal measurement of gait, rather than a single lab-based test, may provide unique and important information about the progress of disease and the pattern of symptom fluctuation in response to certain environmental factors (Maetzler et al. 2013; González et al. 2016) and could be used as specific predictive markers of frailty syndrome (Fontecha et al. 2013), the onset of cognitive decline (Camicioli et al. 1998) and neurodegenerative diseases (Salarian et al. 2004), and potential risk of falling (Shany et al. 2012). It is worth noting that, in order to better assess gait performance and stability control in locomotion, in addition to the spatial and temporal gait parameters, the dynamic interaction between the COP and COM should also be investigated. Although the estimated center of mass (COMe) measurement has been reported by some pressure-sensing walkway products, its validity has not been established against three-dimensional motion tracking; thus, the COMe measurement should be treated with caution in clinical assessments. The potential of integrating the pressure-sensing walkway with virtual environments should also be explored to assess gait in simulated free-living locomotion tasks.

Summary

Gait assessment conducted through a pressure-sensing walkway system has been widely utilized since the early 2000s. The relatively low cost, high accuracy and consistency, and automated data analysis features make it an ideal platform for quantifying the spatial and temporal gait characteristics in various populations. Although steady state linear walking is the most commonly used paradigm for gait assessment, recent technological developments allow the pressure-sensing walkway system to evaluate gait performance in other daily locomotion tasks (e.g., gait initiation/termination, turning, sit to stand, obstacle crossing, stair ascending/descending, etc.) as well as postural balance performance.

Cross-References

▶ 3D Kinematics of Human Motion
▶ Pressure Platforms
▶ Clinical Gait Assessment by Video Observation and 2D-Techniques
▶ Interpreting Ground Reaction Forces in Gait
▶ Gait Scores – Interpretations and Limitations
▶ Interpreting Spatiotemporal Parameters, Symmetry and Variability in Clinical Gait Analysis
▶ Integration of Foot Pressure and Foot Kinematics Measurements for Medical Applications
▶ Measures to Determine Dynamic Balance
▶ Slip and Fall Risk Assessment
▶ Gait During Real World Challenge: Gait Initiation, Gait Termination, Acceleration, Deceleration, Turning, Slopes and Stairs


References Baker R (2006) Gait analysis methods in rehabilitation. J Neuroeng Rehabil 3:1 Bilney B, Morris M, Webster K (2003) Concurrent related validity of the GAITRite ® walkway system for quantification of the spatial and temporal parameters of gait. Gait Posture 17:68–74 Brenton-Rule A, Mattock J, Carroll M, Dalbeth N, Bassett S, Menz HB, Rome K (2012) Reliability of the TekScan MatScan ® system for the measurement of postural stability in older people with rheumatoid arthritis. J Foot Ankle Res 5:1 Bridenbaugh SA, Kressig RW (2010) Laboratory review: the role of gait analysis in seniors’ mobility and fall prevention. Gerontology 57:256–264 Camicioli R, Howieson D, Oken B, Sexton G, Kaye J (1998) Motor slowing precedes cognitive impairment in the oldest old. Neurology 50:1496–1498 Cavanaugh JT, Guskiewicz KM, Stergiou N (2005) A nonlinear dynamic approach for evaluating postural control. Sports Med 35:935–950 Chesnin KJ, Selby-Silverstein L, Besser MP (2000) Comparison of an in-shoe pressure measurement device to a force plate: concurrent validity of center of pressure measurements. Gait Posture 12:128–133 Chien S-L, Lin S-Z, Liang C-C et al (2006) The efficacy of quantitative gait analysis by the GAITRite system in evaluation of parkinsonian bradykinesia. Parkinsonism Relat Disord 12:438–442 Chisholm AE, Perry SD, McIlroy WE (2011) Inter-limb Centre of pressure symmetry during gait among stroke survivors. Gait Posture 33:238–243 Cho KH, Lee HJ, Lee WH (2015) Test–retest reliability of the GAITRite walkway system for the spatio-temporal gait parameters while dual-tasking in post-stroke patients. Disabil Rehabil 37:512–516 Clark RA, Bower KJ, Mentiplay BF, Paterson K, Pua Y-H (2013) Concurrent validity of the Microsoft Kinect for assessment of spatiotemporal gait variables. J Biomech 46:2722–2725 Cutlip RG, Mancinelli C, Huber F, DiPasquale J (2000) Evaluation of an instrumented walkway for measurement of the kinematic parameters of gait. Gait Posture 12:134–138 Dusing SC, Thorpe DE (2007) A normative sample of temporal and spatial gait parameters in children using the GAITRite ® electronic walkway. Gait Posture 25:135–139 Eastlack ME, Arvidson J, Snyder-Mackler L, Danoff JV, McGarvey CL (1991) Interrater reliability of videotaped observational gait-analysis assessments. Phys Ther 71:465–472 Fontecha J, Navarro FJ, Hervás R, Bravo J (2013) Elderly frailty detection by using accelerometerenabled smartphones and clinical information records. Pers Ubiquit Comput 17:1073–1083 Fritz S, Lusardi M (2009) White paper: “walking speed: the sixth vital sign”. J Geriatr Phys Ther 32:2–5 Garciaguirre JS, Adolph KE, Shrout PE (2007) Baby carriage: infants walking with loads. Child Dev 78:664–680 González I, López-Nava IH, Fontecha J, Muñoz-Meléndez A, Pérez-SanPablo AI, QuiñonesUrióstegui I (2016) Comparison between passive vision-based system and a wearable inertialbased system for estimating temporal gait parameters related to the GAITRite electronic walkway. J Biomed Inform 62:210–223 Hartmann A, Luzi S, Murer K, de Bie RA, de Bruin ED (2009) Concurrent validity of a trunk tri-axial accelerometer system for gait analysis in older adults. Gait Posture 29:444–448 Highsmith MJ, Schulz BW, Hart-Hughes S, Latlief GA, Phillips SL (2010) Differences in the spatiotemporal parameters of transtibial and transfemoral amputee gait. J Prosthetics Orthot 22:26–30 Hollands K, Agnihotri D, Tyson S (2014) Effects of dual task on turning ability in stroke survivors and older adults. 
Gait Posture 40:564–569 Horak FB (1997) Clinical assessment of balance disorders. Gait Posture 6:76–84 Katz-Leurer M, Rotem H, Lewitus H, Keren O, Meyer S (2008) Relationship between balance abilities and gait characteristics in children with post-traumatic brain injury. Brain Inj 22:153–159


Keefe FJ, Hill RW (1985) An objective approach to quantifying pain behavior and gait patterns in low back pain patients. Pain 21:153–161 Kim A, Kim J, Rietdyk S, Ziaie B (2015) A wearable smartphone-enabled camera-based system for gait assessment. Gait Posture 42:138–144 Kressig RW, Beauchet O (2006) Guidelines for clinical applications of spatio-temporal gait analysis in older adults. Aging Clin Exp Res 18:174–176 Kuys SS, Brauer SG, Ada L (2011) Test-retest reliability of the GAITRite system in people with stroke undergoing rehabilitation. Disabil Rehabil 33:1848–1853 Lewek MD, Randall EP (2011) Reliability of spatiotemporal asymmetry during overground walking for individuals following chronic stroke. J Neurol Phys Ther 35:116–121 Liu W-Y, Lin P-H, Lien H-Y, Wang H-S, Wong AM-K, Tang SF-T (2014) Spatio-temporal gait characteristics in children with Tourette syndrome: a preliminary study. Res Dev Disabil 35:2008–2014 Maetzler W, Domingos J, Srulijes K, Ferreira JJ, Bloem BR (2013) Quantitative wearable sensors for objective assessment of Parkinson’s disease. Mov Disord 28:1628–1637 Maki BE, Holliday PJ, Topper AK (1994) A prospective study of postural balance and risk of falling in an ambulatory and independent elderly population. J Gerontol 49:M72–M84 McDonough AL, Batavia M, Chen FC, Kwon S, Ziai J (2001) The validity and reliability of the GAITRite system’s measurements: a preliminary evaluation. Arch Phys Med Rehabil 82:419–425 Menz HB, Latt MD, Tiedemann A, San Kwan MM, Lord SR (2004) Reliability of the GAITRite ® walkway system for the quantification of temporo-spatial parameters of gait in young and older people. Gait Posture 20:20–25 Miyazaki S, Kubota T (1984) Quantification of gait abnormalities on the basis of continuous footforce measurement: correlation between quantitative indices and visual rating. Med Biol Eng Comput 22:70–76 Morris ME, Huxham F, McGinley J, Dodd K, Iansek R (2001) The biomechanics and motor control of gait in Parkinson disease. Clin Biomech 16:459–470 Nelson AJ, Zwick D, Brody S et al (2002) The validity of the GaitRite and the functional ambulation performance scoring system in the analysis of Parkinson gait. NeuroRehabilitation 17:255–262 Nomura K, Fukada K, Azuma T, Hamasaki T, Sakoda S, Nomura T (2009) A quantitative characterization of postural sway during human quiet standing using a thin pressure distribution measurement system. Gait Posture 29:654–657 Owings TM, Grabiner MD (2003) Measuring step kinematic variability on an instrumented treadmill: how many steps are enough? J Biomech 36:1215–1218 Papadopoulos N, McGinley JL, Bradshaw JL, Rinehart NJ (2014) An investigation of gait in children with attention deficit hyperactivity disorder: a case controlled study. Psychiatry Res 218:319–323 Patterson KK, Gage WH, Brooks D, Black SE, McIlroy WE (2010) Evaluation of gait symmetry after stroke: a comparison of current methods and recommendations for standardization. Gait Posture 31:241–246 Paul L, Ellis B, Leese G, McFadyen A, McMurray B (2009) The effect of a cognitive or motor task on gait parameters of diabetic patients, with and without neuropathy. Diabet Med 26:234–239 Podsiadlo D, Richardson S (1991) The timed “up & go”: a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc 39:142–148 Rao AK, Quinn L, Marder KS (2005) Reliability of spatiotemporal gait outcome measures in Huntington’s disease. 
Mov Disord 20:1033–1037 Rinehart NJ, Tonge BJ, Iansek R, McGinley J, Brereton AV, Enticott PG, Bradshaw JL (2006) Gait function in newly diagnosed children with autism: cerebellar and basal ganglia related motor disorder. Dev Med Child Neurol 48:819–824


Salarian A, Russmann H, Vingerhoets FJ, Dehollain C, Blanc Y, Burkhard PR, Aminian K (2004) Gait assessment in Parkinson’s disease: toward an ambulatory system for long-term monitoring. IEEE Trans Biomed Eng 51:1434–1443 Schniepp R, Wuehr M, Neuhaeusser M et al (2012) Locomotion speed determines gait variability in cerebellar ataxia and vestibular failure. Mov Disord 27:125–131 Sekiya N, Nagasaki H, Ito H, Furuna T (1997) Optimal walking in terms of variability in step length. J Orthop Sports Phys Ther 26:266–272 Selby-Silverstein L, Besser M (1999) Accuracy of the GAITRite ® system for measuring temporalspatial parameters of gait. Phys Ther 79:S59 Shany T, Redmond S, Marschollek M, Lovell N (2012) Assessing fall risk using wearable sensors: a practical discussion. Z Gerontol Geriatr 45:694–706 Shull PB, Jirattigalachote W, Hunt MA, Cutkosky MR, Delp SL (2014) Quantified self and human movement: a review on the clinical impact of wearable sensing and feedback for gait analysis and intervention. Gait Posture 40:11–19 Sorsdahl AB, Moe-Nilssen R, Strand LI (2008) Test–retest reliability of spatial and temporal gait parameters in children with cerebral palsy as measured by an electronic walkway. Gait Posture 27:43–50 Sosnoff JJ, Sandroff BM, Motl RW (2012) Quantifying gait abnormalities in persons with multiple sclerosis with minimal disability. Gait Posture 36:154–156 Sosnoff JJ, Klaren RE, Pilutti LA, Dlugonski D, Motl RW (2015) Reliability of gait in multiple sclerosis over 6 months. Gait Posture 41:860–862 Stover AM (2005) Reliability of the GAITRite (R) Walking System for the assessment of gait in individuals with Parkinson’s disease. Master’s and Doctoral Project, The University of Toledo Digital Repository Thorpe DE, Dusing SC, Moore CG (2005) Repeatability of temporospatial gait measures in children using the GAITRite electronic walkway. Arch Phys Med Rehabil 86:2342–2346 Van Uden CJ, Besser MP (2004) Test-retest reliability of temporal and spatial gait characteristics measured with an instrumented walkway system (GAITRite ®). BMC Musculoskelet Disord 5:13 Verghese J, Wang C, Lipton RB, Holtzer R, Xue X (2007) Quantitative gait dysfunction and risk of cognitive decline and dementia. J Neurol Neurosurg Psychiatry 78:929–935 Verghese J, Holtzer R, Lipton RB, Wang C (2009) Quantitative gait markers and incident fall risk in older adults. J Gerontol Ser A Biol Med Sci 64:896–901 Wajda DA, Moon Y, Motl RW, Sosnoff JJ (2015) Preliminary investigation of gait initiation and falls in multiple sclerosis. Arch Phys Med Rehabil 96(6):1098–1102 Webster KE, Wittwer JE, Feller JA (2005) Validity of the GAITRite ® walkway system for the measurement of averaged and individual step parameters of gait. Gait Posture 22:317–321 Webster KE, Merory JR, Wittwer JE (2006) Gait variability in community dwelling adults with Alzheimer disease. Alzheimer Dis Assoc Disord 20:37–40 Whittle MW (1996) Clinical gait analysis: a review. Hum Mov Sci 15:369–387 Wittwer JE, Webster KE, Andrews PT, Menz HB (2008) Test–retest reliability of spatial and temporal gait parameters of people with Alzheimer’s disease. Gait Posture 28:392–396 Wondra VC, Pitetti KH, Beets MW (2007) Gait parameters in children with motor disabilities using an electronic walkway system: assessment of reliability. Pediatr Phys Ther 19:326–331 Wong JS, Jasani H, Poon V, Inness EL, McIlroy WE, Mansfield A (2014) Inter-and intra-rater reliability of the GAITRite system among individuals with sub-acute stroke. 
Gait Posture 40:259–261 Wu J, Looper J, Ulrich BD, Ulrich DA, Angulo-Barroso RM (2007) Exploring effects of different treadmill interventions on walking onset and gait patterns in infants with down syndrome. Dev Med Child Neurol 49:839–945

The Importance of Foot Pressure in Diabetes

Malindu E. Fernando, Robert G. Crowther, and Scott Wearing

Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Foot Pressure Sensors, Types, Spatial Resolutions, and Device Specifications . . . . . . . . . . . . . Types of Foot Pressure Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Procedures Used to Acquire Foot Pressure Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reliability and Validity of Foot Pressures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Factors that Influence Foot Pressure Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Foot Pressure in Individuals with Diabetes Mellitus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Foot Pressure in Individuals with High-Risk Feet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ultimately, Could a Threshold Be Found for Identifying Individuals at Risk of Ulceration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Research Is Required in order to Make Foot Pressures More Effective and Beneficial? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


M.E. Fernando (*) Podiatry Service, Kirwan Community Health Campus, Townsville, QLD, Australia College of Medicine, James Cook University, Townsville, QLD, Australia e-mail: [email protected] R.G. Crowther Sport and Exercise, School of Health and Wellbeing, University of Southern Queensland, Ipswich, QLD, Australia Smart Movement, Brisbane, QLD, Australia e-mail: [email protected] S. Wearing Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD, Australia Faculty for Sport and Health, Technische Universität München, Munich, Bavaria, Germany e-mail: [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_39-1

1

2

M.E. Fernando et al.

Abstract

Foot pressure assessment refers to the study of pressure fields acting between the plantar surface of the foot and a supporting surface, typically using electronic sensors. Individuals with both type 1 and type 2 diabetes are at risk of developing diabetic peripheral neuropathy (loss of peripheral sensory and motor function), which predisposes them to the development of plantar foot ulcers (open wounds). Foot pressure assessments have been extensively utilized to investigate the biomechanical features of individuals with diabetes-related foot complications, termed the "high-risk" foot. Thus, the applications of foot pressure measurements in individuals with diabetes include monitoring patients for risk of ulceration, determining pressure off-loading capacities, and investigating the mechanical factors responsible for foot ulceration and ulcer healing. The ideal application of foot pressure would be to utilize measurements to predict sites of potential ulceration, prior to ulcer occurrence, and to effectively guide pressure off-loading of ulcerated sites to progress wound healing. Although these two applications represent the overall importance of foot pressure assessments within the field of diabetes, such applications have seen limited use for various reasons. The aim of this chapter is to provide the reader with an overview of foot pressure assessment in relation to diabetes mellitus and to describe the factors which influence foot pressure assessments. In doing this, we hope to provide a focused discussion of the relevance of foot pressures in diabetes mellitus, utilizing the most up-to-date literature on the topic.

Keywords

Plantar pressure • Foot ulcers • Pressure–time integral • Peak plantar pressure • Shear pressure • Diabetic peripheral neuropathy • Diabetes mellitus • Foot pressure • Pedobarography • Reproducibility • Sensors • Validity • Footwear • Orthoses

Introduction

Pedobarography (also known as plantar or foot pressure assessment) is the study of pressure fields acting between the plantar surface of the foot and a supporting surface using a wide range of techniques that encompass electronic and nonelectronic methods (see Fig. 1 for an example). Foot pressures, simply stated, are the result of the vertical force (also known as the ground reaction force) acting on a specific site of the plantar aspect of the foot divided by the contact area (the amount of surface contact between the plantar surface of the foot and the sensor). The assessment of foot pressure is employed in a wide range of applications, including sports biomechanics and clinical assessments, but has specific implications in diabetes mellitus. The clinical application of foot pressures has both direct and indirect uses (Hughes 1993). The direct uses can be to assess the effectiveness of a treatment by examination before and after a surgical procedure, to monitor treatment progress with repeated measurements, and to design and assess the effectiveness of foot orthoses in relation to pressure off-loading (Kato et al. 1996; Bus et al. 2004, 2008b, 2013; Armstrong et al. 1999; Mueller et al. 2003b; Fernando et al. 2015). Indirect uses come from the growing body of knowledge emerging from laboratories using this equipment for research, which provides scientists with fundamental information regarding the control of gait and the influence of foot pressures (Hughes 1993; Fernando et al. 2016a; Hamatani et al. 2016; Barn et al. 2015; Qiu et al. 2015; Yavuz 2014; Hafer et al. 2013; Bus and Waaijman 2013; Waaijman and Bus 2012; Acharya et al. 2012).

Fig. 1 Foot pressure assessment of an individual demonstrating the various areas of foot pressure. The diagram at the top indicates the different masks (or sites of foot pressure, in various colors); the diagram at the bottom indicates an example of a pressure measurement in an individual. Note the area of red and yellow under the fifth metatarsal of the right foot (bottom diagram), indicating a site of peak pressure

Individuals with both type 1 and type 2 diabetes mellitus are at risk of developing diabetic peripheral neuropathy (nerve damage due to uncontrolled or long-standing chronic hyperglycemia) and peripheral arterial disease. Both components predispose individuals to foot ulcers, but of particular relevance to foot pressures is the development of diabetic peripheral neuropathy. Diabetic peripheral neuropathy has sensory, motor, and autonomic components (Boulton et al. 2004). The sensory components have received most of the attention in the literature, and a "loss of protective sensation" has been implicated in the development of plantar foot ulcer (wound) formation (Armstrong 2005; Wood et al. 2005). While the motor component of neuropathy has received less attention in relation to foot ulcers, it has also been implicated in the development of foot ulcers through its effect on foot and lower limb biomechanics (Mueller et al. 1994a; Veves et al. 1992; Fernando et al. 2014, 2016b). Individuals with diabetes mellitus acquire foot deformities (such as claw and hammertoe deformities) and experience significant changes to their foot morphology. These changes are accompanied by a gradual decline in peripheral proprioception, motor function, and protective sensation of the foot after the onset of diabetic peripheral neuropathy. The collective term for these changes is the "high-risk" foot, and such changes are associated with elevated pressure beneath the foot (see Fig. 1).
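As a simple worked illustration of the pressure definition given above (hypothetical numbers, not data from any of the cited studies), a vertical force of 350 N acting over a contact area of 10 cm2 under a metatarsal head corresponds to

\[ p = \frac{F_v}{A} = \frac{350\ \mathrm{N}}{10\ \mathrm{cm}^2} = 35\ \mathrm{N/cm}^2 = 350\ \mathrm{kPa}. \]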


Such changes contribute to higher magnitudes and durations of mechanical stress within the foot during standing and walking. These higher foot pressures are thought to predispose patients with diabetic peripheral neuropathy to the development of foot ulcers (wounds) on the plantar aspect of the foot. Elevated foot pressures have also been implicated in delayed wound healing, which in turn predisposes individuals with foot ulcers to limb amputations due to wound infection and ischemia. Therefore, foot pressure assessment in patients with diabetes mellitus has multiple applications, and some of the main applications include the identification of sites of potential ulceration, evaluation of the duration and magnitude of pressure off-loading required for ulcer healing, and the assessment of the effectiveness of off-loading techniques. Other applications include the assessment of gait temporal–spatial parameters and balance in the presence of peripheral neuropathy. Unfortunately, despite an increase in the clinical use of pedobarography, the results obtained from foot pressure assessments do not always reflect clinical expectations (i.e., sites of ulceration), leading to confusion and misdiagnosis (Choi et al. 2014). Hence, the ability of pedobarography to assist in improving the identification of sites of potential foot ulceration remains somewhat controversial (Choi et al. 2014). Therefore, clinicians and researchers need to be aware of both the strengths and the shortcomings of foot pressure assessments in order to utilize them effectively. Hence, the aims of this chapter are to provide the reader with an overview of foot pressure assessment in relation to diabetes mellitus; to describe factors that influence foot pressure assessments, including some of the key technical requirements; and to provide a focused discussion of its importance in diabetes mellitus, utilizing the most up-to-date literature on the topic.

State of the Art

The acquisition of foot pressure data in individuals with diabetes mellitus has evolved over the last three decades from barefoot assessments only to both barefoot and in-shoe foot pressure assessments (see Fig. 2) (Lord et al. 1986). This shift in methodology was predicated on the assumption that in-shoe pressure measurements are more representative of the foot pressures experienced by patients with diabetes on a day-to-day basis. While this is true from an application point of view, barefoot assessments and in-shoe assessments provide different types of biomechanical information. For example, barefoot investigations allow the assessment of the foot–ground interaction and of the inherent foot pressures experienced due to the presence of neuropathy or deformity, without the confounding influence of footwear. In-shoe assessments, in contrast, allow the influence of footwear on foot pressures to be identified, areas of high pressure to be assessed, and the adequacy of off-loading strategies to be evaluated. Therefore, both barefoot and in-shoe assessments continue to have a role within the field.

Fig. 2 Barefoot and in-shoe foot pressure acquisition. The image on the left represents an example of in-shoe foot pressure assessment. Here the pressure-detecting insoles, lined with electronic sensors capable of detecting foot pressures, can be seen inside the footwear. The data obtained are transmitted to a receiver (seen at the far left) (Image courtesy of novel electronics incorporated, obtained with permission.) The image on the right represents an example of how barefoot foot pressure data are obtained using a pressure platform. The given example demonstrates a freestanding platform, 2 m long, which is usually embedded within the floor. This platform contains electronic sensors capable of detecting the foot pressure at various foot sites and estimating the vertical ground reaction force using the pressure values and contact areas at various foot sites, provided the sensors are loaded uniformly

Another key area of interest has been the assessment of shear pressure, or the pressure related to the horizontal components of the ground reaction force. The shear components of foot pressure are thought to have an important role in the development of complications such as diabetic foot ulcers in the presence of diabetic peripheral neuropathy (Yavuz 2014; Yavuz et al. 2007a, 2008, 2009; Park and Kim 2007). Although it was known over three decades ago that shear pressures had a profound influence on tissue viability in individuals with diabetes mellitus (Pollard and Le Quesne 1983), the quantification of the horizontal components of pressure has proved to be technically challenging (Yavuz et al. 2007a). With the exception of a few devices, most standard sensors used for pressure measurement do not measure the fore–aft or medial–lateral shear forces that are obtained using force platforms (Alexander et al. 1990; Orlin and McPoil 2000). Measurement of shear is technically difficult compared to vertical pressure, as shear is also dependent on the frictional properties of the sensor surface (which may or may not be representative of the surrounding footwear or the surface of contact). Consequently, some attempts have been made to estimate shear forces by modeling the spatial change, or gradient, of vertical peak pressures beneath the foot, termed the peak pressure gradient (Mueller et al. 2005). Findings indicated that the peak pressure gradient was substantially higher in the forefoot than in the rearfoot, even when compared with the peak foot pressure (Mueller et al. 2005). While insightful, the accuracy of these methods has recently been questioned and has been suggested to be of limited clinical capacity (Yavuz et al. 2007a). Some technological advances have been able to provide the instrumentation required to evaluate the role of shear pressure in leading to ulceration and hence have assisted in advancing this area of knowledge (Yavuz 2014; Yavuz et al. 2009; Hamatani et al. 2016; Lord and Hosein 2000; Davis et al. 1998). Davis and colleagues (1998) were one of the first groups to develop a device to simultaneously measure the vertical pressure and the anterior–posterior and medial–lateral distributed shearing forces under the plantar surface of the foot using strain gauge technology. However, this device had a low sampling frequency of 37 Hz and was limited to a few plantar locations during simultaneous measurement (Davis et al. 1998). There is no commercially available measurement device for assessing shear pressure at present. The inability to measure shear pressure may have contributed to the lack of clarity regarding pressure thresholds for foot ulceration and may also provide a reason as to why ulcer sites do not always correspond with sites of peak vertical foot pressure (Yavuz et al. 2007b). Yavuz and colleagues (2007b) demonstrated that sites of peak pressure do not always correlate with sites of peak shear pressure within the foot. In their study, peak shear occurred at the same site as the peak pressure for 20% of the cohort with diabetes mellitus. For the other participants, the location of peak shear was more than 2.5 cm away from the site of peak pressure (Yavuz et al. 2007b). Hence, our current understanding of shear and vertical pressure and how they are involved in foot ulceration and ulcer healing continues to improve. These advancements may have a large impact in terms of how foot ulcers are managed in the future.
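One simplified way to compute a peak pressure gradient of the kind mentioned above is sketched below (hypothetical pressure map; not necessarily the exact definition used by Mueller et al. 2005): the spatial gradient of the peak-pressure map is evaluated and its largest magnitude is taken.

```python
import numpy as np

# Peak-pressure map over a small grid of the forefoot (kPa), with sensors
# spaced 5 mm apart (hypothetical values, for illustration only).
peak_pressure = np.array([[120., 150., 180., 160.],
                          [140., 220., 310., 240.],
                          [130., 260., 400., 280.],
                          [110., 180., 230., 190.]])
spacing_mm = 5.0

# Spatial gradient of the peak-pressure map along both directions (kPa/mm)
gy, gx = np.gradient(peak_pressure, spacing_mm)

# Peak pressure gradient: the largest magnitude of the spatial gradient
ppg = np.max(np.hypot(gx, gy))
print(f"peak pressure gradient = {ppg:.1f} kPa/mm")
```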

Foot Pressure Sensors, Types, Spatial Resolutions, and Device Specifications

Types of Sensors

There is a range of different foot pressure measurement devices. The foot pressure systems used by clinicians and researchers vary in sensor configuration to meet different application requirements. The types of pressure sensors that are utilized include capacitive sensors, resistive sensors, piezoelectric sensors, and piezoresistive sensors (Abdul Razak et al. 2012; Rosenbaum and Becker 1997; Urry 1999). Table 1 provides a summary of the various types of sensors. Capacitive sensors contain two electrically charged plates separated by an elastic medium and measure pressure as a voltage change proportional to the applied pressure. Resistive sensors work by measuring the resistance of a conductive medium sandwiched between two electrodes; application of pressure causes conductive particles to touch, increasing the current through the sensors. Piezoelectric sensors produce an electric field (voltage) in direct response to an applied force, which can be measured. Piezoresistive sensors rely on the piezoresistive effect, whereby the electrical resistance of a material changes in response to an applied mechanical strain. Typically made of a semiconductor material, the resistivity of the sensor is influenced by the force or pressure applied; when the sensor is unloaded, resistivity is high, and when force is applied, resistance decreases. When piezoresistors are placed in a Wheatstone bridge configuration and attached to a pressure-sensitive diaphragm, a change in resistance is converted to a voltage output, which is proportional to the applied pressure.
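As an illustration of the bridge readout just described (a generic textbook relation, not a formula taken from the cited sources), a single piezoresistor of nominal resistance R in a quarter Wheatstone bridge excited with voltage V_in produces, for a small load-induced change ΔR, an output of approximately

\[ V_{\mathrm{out}} \approx \frac{V_{\mathrm{in}}}{4}\,\frac{\Delta R}{R}, \]

which is nearly proportional to the applied pressure for small ΔR/R.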


Table 1 Types of foot pressure sensors and their characteristics

Type of sensor | Pressure detection strategy | Examples | Advantage | Disadvantage
Capacitive sensors | Measures voltage change | Emed® system, Pedar® system | Uses a calibration curve developed for each sensor | Sensor is larger than most other sensor types
Resistive sensors | Measures resistance of conductive foam between two electrodes | Matscan® system, Tekscan® system, Musgrave® system | Smaller in size; able to measure both force and pressure | Can only measure vertical force; conductive foam is prone to damage
Piezoelectric sensors | Produces an electric voltage in response to pressure | F-scan® | Highly elastic, shows little material deformation | Highly temperature sensitive
Piezoresistive sensors | Semiconductor material; measures electrical voltage proportional to applied force | ParoTec® | May be able to measure shear forces | High signal to noise ratio due to electrical interference

The data in the table are adapted from the following review or original studies (Zhao et al. 2013; Orlin and McPoil 2000; Abdul Razak et al. 2012).

The type of sensor within the device is also a determinant of measurement accuracy and precision (Giacomozzi 2010). For example, static and dynamic pressure tests have demonstrated very high accuracy for capacitive, elastomer-based technologies and high accuracy for resistive technology, despite the need for a complex ad hoc calibration (Giacomozzi 2010).

Types of Devices

Pressure measurement devices are also broadly categorized into either pressure distribution platforms or in-shoe systems. Key benefits of in-shoe measurement devices are their mobility, flexibility, and cost, as opposed to platform systems, which are less portable and generally more expensive. Platform systems were traditionally used for both static (standing) and dynamic (during walking) assessments, although several new in-shoe systems are also capable of assessing these parameters. Platform systems are typically constructed from a flat array of pressure sensing elements, with various types of sensors arranged in a matrix configuration, and are embedded in the floor to allow the capture of foot pressure information during gait (Abdul Razak et al. 2012). In-shoe sensors are flexible and embedded in the shoe such that measurements reflect the interface between the foot and the shoe (Abdul Razak et al. 2012), rather than the foot and the floor as in platform systems. Other advantages of platform devices compared to in-shoe devices include a greater number of active sensors, the size of the active sensing area, a higher spatial resolution (i.e., the number of sensors for a given absolute area), and the ideal placement of the sensors parallel to the supporting surface to provide a "true" vertical force measurement (Orlin and McPoil 2000). Disadvantages include the need to target the pressure platform during gait (with smaller platforms this leads to altered pressure measurements) and the large number of steps required to obtain reliable and reproducible data (Wearing et al. 1999; Abdul Razak et al. 2012; Orlin and McPoil 2000). A disadvantage of pressure measurement systems compared to force plates is their relatively lower temporal resolution. Modern systems typically have sample rates between 100 and 500 Hz (Abdul Razak et al. 2012). Although suitable for most clinical gait applications, the sampling frequency of commercially available pressure platforms may not be sufficient to accurately detect the transient loads associated with heel strike (Verdini et al. 2006; Gillespie and Dickey 2003). A general advantage of in-shoe measurement systems is that multiple steps can be easily averaged and analyzed with the same mask, and the influence of footwear and orthotics can also be studied (Rosenbaum and Becker 1997). However, in-shoe sensors are susceptible to damage and generally have a lower spatial resolution, yet they provide real-time information regarding foot pressures while wearing footwear. In-shoe systems may also be limited by the fact that data obtained using different insole sizes may be incomparable, given differences in spatial resolution between insoles of different sizes. Consistent with platform systems, the ability to detect the transient loads associated with heel strikes, which may contain frequency content as high as 400 Hz in some footwear conditions, is seemingly limited with in-shoe systems (Gillespie and Dickey 2003). The decision about which type of device to use is based on the clinical or research requirement, the loading characteristics, and the outcomes of interest.
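A back-of-the-envelope check based on the sampling theorem illustrates this limitation. To resolve signal content up to a frequency f_max, the sampling rate must satisfy

\[ f_s \geq 2 f_{\max}, \]

so heel-strike transients with content up to roughly 400 Hz would call for sampling at about 800 Hz or more, well above the 100–500 Hz offered by typical pressure systems.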

Specifications for Devices

The required specifications for a pressure sensor in terms of sensor performance include the linearity, hysteresis, temperature sensitivity, sensor size and type, and the capture frequency (Abdul Razak et al. 2012; Rosenbaum and Becker 1997; Urry 1999). Linearity refers to whether the sensor output is directly (linearly) related to the sensor input over a specified range over time; hence, higher linearity is preferred in a sensor. Hysteresis is a measure of the sensor's performance on loading and unloading; the ideal sensor should have low hysteresis, or loss of energy. The temperature sensitivity refers to the sensitivity of the sensor to ambient temperature, and, together with the sensor type and size as well as the type of elastomer used, this determines the range of pressure able to be captured by the device. The spatial resolution refers to the size rather than the number of sensors within the capturing system (either a platform or an insole). The capture frequency is the rate of capture, or sampling frequency, which is dependent on a number of factors including the natural frequency of the sensor. Hence, the sampling frequency is the number of samples measured by each sensor per second, recorded in cycles per second, or hertz (Hz). Calibration refers to the priming of the sensors prior to capturing foot pressure data (Abdul Razak et al. 2012; Orlin and McPoil 2000). Most importantly, the specifications that are crucial for obtaining reliable and valid foot pressure measurements are the spatial resolution, temporal resolution, and accuracy of calibration (Orlin and McPoil 2000; Rosenbaum and Becker 1997).

Types of Foot Pressure Outcomes

The two commonly used measures of foot pressure are the peak pressure (mpp), by definition the peak pressure measurement during the stance phase of the gait cycle, and the pressure–time integral (pti), by definition the area under the pressure versus time curve (see Fig. 3). The most common units for the mean peak pressure are newtons per centimeter squared (N/cm2) and kilopascals (kPa), and for the pressure–time integral, N·s/cm2 and kPa·s. The SI unit for pressure is the pascal, and reporting pressures in kPa is encouraged among the foot pressure measurement community. With regard to the types of measurement outcomes used, there remains some ambiguity in the field as to which is the better outcome to predict ulceration and ulcer healing in individuals with diabetes and diabetic peripheral neuropathy (Melai et al. 2011; Bus and Waaijman 2013). Therefore, there are differences between what measures are commonly used in studies (see Table 2 for a random selection of studies that reported mpp and/or pti).

Fig. 3 An example of a pressure versus time graph of the whole foot. The x-axis represents the time or stance phase duration of the gait cycle in seconds (s), and the y-axis represents foot pressure in N/cm2. The mean peak pressure (the peak of the pressure vs. time graph) is highlighted by the red arrow and represents the magnitude of pressure. The pressure–time integral, the area under the curve, is represented by the shaded area under the graph (a) and represents a measure of the magnitude and duration of pressure. Hence, the two measurements provide different types of information related to the duration of pressure and magnitude of pressure


Table 2 Examples of studies reporting peak pressure and/or pressure–time integral

| Study name | Population studied | Reported on mpp | Reported on pti |
|---|---|---|---|
| (Melai et al. 2011) | DPN | Yes | Yes |
| (Guldemond et al. 2008) | DPN | Yes | No |
| (Bacarin et al. 2009) | DPN | Yes | Yes |
| (Sacco et al. 2009) | DPN | Yes | Yes |
| (Caselli et al. 2002) | DPN | Yes | No |
| (Waaijman and Bus 2012) | DPN | Yes | Yes |
| (Yuk San Tsung et al. 2004) | DPN | Yes | Yes |
| (Owings et al. 2009) | DMC | Yes | No |
| (Waldecker 2012) | DMC | Yes | Yes |
| (Mueller et al. 2008) | DPN | Yes | No |

The table demonstrates a selection of studies in the area that were randomly selected to demonstrate what outcomes were reported. mpp = peak plantar pressure, pti = pressure–time integral, DPN = individuals with diabetic peripheral neuropathy, DM = individuals with diabetes mellitus without neuropathy, DMC = individuals with a diabetes mellitus-related foot complication excluding neuropathy

From observing the outcomes used in these studies, it is evident that the mpp has historically been the more commonly used outcome. As these outcomes represent different types of information regarding the temporal and peak characteristics of foot pressure, it may be appropriate to measure both. There is evidence, however, that the level of reproducibility differs between the various foot pressure outcome measures (Fernando et al. 2016a; Gurney et al. 2013; Price et al. 2016). In general, the mean peak pressures seem to be more reproducible than the pressure–time integral in people with diabetes mellitus, although further research is needed to support this finding (Fernando et al. 2016a). Historically, an aggregated measure of foot pressure was reported for the entire plantar surface of the foot as opposed to site-specific foot pressures. Aggregated measures do not take into account the area of the foot that the pressure acts on, and, especially in individuals with diabetes, the location of peak pressure gives important information as to where ulceration may occur (Fernando et al. 2016a; Ledoux et al. 2013). Hence, reporting a single pressure value for the whole foot, which does not capture regional variations in loading, provides only limited information. Rather, site-specific measurements of pressure–time integral and mean peak pressure provide more insight into the localization of foot pressure than aggregated whole-foot pressure measurements. A benefit of foot pressure assessment is the ability to measure the distribution of load under the foot, which is not possible, for example, with force plate analyses.
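To make the two outcomes concrete, the sketch below (Python/NumPy, with invented pressure samples) computes a peak pressure and a pressure–time integral for one masked region from a sampled pressure–time curve, following the definitions above: the peak of the curve and the area under the curve, here approximated by trapezoidal integration. The curve shape and the 100 Hz rate are assumptions made for the example.

```python
import numpy as np

def peak_pressure_and_pti(pressure_ncm2, sampling_rate_hz):
    """Peak pressure (N/cm^2) and pressure-time integral (N/cm^2 . s) for one
    region of interest, computed from pressure samples over the stance phase.
    The mean peak pressure reported in studies is typically this peak value
    averaged over repeated trials."""
    p = np.asarray(pressure_ncm2, dtype=float)
    dt = 1.0 / sampling_rate_hz                       # time between samples (s)
    peak = p.max()                                    # peak of the pressure-time curve
    pti = np.sum((p[:-1] + p[1:]) / 2.0) * dt         # trapezoidal area under the curve
    return peak, pti

# Invented whole-foot pressure samples for a ~0.7 s stance phase sampled at 100 Hz
t = np.arange(0.0, 0.7, 0.01)
pressure = 20.0 * np.sin(np.pi * t / 0.7) ** 2        # rises and falls during stance
peak, pti = peak_pressure_and_pti(pressure, sampling_rate_hz=100)
print(f"peak pressure: {peak:.1f} N/cm^2, pressure-time integral: {pti:.2f} N/cm^2*s")
```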

The Procedures Used to Acquire Foot Pressure Data Although there have been efforts to standardize the protocols used for foot pressure measurements (Barnett 1998; Cavanagh et al. 2000; Cavanagh and Ulbrecht 1994; Fernando et al. 2015), there is no agreed standardized protocol to date.


Fig. 4 A photo depicting the mid-gait strategy using a pressure platform (Image is courtesy of novel electronics incorporated, obtained with permission)

Various methods have been used to obtain data. This has led to concerns with the comparability of data obtained using different protocols. The traditional method for collecting data using a platform system has been termed the "mid-gait" technique (see Fig. 4) (Meyers-Rice et al. 1994). This method requires individuals to ambulate along a walkway, and pressure data are collected from a single foot contact on the sensor platform during steady-state walking. While this method of assessment is suitable for healthy participants, it is no longer considered appropriate for use in individuals with high-risk foot conditions (i.e., ulceration) and particularly in individuals with diabetic peripheral neuropathy due to increased repetitive stress and the potential for skin ulceration (Orlin and McPoil 2000). This led to the development of alternate methods of foot pressure assessment, including the one-step, two-step, and three-step methods, relating to the number of steps a person must take before stepping onto a platform to capture their foot pressure (McPoil et al. 1999; Bus and de Lange 2005). While the one-step method is often used clinically, foot pressures obtained using this technique are not representative of those collected using the mid-gait method. The two-step method of capture, in contrast, produces pressure values that are closer to those of the mid-gait technique and is widely advocated for pressure assessment in the high-risk foot (Bus and de Lange 2005). Irrespective of the gait capture protocol, at least three to five assessments of gait are


required to reliably assess pressures in individuals with and without diabetes mellitus (McPoil et al. 1999; Bryant et al. 1999; Bus and de Lange 2005). Whether foot pressures are acquired during the initiation or termination of gait also influences measurements obtained with the two-step protocol; whether data should be acquired in relation to gait initiation or termination therefore depends largely on the gait parameter of interest (Wearing et al. 1999). For example, assessing foot pressures during gait initiation has minimal effect on peak pressures beneath the forefoot but markedly alters timing parameters of the gait cycle compared to the mid-gait method (Wearing et al. 1999). Although the assessment of foot pressure during the commencement of gait may be ideal for assessing healthy individuals, this protocol may be less than ideal in individuals with diabetes mellitus due to poor balance and proprioception and an unstable gait (Bus and de Lange 2005). This is consistent with other work which indicates that a steady gait is obtained between the second and third step after gait initiation (Miller and Verstraete 1996). Generally, an approach that reduces the total stance time could reduce the amount of mechanical loading. The three-step protocol may offer consistent results and avoid unnecessary loading of the foot, especially in individuals with diabetic peripheral neuropathy (Fernando et al. 2015, 2016a). Currently, there are no standardized protocols for the assessment of in-shoe foot pressures, and a range of methods utilizing different numbers of steps have been used to capture foot pressure data (see Fig. 5) (Arts and Bus 2011; Price et al. 2014, 2016; Woodburn and Helliwell 1996). This creates a challenge in comparing results between studies that have used different methods of assessment. The other limitation of in-shoe pressure measurements is the influence of footwear and orthotics, which can have a confounding effect on the distribution and magnitude of foot pressure. Hence, pressure data from studies using different footwear types are not directly comparable (Guldemond et al. 2007; Bus 2008; Bus et al. 2011; Fernando et al. 2013).

Reliability and Validity of Foot Pressures The validity of foot pressures refers to the ability of either a platform or an in-shoe device to quantify the magnitude and duration of pressure accurately. The reliability is the agreement between different pressure measurements of the same individual (which depends both on the natural variability of pressure in an individual and on the measurement reproducibility of the system). The validity of foot pressure measurements has received limited attention in the literature (Price et al. 2016; McPoil et al. 1995; Luo et al. 1998; Giacomozzi 2010). The number and size of sensors, sampling frequency, and accuracy of calibration are likely to influence the validity of foot pressure outcomes. In contrast, the reproducibility of various foot pressure platforms and in-shoe devices has been assessed in healthy populations (Hafer et al. 2013; Hughes et al. 1991; Gurney et al. 2008; Zammit et al. 2010) and in individuals with diabetes mellitus (Fernando et al. 2016a). These reproducibility assessments are


Fig. 5 A photo depicting in-shoe pressure measurement (Image is courtesy of novel electronics incorporated, obtained with permission)

representative of the reliability of the pressure system and variability of the individuals assessed. Factors such as the protocol used to capture foot pressure data and the participant’s gait on the day (especially with respect to speed of walking) will also influence foot pressure assessments, and therefore it is often indicated that gait speed should be controlled (Hughes et al. 1991). These influences are discussed in the next section. It is however reasonable to say that perfect reliability (i.e., 100 % agreement between two measurements) cannot be expected in foot pressure assessments, because of inherent differences in each gait assessment that cannot be standardized, as it involves a dynamic system within an individual (Orlin and McPoil 2000). However, the variability of foot pressure within individuals may also be an important consideration in studying the risk of ulceration and ulcer healing as it provides important information regarding cyclic loading and the range of pressures experienced. Hafer and co-workers (2013) calculated the intra-mat, intra-manufacturer, and inter-manufacturer reliability of foot pressure parameters as well as the number of foot pressure trials needed to reach a stable estimate for healthy participants using intraclass correlation statistics. Their results indicated that the intra-platform reliability correlations were greater than 0.70 and that the inter-platform correlations of


reliability were more than 0.70 for more than half of the foot pressure outcomes at various sites, when using five gait assessments per participant (Hafer et al. 2013). Intrasubject or platform comparisons of greater than 0.75 are considered to be of good reliability, those between 0.5 and 0.75 of moderate reliability, and those under 0.5 of poor reliability (Ferrarin et al. 2011). In addition, all foot pressure parameters improved in consistency and were within 90 % of the mean when five gait assessments were used (Hafer et al. 2013). This is consistent with other work indicating that the use of three to five gait assessments increases the reproducibility of foot pressures (Gurney et al. 2008; Hughes et al. 1991; Wearing et al. 1999). However, the comparisons by Hafer and co-workers (2013) were limited to a healthy population and to two types of platform systems, and intraclass correlation statistics in general are known to be poor predictors of overall reliability (Lee et al. 2012). With respect to measurements in patients with diabetic peripheral neuropathy using platform systems, findings have indicated that the coefficient of variation and the reproducibility are also site and outcome dependent, as in healthy individuals (Gurney et al. 2013). This is also the case in individuals with diabetic peripheral neuropathy and active foot ulcers. Figure 6 demonstrates differences in the coefficient of variation of foot pressure measurements across different plantar sites and for two foot pressure outcomes in individuals with diabetic foot ulcers, in individuals with diabetes without ulcers, and in healthy controls (Fernando et al. 2016a). The overall reproducibility of foot pressures is, however, site dependent (Gurney et al. 2008; Fernando et al. 2016a) and outcome dependent (Fernando et al. 2016a; Gurney et al. 2008) in individuals with and without diabetes mellitus. Therefore, the site of measurement within the foot (e.g., the plantar hallux), the size of the mask or pressure capture location, and the type of pressure outcome (pressure–time integral or mean peak pressure) also determine the overall reproducibility of foot pressure, as in healthy controls (Wearing et al. 1999). The mean peak pressure on average has better reproducibility than the pressure–time integral in participants with diabetes (Fernando et al. 2016a). When using in-shoe pressure assessments, Ahroni and colleagues (Ahroni et al. 1998) found that the peak foot pressures over the metatarsal heads had the best reproducibility, with low coefficients of variation and intraclass correlation coefficients of greater than 0.7. However, whether this constitutes an appropriate level of reproducibility is questionable (Lee et al. 2012). Foot pressure at the heel, the whole foot, and the hallux also showed fair to good reproducibility (Ahroni et al. 1998). More recent findings have indicated that some in-shoe pressure assessment systems may only be suitable for recording peak pressures in the range of 200 to 300 kPa (Price et al. 2016) and that these measurements often had poor reproducibility. In this study, two in-shoe pressure measurement devices were most accurate between 200 and 300 kPa, and the contact area was relatively repeatable for all systems (Price et al. 2016). The largest error in peak pressures ranged from 50 kPa for capacitive devices up to 600 kPa for resistive devices (Price et al. 2016).
The authors speculated that the reduced accuracy and precision of the resistive systems reflect higher variability between sensors and could be attributed to inherent sensor noise (Price et al. 2016).


Fig. 6 The two feet demonstrate the coefficients of variation for foot pressure at ten plantar sites assessed with a pressure platform in individuals with a plantar foot ulcer and neuropathy (n = 4) (gray), in individuals with diabetes mellitus without neuropathy and foot ulcers (orange) (n = 5), and in healthy controls (n = 5) (blue). The platform used was the Footscan® pressure plate (RSScan International, Olen, Belgium). This plate was 2 m in length and 0.4 m in width and contained 16,384 sensors, with individual sensor dimensions of 0.0076 m × 0.0051 m. All pressure data were captured at a rate of 100 Hz. Foot pressure measurements were performed over 5 days and are the average results from five assessments, averaged across the two feet. The two graphs represent two outcomes of foot pressure. The top graph represents the coefficient of variation of mean peak pressure; the bottom graph represents the coefficient of variation of pressure–time integral. The x-axis represents the various sites of foot pressure measurement; the y-axis represents the coefficient of variation as a percentage (%) (Adapted from Fernando et al. 2016a)


Therefore, this questions the use of foot pressure assessments in quantifying pressures above this level. Arts and Bus (2011) carried out a study to assess the ideal number of steps needed for reproducible and valid foot pressure data in individuals with diabetic peripheral neuropathy when wearing custom-made footwear. Their results indicated that between 7 and 17 footsteps per foot were required to obtain reliable data, depending on the parameter of interest, and that 12 steps per foot were required to obtain both valid and reliable in-shoe pressure data across all parameters (Arts and Bus 2011). This provides some novel insight regarding the type of protocol that should be considered when measuring in-shoe foot pressures in a high-risk population.
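As an illustration of how between-trial reproducibility is commonly summarized, the sketch below (Python/NumPy, with invented repeated-trial values) computes the coefficient of variation of an outcome over five gait assessments, the kind of site- and outcome-specific statistic reported in the studies cited above; it is not the specific analysis used in any of those studies.

```python
import numpy as np

def coefficient_of_variation(trial_values):
    """Coefficient of variation (%) of an outcome over repeated gait assessments:
    the standard deviation expressed as a percentage of the mean."""
    values = np.asarray(trial_values, dtype=float)
    return 100 * values.std(ddof=1) / values.mean()

# Invented mean peak pressures (N/cm^2) at one plantar site over five assessments
mpp_trials = [11.8, 12.4, 11.5, 13.0, 12.1]
# Invented pressure-time integrals (N/cm^2 . s) at the same site
pti_trials = [3.1, 3.9, 2.8, 3.6, 4.2]

print(f"CoV mean peak pressure: {coefficient_of_variation(mpp_trials):.1f} %")
print(f"CoV pressure-time integral: {coefficient_of_variation(pti_trials):.1f} %")
```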

Factors that Influence Foot Pressure Measurements Understanding the role of elevated foot pressure in the neuropathic foot has previously been highlighted as an important consideration, if not the most important consideration, toward the prevention and treatment of foot ulcers in individuals with diabetes mellitus and diabetic peripheral neuropathy (Brand 1979; Brand and Ebner 1969; Cavanagh et al. 1993, 2000; Formosa et al. 2013; Boulton et al. 1983). However, there remains a lack of understanding regarding the determinants of elevated foot pressure in individuals with diabetes mellitus with and without coexisting peripheral neuropathy (Barn et al. 2015). These determinants can broadly be divided into laboratory- and protocol-related influences and patient-specific influences.

Laboratory Environment and Protocol-Related Influences The effect of gait protocol, targeting, and laboratory environment on the foot pressures of individuals with diabetes mellitus, in particular in individuals with diabetic peripheral neuropathy, is largely unknown and needs to be investigated. The gait speed is an important confounder affecting foot pressure measurements (Burnfield et al. 2004; Rosenbaum et al. 1994). Hence, gait speed is often tightly controlled under research conditions to obtain reproducible results and to avoid the confounding effect of speed on foot pressures (Segal et al. 2004; Burnfield et al. 2004; Rosenbaum et al. 1994; Warren et al. 2004). However, the application of such results to the everyday walking characteristics of individuals is unclear. Hence studying the foot pressures of individuals during their natural speed of walking may offer measurements that are more realistic (Fernando et al. 2015). A limitation of using natural gait speed is that it is context specific and is dependent on the task or activity that is being performed. Faster gait speeds (i.e., shorter stance phase durations) have been shown to increase foot pressure at the heel, medial and central forefoot, and lesser toes, while decreasing foot pressure beneath the midfoot and lateral forefoot (Burnfield et al. 2004; Rosenbaum et al. 1994; Hughes et al. 1991). This has been termed a medialization of the loading pattern (Rosenbaum et al. 1994).


As patients with neuropathy are likely to have a slower gait speed, especially after the development of foot ulcers, controlling gait speed or adjusting for gait speed may be an important consideration in this population (Fernando et al. 2016b).

Patient-Specific Influences Restriction in the range of motion of the foot (particularly at the first metatarsophalangeal joint and the ankle joint), which is often associated with peripheral neuropathy, has been linked with significantly elevated foot pressures and hence an increased risk of ulceration in individuals with diabetic peripheral neuropathy (Fernando et al. 1991; Payne et al. 2002; Armstrong and Lavery 1998; Goldsmith et al. 2002; Hastings et al. 2000; Mueller et al. 1994b; Orendurff et al. 2006; Rao et al. 2006). However, elevated foot pressures in the absence of neuropathy are not, by themselves, thought to cause ulceration, as the presence of neuropathy has been termed a requirement for plantar ulceration (Masson et al. 1989; van Schie and Boulton 2002; van Schie 2005). Alterations in lower limb joint moments, which often result from diabetic peripheral neuropathy and include an increased plantarflexion moment and a longer mid-stance phase, have been associated with elevated forefoot pressures in individuals with diabetic peripheral neuropathy (Savelberg et al. 2009). It has also been proposed that intrinsic foot muscle atrophy and extensor tendon substitution coupled with flexor tendon stabilization (a compensatory biomechanical strategy) can result in increased supinatory moments and increased pressure under the lateral forefoot in individuals with foot ulcers (Stess et al. 1997). The presence of foot deformity has repeatedly been observed to be an important predictor of higher foot pressures in individuals with diabetes (Bus et al. 2005; Guldemond et al. 2008; Mueller et al. 2003a). Foot deformity also often results in plantar callus and thickening of the skin (Abouaesha et al. 2001). In a landmark study by Cavanagh and colleagues (1997), foot structure as defined by 27 radiological measures was compared to regional foot pressure distributions in healthy participants. Using multiple regression analyses, their findings could only account for up to 38 % of the variance in peak foot pressure at the heel and first metatarsal (Cavanagh et al. 1997). They concluded that gait, and individual variation in gait, are more likely than foot structure to exert the major influences on peak plantar foot pressures during walking. Since this study, other studies have investigated the relationship between clinical and structural variables and in-shoe or platform-based foot pressures in diverse populations with diabetes and have obtained variable results (Ahroni et al. 1999; Payne et al. 2002; Cavanagh et al. 1991; Mueller et al. 2003a). Therefore, the influence of variables such as age, body weight, duration of diabetes, plantar soft tissue thickness, and range of motion of foot joints is largely unknown. More recently, Barn and colleagues (2015) studied patients with diabetic peripheral neuropathy and a history of ulceration to determine the predictors of peak foot pressure in this high-risk population using multivariate linear regression analyses. They determined that different factors were likely to influence foot pressures beneath different foot sites. They were only able to predict between 6 % (heel) and 41 % (midfoot) of the variation in peak foot pressure (Barn et al. 2015). The largest contributing factor


to peak foot pressure at the heel was glycosylated hemoglobin concentration (HbA1c) (17.8 %), whereas in the midfoot it was the presence of Charcot neuroarthropathy (a complication of severe neuropathy) (41.1 %), and in the forefoot it was the presence of prominent metatarsal heads (37.3 %) (Barn et al. 2015). The presence of hammertoe deformity (31.9 %) was the biggest contributing factor to elevated plantar pressures at the lesser toes, whereas a history of previous ulceration (32.9 %) was the biggest contributor to elevated plantar pressures at the hallux (Barn et al. 2015). Overall, their results indicated that local influences such as the presence of foot deformity were stronger predictors of foot pressure than global features such as body mass, age, gender, and diabetes duration (Barn et al. 2015). This is consistent with findings in healthy populations indicating that weight and age only account for X % of the variability in foot pressures (Martínez-Nova et al. 2008). Interestingly, other findings have highlighted the potential role of glycation of collagen-rich tissues, particularly in the plantar fascia of the foot and the posterior muscle compartments of the ankle, in the pathogenesis of diabetes-related foot problems (D'Ambrogi et al. 2003; Giacomozzi et al. 2005). More recent investigations have shown a relationship between potential glycation of collagen in the Achilles tendon and higher forefoot pressures in individuals with poorly controlled diabetes mellitus (Couppe et al. 2016). These findings are consistent with other work indicating an association between skin autofluorescence (a surrogate measure of glycation of collagen) and foot ulcers in patients with type 2 diabetes and vascular complications (Liu et al. 2015). These findings indicate that the aetiology of the mechanical changes that are observed as changes in foot pressure may be more complicated than previously thought and could include more intricate structural changes within the foot. Novel investigations such as the measurement of skin autofluorescence are providing new and innovative data on factors thought to influence foot pressure that were previously unable to be assessed.
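To illustrate the kind of analysis behind the "variance explained" figures quoted above, the following sketch (Python/NumPy, with simulated data that are not from Barn et al. or Cavanagh et al.) fits an ordinary least-squares model of peak pressure on a few hypothetical predictors and reports the proportion of variance explained (R^2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated predictors (illustrative only): body mass (kg), age (years),
# and a 0/1 indicator for prominent metatarsal heads
body_mass = rng.normal(85, 15, n)
age = rng.normal(60, 10, n)
deformity = rng.integers(0, 2, n)

# Simulated forefoot peak pressure (kPa): the local deformity term matters more
# than the global features, mirroring the pattern described in the text
peak_pressure = 300 + 0.5 * body_mass + 0.2 * age + 120 * deformity + rng.normal(0, 60, n)

# Ordinary least squares via the normal equations (design matrix with intercept)
X = np.column_stack([np.ones(n), body_mass, age, deformity])
beta, *_ = np.linalg.lstsq(X, peak_pressure, rcond=None)
predicted = X @ beta
r2 = 1 - np.sum((peak_pressure - predicted) ** 2) / np.sum((peak_pressure - peak_pressure.mean()) ** 2)
print(f"variance in peak pressure explained by the model: {100 * r2:.0f} %")
```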

Foot Pressure in Individuals with Diabetes Mellitus Identifying the foot pressure profiles of patients with diabetes mellitus has been a topic of interest for a number of decades (Boulton et al. 1983, 1987; Veves et al. 1992; Armstrong et al. 1998; Lavery et al. 2003; Masson et al. 1989; Pham et al. 2000; Cavanagh et al. 1993, 2000). Following early work done by Brand and colleagues in patients with leprosy (Brand 1979; Brand and Ebner 1969), it was apparent that foot pressures could potentially be used to identify areas of the plantar foot that were prone to ulceration. Hence, a mechanical aetiology for diabetic foot ulcers was thought to exist (Stokes et al. 1975). There are now at least four systematic reviews that have compiled results of several studies investigating the foot pressure features of individuals with diabetes mellitus (Wrobel and Najafi 2010; Allet et al. 2008; Fernando et al. 2013, 2014). These reviews have suggested a progressive increase in the foot pressures of people with diabetic neuropathy compared to controls without neuropathy and a subsequent substantial increase in foot pressures following foot ulceration. However,


one of the key limitations has been that, although many studies have proposed that foot pressures can be used as a surrogate measure of trauma to the plantar foot, current evaluation methods suffer from various shortcomings, including the lack of a threshold of peak foot pressure that predicts the development of foot ulcers (Wrobel and Najafi 2010). Foot pressures are, however, thought to be significantly higher in individuals with diabetes mellitus than in healthy controls (Cavanagh et al. 1993). What is clinically relevant is that high foot pressures alone are unlikely to cause ulceration in individuals with diabetes who have adequate sensation, that is, in the absence of diabetic peripheral neuropathy (Masson 1992; Masson et al. 1989; Fernando et al. 1991; Boulton 1998; Boulton et al. 1983, 1987, 2004). Therefore, assessments of foot pressure are most important in individuals with diabetic peripheral neuropathy.

Foot Pressure in Individuals with High-Risk Feet A large number of retrospective and prospective studies have shown that foot pressures are elevated in patients with diabetic peripheral neuropathy at high risk of foot ulceration, as well as in those with a past history of ulceration compared to healthy controls (Veves et al. 1992; Wrobel and Najafi 2010; Frykberg et al. 1998; Pham et al. 2000; Boulton et al. 1983; Fernando et al. 2013, 2014). The frequency, magnitude, and duration of foot pressure can either individually or cumulatively contribute to increased loading of the foot in patients with diabetic peripheral neuropathy (van Schie 2005). This predisposes plantar tissues to trauma and to skin breakdown and ulceration (van Schie 2005). In a meta-analysis of foot pressure acquired using platform measurements, it was evident that individuals with diabetic peripheral neuropathy experienced significantly higher foot pressures compared to controls (Fernando et al. 2013). Specifically, the peak forefoot and rearfoot pressure and pressure–time integrals were higher in individuals with neuropathy compared to individuals with diabetes without neuropathy and to healthy controls (Fernando et al. 2013). A limitation of this finding was the fact that not all confounding factors were controlled in individual studies. A prior history of foot ulceration is thought to be associated with significantly high foot pressure measurements in the forefoot in individuals with diabetic peripheral neuropathy, irrespective of whether the ulcer has healed (Fernando et al. 2014). An important reason for the higher incidence of skin breakdown in the forefoot than in the rearfoot is that the ratio of foot pressure in the forefoot compared to the rearfoot is significantly higher during barefoot ambulation (Caselli et al. 2002). Therefore, the risk of subsequent ulceration does not reduce in this group of individuals and warrants appropriate off-loading interventions (Raspovic et al. 2000; Tsung et al. 2004; Bus et al. 2008a). A systematic discussion of off-loading options in patients with diabetes mellitus is beyond the scope of this chapter. However, when assessing the efficacy and appropriateness of off-loading interventions, foot pressure assessments can be used and are recommended to obtain quantitative evaluations of off-loading (Bus et al. 2013). Figure 7 demonstrates site-specific differences in plantar foot pressure in a person with foot ulcers and



Fig. 7 The graph demonstrates the mean peak foot pressure (N/cm2) at ten plantar sites assessed with a pressure platform in a person with an active plantar foot ulcer and neuropathy (gray), a person with diabetes mellitus without neuropathy and foot ulcers (orange), and a healthy control (blue) during self-selected gait speeds. The platform used was the Footscan® pressure plate (RSScan International, Olen, Belgium). This plate was 2 m in length and 0.4 m in width and contained 16,384 sensors, with individual sensor dimensions of 0.0076 m × 0.0051 m. All pressure data were captured at a rate of 100 Hz. Gait speed was not adjusted in these analyses, to assess the natural pressure distribution, and this may have affected the pressure results. Foot pressure measurements are averaged from five bilateral measurements, except in the person with the foot ulcer, whose foot pressures are only representative of the ulcerated foot. The x-axis represents the various sites of foot pressure measurement; the y-axis represents the mean peak foot pressures in N/cm2 (Adapted from Fernando et al. (2016a))

diabetic peripheral neuropathy, compared to a person with diabetes mellitus without peripheral neuropathy and a healthy control. A large difference in foot pressures in the presence of neuropathy and ulceration can be appreciated. Foot pressures have also been shown to increase in individuals with diabetes mellitus in the presence of digital and partial foot amputations, thereby increasing the mechanical stress and trauma to the tissue and the risk of subsequent ulceration (Kanade et al. 2006a, b). Therefore, foot pressures increase progressively in individuals with diabetes mellitus with the development of high-risk feet and subsequent foot complications.

Future Directions Ultimately, Could a Threshold Be Found for Identifying Individuals at Risk of Ulceration? The ideal scenario is to be able to utilize foot pressures to predict sites of potential ulceration, prior to ulcer occurrence, and to guide pressure off-loading of ulcerated sites effectively. While progress has been made with such applications, proposed


thresholds have been inadequate due to a number of limitations, including the poor agreement between the various methods and instrumentation used to measure foot pressure, the lack of correlation between shod and barefoot foot pressure assessments, and a large between-day variability in obtained measurements. This has led to uncertainty about what is clinically important as opposed to what is the expected natural variation in measurements in patients. The extent to which areas of high or elevated foot pressure correspond with areas of ulceration is also controversial (Cavanagh et al. 2000; van Schie 2005). Several attempts have been made in the past to identify ulceration risk in diabetes patients utilizing foot pressure. Lavery and colleagues investigated risk factors for foot ulceration in 225 patients with diabetes mellitus with and without ulceration and established that a foot pressure >650 kPa conferred a high ulceration risk (Lavery et al. 1998). In a case–control study of 219 participants with and without foot ulceration, Armstrong and colleagues (Armstrong et al. 1998) determined that the optimal cutoff point for ulceration was 700 kPa, which generated a sensitivity of 70 % and a specificity of 65 %. As there was no optimal cutoff point for screening patients for ulceration beyond this, it was concluded that the higher the peak pressure, the higher the corresponding risk of ulceration. In another longitudinal study comprising 1,666 consecutive patients with diabetes mellitus, the optimum cutoff for ulceration was believed to be 875 kPa, with a sensitivity of 64 % and a specificity of 46 % (Lavery et al. 2003). Owings and colleagues (2009) studied 49 individuals with a history of diabetic foot ulcers and identified that an in-shoe pressure cutoff of 207 kPa should be utilized as a threshold target for footwear prescription for individuals with a history of foot ulcers (Owings et al. 2009). However, barefoot peak pressure only predicted approximately 35 % of the variance of in-shoe peak pressure, indicating that other factors, such as individual footwear prescriptions, may have a bigger influence on in-shoe foot pressure measurements (Owings et al. 2009). These findings also suggest that foot pressure assessment by itself is a poor predictive tool for ulceration. A systematic review and meta-analysis, however, concluded that high foot pressures increased the risk of foot ulceration (Crawford et al. 2007). Nonetheless, the results from these studies have been inconclusive, and the strength of evidence is limited for determining a threshold pressure for ulceration. Hence, foot pressure measurements alone cannot be utilized as a diagnostic tool for foot ulceration. The threshold foot pressure at which human skin ulcerates is unknown, although several attempts have been made to identify this silver bullet (Frykberg et al. 1998; Armstrong et al. 1998; Veves et al. 1992). The reality may be that there is no such threshold that can be applied to all individuals with diabetes and diabetic peripheral neuropathy. Hence, the foot pressure likely responsible for ulcer development varies from individual to individual, based on body composition, ethnicity, sex, the range of motion of lower limb joints, gait features, and on foot morphology and structure, as reported earlier. Therefore, a pressure threshold for ulceration is influenced by other factors that alter the biological composition and mechanical properties of plantar tissue and their viability (Bacarin et al. 2009).
Factors such as glycation of the plantar fascia and the Achilles tendon and fatty infiltration of the intrinsic foot muscles due to diabetes mellitus have also been suggested to alter both the stress within the plantar tissues and their response to stress (Craig et al. 2008; Huijberts et al. 2008; Wrobel and Najafi 2010; D'Ambrogi et al. 2003; Giacomozzi et al. 2005). Until all the factors that influence foot pressures are identified, it may be difficult to establish a threshold for ulceration. Perhaps this means that thresholds are so individualized that they should be based on a formula or a composite score that takes into account various measurements, of which foot pressure is one. Such a composite score should take into account variables such as the range of movement and clinical and radiological measures, together with foot pressures, to identify individuals at risk of ulceration and the locations of plantar ulceration. Such an approach is, however, not currently utilized.
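For readers unfamiliar with how such cutoffs are evaluated, the sketch below (Python/NumPy, with an invented cohort) computes the sensitivity and specificity of classifying feet as "at risk" when peak pressure exceeds a candidate threshold, the kind of calculation underlying the 650, 700, and 875 kPa figures quoted above; the numbers here are illustrative only.

```python
import numpy as np

def threshold_performance(peak_pressures_kpa, ulcerated, cutoff_kpa):
    """Sensitivity and specificity of classifying a foot as 'at risk' when its
    peak pressure exceeds the cutoff, given observed ulceration status (True/False)."""
    pressures = np.asarray(peak_pressures_kpa, dtype=float)
    ulcerated = np.asarray(ulcerated, dtype=bool)
    predicted_at_risk = pressures > cutoff_kpa
    sensitivity = np.mean(predicted_at_risk[ulcerated])    # true positive rate
    specificity = np.mean(~predicted_at_risk[~ulcerated])  # true negative rate
    return sensitivity, specificity

# Invented cohort: barefoot peak pressures (kPa) and whether each foot ulcerated
pressures = np.array([420, 510, 880, 950, 700, 640, 1020, 560, 760, 830])
ulcers = np.array([False, False, True, True, False, False, True, False, True, True])

for cutoff in (650, 700, 875):
    sens, spec = threshold_performance(pressures, ulcers, cutoff)
    print(f"cutoff {cutoff} kPa: sensitivity {sens:.2f}, specificity {spec:.2f}")
```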

What Research Is Required in Order to Make Foot Pressures More Effective and Beneficial? Although important changes in foot pressures can be assessed throughout the sequelae of foot complications of diabetes mellitus, including the development of diabetic peripheral neuropathy and foot ulceration, there is a lack of correlation between sites of ulceration and sites of peak pressure. While this has resulted in some questioning of the utility of foot pressure assessments in diabetes mellitus, considerably more research is required to improve our understanding of the factors that influence foot pressures and how these may be altered in diabetes. Irrespective of the absence of a threshold for ulceration, international consensus should be reached on the methods used to collect both platform and in-shoe foot pressure data. Consensus and agreement are needed on the protocols for foot pressure measurements, the frequency of measurement, the expected deviations due to measurement inaccuracy, and how measurement error should be minimized within the field. This may be challenging given the differences between laboratories, patients, and protocols used. As a bare minimum, reporting requirements should be adhered to in studies, allowing for open disclosure of the protocols and methods used to obtain plantar pressures. Such recommendations may allow for comparisons that are more appropriate and at a global standard between studies, patients, and institutions. Furthermore, consensus is needed on how and when elevated foot pressures should be clinically managed in patients with diabetes mellitus at different stages of diabetes-related foot complications. This has already started to take place with international guidelines, and such guidelines need to be revised and advanced within the field (Bus et al. 2015). A major focus of future work should be addressing the current lack of reproducibility of pressure measurements between devices. As there is no international recommendation on key requirements, there is no current requirement for manufacturers to adhere to when producing devices. It is especially important to consider the spatial factors and the temporal behavior of the sensors that are used (Urry 1999). This is of particular importance with the more recent focus on shear pressures (Yavuz et al. 2008). The current dogma is that the choice of sensor is based on the


principle of matching the required performance profile to the particular measurement task. Therefore, further research should focus on testing the specific characteristics of existing sensors and on developing newer sensors that are sensitive and have a range of performance characteristics, including the ability to detect shear pressures. Foot pressure assessments are currently limited by significant differences between measurement protocols, measurement methods, and the types of sensors used, and by a lack of international consensus on key requirements for foot pressure assessments in individuals with diabetes mellitus. Furthermore, there are currently no commercially available devices to measure shear pressures, and hence plantar pressures at best only provide partial observations regarding the mechanical characteristics of individuals with diabetes-related foot complications. With advancements in technology, it is anticipated that foot pressure assessment may be more readily employed in evaluating the mechanical characteristics of individuals with diabetes-related foot complications.

Cross-References
▶ Assessing Club Foot and Cerebral Palsy by Pedobarography
▶ Assessing Pediatric Foot Deformities
▶ Fusion of Foot Pressure and Foot Kinematics Measurements for Medical Applications
▶ Plantar Pressure
▶ Pressure Platforms
▶ The Use of Low Resolution Pedobarographs

References Abdul Razak AH, Zayegh A, Begg RK, Wahab Y (2012) Foot plantar pressure measurement system: a review. Sensors 12:9884 Abouaesha F, van Schie CH, Griffths GD, Young RJ, Boulton AJ (2001) Plantar tissue thickness is related to peak plantar pressure in the high-risk diabetic foot. Diabetes Care 24:1270–1274 Acharya UR, Ghista DN, Nergui M, Chattopadhyay S, Ng EYK, Sree SV, Tong JWK, Tan JH, Meng LK, Suri JS (2012) Diabetes mellitus: enquiry into its medical aspects and bioengineering of its monitoring and regulation. J Mech Med Biol 12 Ahroni JH, Boyko EJ, Forsberg R (1998) Reliability of F-scan in-shoe measurements of plantar pressure. Foot Ankle Int 19:668–673 Ahroni JH, Boyko EJ, Forsberg RC (1999) Clinical correlates of plantar pressure among diabetic veterans. Diabetes Care 22:965–972 Alexander IJ, Chao EY, Johnson KA (1990) The assessment of dynamic foot-to-ground contact forces and plantar pressure distribution: a review of the evolution of current techniques and clinical applications. Foot Ankle 11:152–167 Allet L, Armand S, Golay A, Monnin D, DE Bie R, de Bruin E (2008) Gait characteristics of diabetic patients: a systematic review. Diabetes/Metabo Res Rev 24:173–191 Armstrong DG (2005) Detection of diabetic peripheral neuropathy: strategies for screening and diagnosis. Adv Stud Med 5:S1033–S1037


Armstrong DG, Lavery LA (1998) Plantar pressures are higher in diabetic patients following partial foot amputation. Ostomy Wound Manag 44:30–32, 34, 36 passim Armstrong DG, Peters EJ, Athanasiou KA, Lavery LA (1998) Is there a critical level of plantar foot pressure to identify patients at risk for neuropathic foot ulceration? J Foot Ankle Surg 37:303–307 Armstrong DG, Stacpoole-Shea S, Nguyen H, Harkless LB (1999) Lengthening of the Achilles tendon in diabetic patients who are at high risk for ulceration of the foot. J Bone Joint Surg Series A 81:535–538 Arts MLJ, Bus SA (2011) Twelve steps per foot are recommended for valid and reliable in-shoe plantar pressure data in neuropathic diabetic patients wearing custom made footwear. Clin Biomech 26:880–884 Bacarin TA, Sacco ICN, Hennig EM (2009) Plantar pressure distribution patterns during gait in diabetic neuropathy patients with a history of foot ulcers. Clinics 64:113–120 Barn R, Waaijman R, Nollet F, Woodburn J, Bus SA (2015) Predictors of barefoot plantar pressure during walking in patients with diabetes, peripheral neuropathy and a history of ulceration. PLoS One 10:e0117443 Barnett S (1998) International protocol guidelines for plantar pressure measurement. Diabet Foot 1:137–140 Boulton AJ (1998) Lowering the risk of neuropathy, foot ulcers and amputations. Diabet Med 15 (Suppl 4):S57–S59 Boulton AJ, Hardisty CA, Betts RP, Franks CI, Worth RC, Ward JD, Duckworth T (1983) Dynamic foot pressure and other studies as diagnostic and management aids in diabetic neuropathy. Diabetes Care 6:26–33 Boulton AJ, Betts RP, Franks CI, Ward JD, Duckworth T (1987) The natural history of foot pressure abnormalities in neuropathic diabetic subjects. Diabetes Res 5:73–77 Boulton AJ, Krisner RS, Vileikyte L (2004) Neuropathic diabetic foot ulcers. N Engl J Med 351:48–55 Brand PW (1979) Management of the insensitive limb. Phys Ther 59:8–12 Brand PW, Ebner JD (1969) Pressure sensitive devices for denervated hands and feet. A preliminary communication. J Bone Joint Surg Am 51:109–116 Bryant A, Singer K, Tinley P (1999) Comparison of the reliability of plantar pressure measurements using the two-step and midgait methods of data collection. Foot Ankle Int 20:246–250 Burnfield JM, Few CD, Mohamed OS, Perry J (2004) The influence of walking speed and footwear on plantar pressures in older adults. Clin Biomech (Bristol, Avon) 19:78–84 Bus SA (2008) Foot structure and footwear prescription in diabetes mellitus. Diabetes Metab Res Rev 24:S90–S95 Bus SA, de Lange A (2005) A comparison of the 1-step, 2-step, and 3-step protocols for obtaining barefoot plantar pressure data in the diabetic neuropathic foot. Clin Biomech 20:892–899 Bus SA, Waaijman R (2013) The value of reporting pressure–time integral data in addition to peak pressure data in studies on the diabetic foot: a systematic review. Clin Biomech (Bristol, Avon) 28:117–121 Bus SA, Ulbrecht JS, Cavanagh PR (2004) Pressure relief and load redistribution by custom-made insoles in diabetic patients with neuropathy and foot deformity. Clin Biomech 19:629–638 Bus SA, Maas M, de Lange A, Michels RP, Levi M (2005) Elevated plantar pressures in neuropathic diabetic patients with claw/hammer toe deformity. J Biomech 38:1918–1925 Bus S, van Deursen R, Kanade R, Wissink M, Manning E, van Baal J, Harding K (2008a) Plantar pressure relief in the diabetic foot using forefoot offloading shoes. 
Gait Posture 29:618–620 Bus SA, Valk GD, van Deursen RW, Armstrong DG, Caravaggi C, Hlavacek P, Bakker K, Cavanagh PR (2008b) The effectiveness of footwear and offloading interventions to prevent and heal foot ulcers and reduce plantar pressure in diabetes: a systematic review. Diabetes Metab Res Rev 24(Suppl 1):S162–S180


Bus SA, Haspels R, Busch-Westbroek TE (2011) Evaluation and optimization of therapeutic footwear for neuropathic diabetic foot patients using in-shoe plantar pressure analysis. Diabetes Care 34:1595–1600 Bus SA, Waaijman R, Arts M, DE Haart M, Busch-Westbroek T, van Baal J, Nollet F (2013) Effect of custom-made footwear on foot ulcer recurrence in diabetes: a multicenter randomized controlled trial. Diabetes Care 36:4109–4116 Bus SA, Armstrong DG, van Deursen R, Lewis J, Caravaggi C, Cavanagh PR, (IWGDF), OBOTIWGOTDF (2015) IWGDF Guidance on footwear and offloading interventions to prevent and heal foot ulcers in patients with diabetes [Online]. IWGDF. Available: http:// www.iwgdf.org/files/2015/website_footwearoffloading.pdf. Accessed 14th July 2015 Caselli A, Pham H, Giurini JM, Armstrong DG, Veves A (2002) The forefoot-to-rearfoot plantar pressure ratio is increased in severe diabetic neuropathy and can predict foot ulceration. Diabetes Care 25:1066–1071 Cavanagh PR, Ulbrecht JS (1994) Clinical plantar pressure measurement in diabetes: rationale and methodology. Foot 4:123–135 Cavanagh PR, Sims DS Jr, Sanders LJ (1991) Body mass is a poor predictor of peak plantar pressure in diabetic men. Diabetes Care 14:750–755 Cavanagh PR, Simoneau GG, Ulbrecht JS (1993) Ulceration, unsteadiness, and uncertainty: the biomechanical consequences of diabetes mellitus. J Biomech 26(Suppl 1):23–40 Cavanagh PR, Morag E, Boulton AJ, Young MJ, Deffner KT, Pammer SE (1997) The relationship of static foot structure to dynamic foot function. J Biomech 30:243–250 Cavanagh P, Ulbrecht J, Caputo G (2000) New developments in the biomechanics of the diabetic foot. Diabetes Metab Res Rev 16(Suppl 1):S6–S10 Choi YR, Lee HS, Kim DE, Lee DH, Kim JM, Ahn JY (2014) The diagnostic value of pedobarography. Orthopedics 37:e1063–e1067 Couppe C, Svensson RB, Kongsgaard M, Kovanen V, Grosset JF, Snorgaard O, Bencke J, Larsen JO, Bandholm T, Christensen TM, Boesen A, Helmark IC, Aagaard P, Kjaer M, Magnusson SP (2016) Human Achilles tendon glycation and function in diabetes. J Appl Physiol 120:130–137 Craig ME, Duffin AC, Gallego PH, Lam A, Cusumano J, Hing S, Donaghue KC (2008) Plantar fascia thickness, a measure of tissue glycation, predicts the development of complications in adolescents with type 1 diabetes. Diabetes Care 31:1201–1206 Crawford F, Inkster M, Kleijnen J, Fahey T (2007) Predicting foot ulcers in patients with diabetes: a systematic review and meta-analysis. QJM 100:65–86 D’ambrogi E, Giurato L, D’agostino MA, Giacomozzi C, Macellari V, Caselli A, Uccioli L (2003) Contribution of plantar fascia to the increased forefoot pressures in diabetic patients. Diabetes Care 26:1525–1529 Davis B, Parry J, Neth DC, Waters KC (1998) A device for simultaneous measurement of pressure and shear force distribution on the plantar surface of the foot. J Appl Biomech 14:93–107 Fernando DJ, Masson EA, Veves A, Boulton AJ (1991) Relationship of limited joint mobility to abnormal foot pressures and diabetic foot ulceration. Diabetes Care 14:8–11 Fernando M, Crowther R, Lazzarini P, Sangla K, Cunningham M, Buttner P, Golledge J (2013) Biomechanical characteristics of peripheral diabetic neuropathy: a systematic review and metaanalysis of findings from the gait cycle, muscle activity and dynamic barefoot plantar pressure. 
Clin Biomech (Bristol, Avon) 28(8):831–845 Fernando ME, Crowther RG, Pappas E, Lazzarini PA, Cunningham M, Sangla KS, Buttner P, Golledge J (2014) Plantar pressure in diabetic peripheral neuropathy patients with active foot ulceration, previous ulceration and no history of ulceration: a meta-analysis of observational studies. PLoS One 9:e99050 Fernando ME, Crowther RG, Cunningham M, Lazzarini PA, Sangla KS, Golledge J (2015) Lower limb biomechanical characteristics of patients with neuropathic diabetic foot ulcers: the diabetes foot ulcer study protocol. BMC Endocr Disord 15:59 Fernando M, Crowther R, Cunningham M, Lazzarini P, Sangla K, Buttner P, Golledge J (2016a) The reproducibility of acquiring three dimensional gait and plantar pressure data using


established protocols in participants with and without type 2 diabetes and foot ulcers. J Foot Ankle Res 9:4 Fernando ME, Crowther RG, Lazzarini PA, Sangla KS, Buttner P, Golledge J (2016b) Gait parameters of people with diabetes-related neuropathic plantar foot ulcers. Clin Biomech 37:98–107 Ferrarin M, Bovi G, Rabuffetti M, Mazzoleni P, Montesano A, Moroni I, Pagliano E, Marchi A, Marchesi C, Beghi E, Pareyson D (2011) Reliability of instrumented movement analysis as outcome measure in Charcot–Marie–Tooth disease: results from a multitask locomotor protocol. Gait Posture 34:36–43 Formosa C, Gatt A, Chockalingam N (2013) The importance of clinical biomechanical assessment of foot deformity and joint mobility in people living with type-2 diabetes within a primary care setting. Prim Care Diabetes 7:45–50 Frykberg RG, Lavery LA, Pham H, Harvey C, Harkless L, Veves A (1998) Role of neuropathy and high foot pressures in diabetic foot ulceration. Diabetes Care 21:1714–1719 Giacomozzi C (2010) Appropriateness of plantar pressure measurement devices: a comparative technical assessment. Gait Posture 32:141–144 Giacomozzi C, D’ambrogi E, Uccioli L, Macellari V (2005) Does the thickening of Achilles tendon and plantar fascia contribute to the alteration of diabetic foot loading? Clin Biomech 20:532–539 Gillespie KA, Dickey JP (2003) Determination of the effectiveness of materials in attenuating high frequency shock during gait using filterbank analysis. Clin Biomech (Bristol, Avon) 18:50–59 Goldsmith JR, Lidtke RH, Shott S (2002) The effects of range-of-motion therapy on the plantar pressures of patients with diabetes mellitus. J Am Podiatr Med Assoc 92:483–490 Guldemond NA, Leffers P, Schaper NC, Sanders AP, Nieman F, Willems P, Walenkamp GHIM (2007) The effects of insole configurations on forefoot plantar pressure and walking convenience in diabetic patients with neuropathic feet. Clin Biomech 22:81–87 Guldemond NA, Leffers P, Walenkamp GHIM, Schaper NC, Sanders AP, Nieman FHM, van Rhijn LW (2008) Prediction of peak pressure from clinical and radiological measurements in patients with diabetes. BMC Endo Disord 8:16 Gurney JK, Kersting UG, Rosenbaum D (2008) Between-day reliability of repeated plantar pressure distribution measurements in a normal population. Gait Posture 27:706–709 Gurney JK, Marshall PW, Rosenbaum D, Kersting UG (2013) Test-retest reliability of dynamic plantar loading and foot geometry measures in diabetics with peripheral neuropathy. Gait Posture 37:135–137 Hafer JF, Lenhoff MW, Song J, Jordan JM, Hannan MT, Hillstrom HJ (2013) Reliability of plantar pressure platforms. Gait Posture 38:544–548 Hamatani M, Mori T, Oe M, Noguchi H, Takehara K, Amemiya A, Ohashi Y, Ueki K, Kadowaki T, Sanada H (2016) Factors associated with callus in diabetic patients, focused on plantar shear stress during gait. J Diabetes Sci Technol 10(6):1353–1359 Hastings M, Mueller M, Sinacore D, Salsich G, Engsberg J, Johnson J (2000) Effects of a tendoachilles lengthening procedure on muscle function and gait characteristics in a patient with diabetes mellitus. J Orthop Sports Phys Ther 30:85–90 Hughes J (1993) The clinical use of pedobarography. Acta Orthop Belg 59:10–16 Hughes J, Pratt L, Linge K, Clark P, Klenerman l (1991) Reliability of pressure measurements: the EM ED F system. Clin Biomech (Bristol, Avon) 6: 14–18. Huijberts MS, Schaper NC, Schalkwijk CG (2008) Advanced glycation end products and diabetic foot disease. 
Diabetes Metab Res Rev 24(Suppl 1):S19–S24 Kanade RV, Van Deursen RWM, Harding K, Price P (2006a) Walking performance in people with diabetic neuropathy: benefits and threats. Diabetologia 49:1747–1754 Kanade RV, van Deursen RWM, Price P, Harding K (2006b) Risk of plantar ulceration in diabetic patients with single-leg amputation. Clin Biomech 21:306–313 Kato H, Takada T, Kawamura T, Hotta N, Torii S (1996) The reduction and redistribution of plantar pressures using foot orthoses in diabetic patients. Diabetes Res Clin Pract 31:115–118


Lavery LA, Armstrong DG, Vela SA, Quebedeaux TL, Fleischli JG (1998) Practical criteria for screening patients at high risk for diabetic foot ulceration. Arch Intern Med 158:157–162 Lavery LA, Armstrong DG, Wunderlich RP, Tredwell J, Boulton AJM (2003) Predictive value of foot pressure assessment as part of a population-based diabetes disease management program. Diabetes Care 26:1069–1073 Ledoux WR, Shofer JB, Cowley MS, Ahroni JH, Cohen V, Boyko EJ (2013) Diabetic foot ulcer incidence in relation to plantar pressure magnitude and measurement location. J Diabetes Complications 27:621–626 Lee KM, Lee J, Chung CY, Ahn S, Sung KH, Kim TW, Lee HJ, Park MS (2012) Pitfalls and important issues in testing reliability using intraclass correlation coefficients in orthopaedic research. Clin Orthop Surg 4:149–155 Liu C, Xu L, Gao H, Ye J, Huang Y, Wu M, Xie T, Ni P, Yu X, Cao Y, Lu S (2015) The association between skin autofluorescence and vascular complications in Chinese patients with diabetic foot ulcer: an observational study done in Shanghai. Int J Low Extrem Wounds 14:28–36 Lord M, Hosein R (2000) A study of in-shoe plantar shear in patients with diabetic neuropathy. Clin Biomechan 15:278–283 Lord M, Reynolds DP, Hughes JR (1986) Foot pressure measurement: a review of clinical findings. J Biomed Eng 8:283–294 Luo ZP, Berglund LJ, An KN (1998) Validation of F-Scan pressure sensor system: a technical note. J Rehabil Res Dev 35:186–191 Martínez-Nova A, Huerta JP, Sánchez-Rodríguez R (2008) Cadence, age, and weight as determinants of forefoot plantar pressures using the biofoot in-shoe system. J Am Podiatr Med Assoc 98:302–310 Masson EA (1992) What causes high foot pressures in diabetes: how can they be relieved?. Proceedings of the IDF satellite symposium on the diabetic foot, Washington 1991. Foot 2:212–217 Masson EA, Hay EM, Stockley I, Veves A, Betts RP, Boulton AJM (1989) Abnormal foot pressures alone may not cause ulceration. Diabet Med 6:426–428 McPoil TG, Cornwall MW, Yamada W (1995) A comparison of two in-shoe plantar pressupe measurement systems. Lower Extrem 2:95–103 McPoil TG, Cornwall MW, Dupuis L, Cornwell M (1999) Variability of plantar pressure data. A comparison of the two-step and midgait methods. J Am Podiatr Med Assoc 89:495–501 Melai T, Ijzerman TH, Schaper NC, DE Lange TLH, Willems PJB, Meijer K, Lieverse AG, Savelberg HHCM (2011) Calculation of plantar pressure time integral, an alternative approach. Gait Posture 34:379–383 Meyers-Rice B, Sugars L, McPoil T, Cornwall MW (1994) Comparison of three methods for obtaining plantar pressures in nonpathologic subjects. J Am Podiatr Med Assoc 84:499–504 Miller CA, Verstraete MC (1996) Determination of the step duration of gait initiation using a mechanical energy analysis. J Biomech 29:1195–1199 Mueller M, Minor S, Sahrmann S, Schaaf J, Strube M (1994a) Differences in the gait characteristics of patients with diabetes and peripheral neuropathy compared with aged-matched controls. . . including commentary by McPoil T and Cavanaugh PR with author response. Phys Ther 74:299–313 Mueller MJ, Sinacore DR, Hoogstrate S, Daly L (1994b) Hip and ankle walking strategies: effect on peak plantar pressures and implications for neuropathic ulceration. Arch Phys Med Rehabil 75:1196–1200 Mueller MJ, Hastings M, Commean PK, Smith KE, Pilgram TK, Robertson D, Johnson J (2003a) Forefoot structural predictors of plantar pressures during walking in people with diabetes and peripheral neuropathy. 
J Biomech 36:1009–1017 Mueller MJ, Sinacore DR, Hastings MK, Strube MJ, Johnson JE (2003b) Effect of Achilles tendon lengthening on neuropathic plantar ulcers: a randomized clinical trial. J Bone Joint Surg Series A 85:1436–1445


Mueller MJ, Zou D, Lott DJ (2005) “Pressure gradient” as an indicator of plantar skin injury. Diabetes Care 28:2908–2912 Mueller MJ, Zou D, Bohnert KL, Turtle LJ, Sinacore DR (2008) Plantar stresses on the neuropathic foot during barefoot walking. Phys Ther 88:1375–1384 Orendurff MS, Rohr ES, Sangeorzan BJ, Weaver K, Czerniecki JM (2006) An equinus doformity of the ankle accounts for only a small amount of the increased forefoot plantar pressure in patients with diabetes. J Bone Joint Surg Series B 88:65–68 Orlin M, McPoil T (2000) Plantar pressure assessment. Phys Ther 80:389–409 Owings TM, Apelqvist J, Stenstrom A, Becker M, Bus SA, Kalpen A, Ulbrecht JS, Cavanagh PR (2009) Plantar pressures in diabetic patients with foot ulcers which have remained healed. Diabet Med 26:1141–1146 Park SW, Kim Y-H (2007) Local shear stress measurements in the normal and diabetic foot patients during walking. In: Magjarevic R, Nagel JH (eds) World congress on medical physics and biomedical engineering 2006: August 27 – September 1, 2006 COEX Seoul, Korea “imaging the future medicine”. Springer, Berlin/Heidelberg Payne C, Turner D, Miller K (2002) Determinants of plantar pressures in the diabetic foot. J Diabetes Complications 16:277–283 Pham H, Armstrong DG, Harvey C, Harkless LB, Giurini JM, Veves A (2000) Screening techniques to identify people at high risk for diabetic foot ulceration: a prospective multicenter trial. Diabetes Care 23:606–611 Pollard JP, le Quesne LP (1983) Method of healing diabetic forefoot ulcers. Br Med J (Clin Res Ed) 286:436–437 Price C, Parker D, Nester CJ (2014) Validity and repeatability of three commercially available in-shoe pressure measurement systems. J Foot Ankle Res 7:A67 Price C, Parker D, Nester C (2016) Validity and repeatability of three in-shoe pressure measurement systems. Gait Posture 46:69–74 Qiu X, Tian DH, Han CL, Chen W, Wang ZJ, Mu ZY, Liu KZ (2015) Plantar pressure changes and correlating risk factors in Chinese patients with type 2 diabetes: preliminary 2-year results of a prospective study. Chin Med J (Engl) 128:3283–3291 Rao S, Saltzman C, Yack HJ (2006) Ankle ROM and stiffness measured at rest and during gait in individuals with and without diabetic sensory neuropathy. Gait Posture 24:295–301 Raspovic A, Newcombe L, Lloyd J, Dalton E (2000) Effect of customized insoles on vertical plantar pressures in sites of previous neuropathic ulceration in the diabetic foot. Foot 10:133–138 Rosenbaum D, Becker HP (1997) Plantar pressure distribution measurements. Technical background and clinical applications. Foot Ankle Surg 3:1–14 Rosenbaum D, Hautmann S, Gold M, Claes L (1994) Effects of walking speed on plantar pressure patterns and hindfoot angular motion. Gait Posture 2:191–197 Sacco IC, Bacarin TA, Canettieri MG, Hennig EM (2009) Plantar pressures during shod gait in diabetic neuropathic patients with and without a history of plantar ulceration. J Am Podiatr Med Assoc 99:285–294 Savelberg HH, Schaper NC, Willems PJ, de Lange TL, Meijer K (2009) Redistribution of joint moments is associated with changed plantar pressure in diabetic polyneuropathy. BMC Musculoskelet Disord 10:16 Segal A, Rohr E, Orendurff M, Shofer J, O’Brien M, Sangeorzan B (2004) The effect of walking speed on peak plantar pressure. Foot Ankle Int 25:926–933 Stess RM, Jensen SR, Mirmiran R (1997) The role of dynamic plantar pressures in diabetic foot ulcers. Diabetes Care 20:855–858 Stokes IA, Faris IB, Hutton WC (1975) The neuropathic ulcer and loads on the foot in diabetic patients. 
Acta Orthop Scand 46:839–847 Tsung BYS, Zhang M, Mak AFT, Wong MWN (2004) Effectiveness of insoles on plantar pressure redistribution. J Rehabil Res Dev 41:767–774 Urry S (1999) Plantar pressure-measurement sensors. Meas Sci Technol 10

The Importance of Foot Pressure in Diabetes

29

van Schie CH (2005) A review of the biomechanics of the diabetic foot. Int J Low Extrem Wounds 4:160–170 van Schie CM, Boulton AM (2002) Biomechanics of the diabetic foot: the road to foot ulceration. In: Veves A, Giurini J, Logerfo F (eds) The diabetic foot. Humana Press, Totowa Verdini F, Marcucci M, Benedetti MG, Leo T (2006) Identification and characterisation of heel strike transient. Gait Posture 24:77–84 Veves A, Murray HJ, Young MJ, Boulton AJM (1992) The risk of foot ulceration in diabetic patients with high foot pressure: a prospective study. Diabetologia 35:660–663 Waaijman R, Bus SA (2012) The interdependency of peak pressure and pressure–time integral in pressure studies on diabetic footwear: no need to report both parameters. Gait Posture 35:1–5 Waldecker U (2012) Pedographic classification and ulcer detection in the diabetic foot. Foot Ankle Surg 18:42–49 Warren GL, Maher RM, Higbie EJ (2004) Temporal patterns of plantar pressures and lower-leg muscle activity during walking: effect of speed. Gait Posture 19:91–100 Wearing SC, Urry S, Smeathers JE, Battistutta D (1999) A comparison of gait initiation and termination methods for obtaining plantar foot pressures. Gait Posture 10:255–263 Wood WA, Wood MA, Werter SA, Menn JJ, Hamilton SA, Jacoby R, Dellon AL (2005) Testing for loss of protective sensation in patients with foot ulceration: a cross-sectional study. J Am Podiatr Med Assoc 95:469–474 Woodburn J, Helliwell PS (1996) Observations on the F-Scan in-shoe pressure measuring system. Clin Biomech (Bristol, Avon) 11:301–304 Wrobel JS, Najafi B (2010) Diabetic foot biomechanics and gait dysfunction. J Diabetes Sci Technol 4:833–845 Yavuz M (2014) Plantar shear stress distributions in diabetic patients with and without neuropathy. Clin Biomech (Bristol, Avon) 29:223–229 Yavuz M, Botek G, Davis BL (2007a) Plantar shear stress distributions: comparing actual and predicted frictional forces at the foot-ground interface. J Biomech 40:3045–3049 Yavuz M, Erdemir A, Botek G, Hirschman GB, Bardsley L, Davis BL (2007b) Peak plantar pressure and shear locations: relevance to diabetic patients. Diabetes Care 30:2643–2645 Yavuz M, Tajaddini A, Botek G, Davis BL (2008) Temporal characteristics of plantar shear distribution: relevance to diabetic patients. J Biomech 41:556–559 Yavuz M, Ocak H, Hetherington VJ, Davis BL (2009) Prediction of plantar shear stress distribution by artificial intelligence methods. J Biomech Eng 131:091007 Yuk San Tsung B, Zhang M, Fuk Tat Mak A, Wan Nar Wong M (2004) Effectiveness of insoles on plantar pressure redistribution. J Rehabil Res Dev 41:4671–4674 Zammit GV, Menz HB, Munteanu SE (2010) Reliability of the TekScan MatScan(R) system for the measurement of plantar forces and pressures during barefoot level walking in healthy adults. J Foot Ankle Res 3:11 Zhao J, Guo Y, Wang L (2013) An insole plantar pressure measurement system based on 3D forces piezoelectric sensor. Sens Transducers 160:49–54

Assessing the Impact of Aerobic Fitness on Gait
Annet Dallmeijer, Astrid Balemans, and Eline Bolster

Abstract

Children with cerebral palsy often develop walking problems, such as reduced walking distance or early fatigue during walking. These problems are primarily caused by motor impairments that lead to gait deviations and increased energy demands of walking, which reduce activities in daily life. As a consequence, physical inactivity and low fitness are frequent in this population. The combination of an increased energy demand of walking and reduced aerobic fitness brings about high levels of physical strain of walking. Maintaining adequate (aerobic) fitness levels by physical training is therefore important in children with high energy demands of walking, as this keeps up their metabolic reserve and reduces fatigue-related walking problems and inactivity. Exercise testing is applied to measure the energy demands of walking and aerobic fitness, guiding whether treatment should focus on reducing energy cost, increasing fitness, or both. Testing is especially indicated when walking problems and physical activity-related fatigue are reported. To improve aerobic fitness in deconditioned children, training of sufficient frequency, intensity, and duration is required, preferably combined with specific functional exercises.

A. Dallmeijer (*) • E. Bolster Department of Rehabilitation Medicine, MOVE Research Institute Amsterdam, EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands e-mail: [email protected]; [email protected] A. Balemans Department of Rehabilitation Medicine, MOVE Research Institute Amsterdam, EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands Brain Center Rudolf Magnus and Center of Excellence for Rehabilitation Medicine University Medical Center, Utrecht, The Netherlands De Hoogstraat Rehabilitation, Utrecht, The Netherlands e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_40-1


Keywords

Aerobic fitness • Cerebral palsy • Energy consumption • Gait deviations • Physical strain • Training

Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Cerebral Palsy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Walking Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Energy Demands of Walking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Energy Cost in Children with CP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Aerobic Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Aerobic Fitness Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Aerobic Fitness and Cerebral Palsy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Physical Strain and Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Physical Strain of Walking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Training Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Aerobic Training Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Practical Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Specificity of Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Introduction
Children with cerebral palsy (CP) have motor impairments that often lead to gait deviations and increased energy demands of walking. These children often develop walking-related problems, like reduced walking distance and fatigue during walking, leading to inactivity and low aerobic fitness levels. The combination of an increased energy demand of walking and reduced aerobic fitness evokes high levels of physical strain of walking. This chapter describes how the physical strain and its underlying components, the energy cost and aerobic fitness, can be measured and how this information can be used to reduce walking problems in children with CP.

State of the Art

Cerebral Palsy
Cerebral palsy (CP) is described as a group of conditions with permanent disorders of the development of movement and posture, causing activity limitation. The condition is caused by a nonprogressive injury of the developing fetal or infant brain and is the most common cause of physical disability in childhood, with a prevalence of around 2–3 cases per 1000 live births. Motor disorders are the most important clinical symptoms, including disturbances of motor control, abnormal muscle tone, hyperreflexia, and muscle weakness. These lead to gait deviations and limitations at the level of activity and participation as described in the


International Classification of Functioning, Disability and Health (ICF). Despite the nonprogressive nature of the primary brain injury, secondary impairments, like joint contractures and bony deformities, may develop during growth, which can further limit activities and societal participation across the lifespan. The condition is highly variable in terms of severity and is often accompanied by sensory, behavioral, and cognitive disturbances (Graham et al. 2016).

Walking Limitations
Limitations of walking ability and overall mobility are a predominant feature that is highly variable among children with CP. To describe the impact of this heterogeneous condition on mobility and to facilitate communication among health professionals, the Gross Motor Function Classification System (GMFCS) was developed: a five-level functional classification system that classifies gross motor ability in children with CP. Distinction between categories is based on self-initiated movement, use of assistive walking devices, use of wheeled mobility, and, to a lesser extent, the quality of movement. Children in GMFCS level I can walk indoors and outdoors and climb stairs without restriction but have limitations in more advanced motor skills like running. GMFCS level II includes children who can walk indoors and outdoors without assistive devices but experience limitations when walking outdoors on uneven surfaces and stairs. Children in GMFCS level III walk indoors and outdoors with an assistive mobility device, those in GMFCS level IV walk short distances with a device but rely more on wheeled mobility in daily life, and children classified as GMFCS level V have no means of independent mobility (Palisano et al. 1997). Despite the nonprogressive nature of the primary brain injury, secondary impairments, like joint contractures and bony deformities, may develop during growth, which can further worsen gait deviations and limit walking ability. An important aim of treatment during childhood is therefore to maintain walking ability and optimize overall mobility. Walking problems are among the main reasons to seek medical advice in ambulant children with CP. Problems that are frequently seen in clinical practice are reduced walking distance, decreased walking speed, and the experience of fatigue during walking, sports, or other daily life activities. It is important to address these complaints in childhood, as it is known that walking ability deteriorates in adults with CP starting in early adulthood (Opheim et al. 2009). The long-term aim of treatment is therefore to prevent early deterioration of walking ability in adulthood.

Energy Demands of Walking
The walking limitations in children with CP often originate from a pathological gait pattern. The motor disorders can lead to a variety of gait deviations. The most typical gait deviations are crouch gait, where children walk with increased knee flexion during the stance phase; equinus gait, where children walk with increased plantar flexion without heel contact; stiff knee gait; and excessive hip flexion with


endorotation/adduction movement (Graham et al. 2016). Although the etiology is not always well understood, these gait deviations generally lead to increased energy demands of walking and thus a low walking economy, in comparison to typically developing children (Bolster et al. 2017). The energy demands of walking can be determined by measuring oxygen uptake (VO2, ml/min), carbon dioxide output (VCO2, ml/min), and ventilation (l/min) with a mobile metabolic system during walking at comfortable walking speed. As the oxidative system needs time to start the oxygen delivery to the muscles and produce energy (ATP), children have to walk for at least 3 min before the actual energy demands of walking can be measured. After 2–3 min, there will be an equilibrium between the oxygen demands and oxygen supply, which is called "steady state." The mean oxygen uptake and carbon dioxide output over 2 min of steady-state exercise can be used to calculate the energy demands. The energy consumption, expressed in Joules, can be calculated by converting the oxygen uptake (ml/min) to Joules (1 liter VO2 ≈ 21 kJ, depending on the respiratory exchange ratio: VCO2/VO2). The energy consumption of walking can be expressed per meter, which is called the energy cost (J/kg/m).
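To make the conversion concrete, the sketch below (not part of the original text) turns 2-min steady-state gas-exchange means into a gross energy cost. It assumes the common linear approximation of the caloric equivalent of oxygen as a function of the respiratory exchange ratio (about 21 kJ per liter of O2 at RER 1.0, slightly less at lower RER); all function and variable names are illustrative.

```python
def gross_energy_cost(vo2_ml_kg_min, vco2_ml_kg_min, speed_m_min):
    """Gross energy cost of walking (J/kg/m) from steady-state means.

    Sketch only: the energy equivalent of oxygen is approximated as a
    linear function of the respiratory exchange ratio (RER = VCO2/VO2),
    giving roughly 20-21 kJ per liter of O2 as stated in the text.
    """
    rer = vco2_ml_kg_min / vo2_ml_kg_min
    # Caloric equivalent in kcal per liter O2, converted to kJ per liter
    # (1 kcal = 4.184 kJ); kJ/L is numerically equal to J/ml.
    j_per_ml_o2 = (3.815 + 1.232 * rer) * 4.184
    power_j_kg_min = vo2_ml_kg_min * j_per_ml_o2  # energy consumption rate
    return power_j_kg_min / speed_m_min           # per meter -> energy cost

# Example: VO2 = 15 ml/kg/min, VCO2 = 13.5 ml/kg/min (RER 0.9), 60 m/min
print(round(gross_energy_cost(15.0, 13.5, 60.0), 2))  # 5.15 J/kg/m
```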

Energy Cost in Children with CP

Fig. 1 Energy cost of walking at comfortable walking speed for typically developing children (TD, n = 63) and children with CP in GMFCS level I (n = 48), II (n = 56), and III (n = 24). Boxplots represent median and interquartile ranges; the vertical axis shows gross energy cost during walking (J/kg/m)

Energy cost of walking is increased in children with CP who have deviated gait patterns. Values increase with higher GMFCS levels (i.e., more severe motor involvement), with mean values of around 120% of age-predicted values for children in GMFCS level I up to 220% of age-predicted values in children in GMFCS level III. Values for individual children with GMFCS level III can be up to three times higher than typically developing controls (Bolster et al. 2017) (Fig. 1). For the net energy cost, which is calculated by subtracting the resting energy consumption from the gross energy consumption, similar increases are found in children with CP, ranging from 130% to 250% of age-predicted values (Bolster et al. 2017).



In typically developing children, it has been shown that gross energy cost declines from around 5–6 J/kg/m at age 6 to 3–4 J/kg/m at age 18 (Bolster et al. 2017). This decline is attributed to a (relative) decrease in resting energy consumption during growth and has been shown to depend on differences in body height and size, especially body surface area (BSA) (Kamp et al. 2014). This is important to note, as children with CP are generally smaller and show larger variability in body size. The dependency of the gross energy cost on body size limits its use for monitoring changes in walking economy with growth or for comparing values among children of different ages and body sizes. For monitoring walking economy during growth, the net energy cost is a more appropriate outcome. The net energy cost is less dependent on age and height and is therefore assumed to give a better indication of the walking economy during growth (Schwartz et al. 2006). Still, there is a small dependency of net energy cost on age, with a decline of around 0.073 J/kg/m per year, which needs to be taken into account when comparing net energy cost values over time (Bolster et al. 2017). The disadvantage of using the net energy cost is that the measurement error is larger, because a second assessment (i.e., resting metabolic rate) is required and thus reliability is lower (Brehm et al. 2007). Another normalization procedure is described by Schwartz et al. (2006), which uses leg length to adjust walking speed and energy consumption (J/kg/min) for growth, resulting in nondimensional variables. The net nondimensional energy cost shows a similar decline with age compared to the net energy cost and can therefore be used interchangeably with the net energy cost (Bolster et al. 2017). Common gait deviations such as a walking pattern with flexed knees (crouch gait) and equinus gait (plantar flexed ankles) are associated with increased energy cost of walking and thus low walking economy in children with CP. Increased cocontraction of agonist and antagonist muscles accounted for up to 50% of the variability in oxygen uptake during walking in children with CP (Unnithan et al. 1996). Several treatments are aimed at improving the gait pattern, preventing further deterioration of gait and the development of musculoskeletal deformities. These treatments are also expected to improve walking economy by reducing the energy cost. The most common treatments include orthopedic surgery, treatment to reduce spasticity (e.g., botulinum toxin), or orthotic treatment with ankle foot orthoses (Graham et al. 2016; Brehm et al. 2008; Scholtes et al. 2006). Evaluation of energy cost before and after treatments that aim to improve gait and walking economy is done to provide insight into the individual treatment effects.
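As a hypothetical illustration of these outcome definitions, the sketch below computes the net energy cost and shifts a measurement to a reference age; the 0.073 J/kg/m per year decline comes from the text, while the numeric values and names are invented for the example.

```python
def net_energy_cost(gross_j_kg_min, resting_j_kg_min, speed_m_min):
    # Net energy cost (J/kg/m): resting energy consumption is subtracted
    # from the gross energy consumption before dividing by walking speed.
    return (gross_j_kg_min - resting_j_kg_min) / speed_m_min

def expected_at_age(net_cost_j_kg_m, measured_age, target_age):
    # Net energy cost still declines ~0.073 J/kg/m per year, so project a
    # measured value to a later age before comparing repeated assessments.
    return net_cost_j_kg_m - 0.073 * (target_age - measured_age)

# Example: a child measured at age 8 and again at age 11 (invented data).
nec_8 = net_energy_cost(300.0, 90.0, 50.0)      # 4.20 J/kg/m
nec_11 = net_energy_cost(280.0, 85.0, 52.0)     # 3.75 J/kg/m
print(round(expected_at_age(nec_8, 8, 11), 2))  # 3.98: part of the drop is age
```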

Aerobic Fitness

Aerobic Fitness Assessment
A high energy consumption during walking results, by definition, in a higher effort or physical strain of walking. To determine the physical strain of walking, the maximal (or peak) oxygen uptake (VO2peak) needs to be known as well. The VO2peak, or maximal aerobic capacity, is considered the gold standard measure of aerobic (i.e., cardiorespiratory) fitness and can be assessed with a cardiopulmonary exercise test (CPET).


During this laboratory-based test, VO2, VCO2, ventilation, and heart rate are measured under standardized test conditions with an incremental exercise protocol. Exercise testing in children with CP may be complicated by the prevalent motor disorders such as spasticity, limited range of motion, impaired selective motor control, and increased levels of coactivation. Nevertheless, adapted test protocols have been developed for exercise testing in this population, including cycle ergometer and treadmill tests (Verschuren and Balemans 2015). In walking children with CP (GMFCS I–II), exercise tests are typically performed on a bicycle ergometer or treadmill in laboratory conditions (Balemans et al. 2013a). These tests enable a direct assessment of the VO2peak and determination of the objective physiological criteria for the achievement of maximal exercise. These criteria are (1) a heart rate above 180 beats per minute, (2) a respiratory exchange ratio (RER) above 1.00, and (3) subjective signs of exhaustion, in order to ensure a cardiorespiratory maximum. The test is considered a valid measurement of VO2peak when at least two out of these three criteria are met. Protocols are developed to reach maximal effort, which means that the child experiences exhaustion within 8–12 min (Balemans et al. 2013b). Treadmill testing is most specific for evaluating fitness when walking ability is of interest. Most children who walk without a walking aid are able to perform a treadmill test, but sufficient balance control is required, and the optimal walking velocity needs to be determined before testing. For children with more extensive motor control problems, i.e., those with GMFCS II or III, cycle ergometry is a more feasible alternative, despite its lower specificity for walking. Both treadmill test protocols (GMFCS I–II, Verschuren and Takken 2010) and cycle ergometry protocols (GMFCS I–III, Brehm et al. 2014) showed good reliability in children with CP. However, although most, but not all, children with GMFCS level III can be tested on a bicycle ergometer, this does not apply to children with GMFCS IV or V. Due to their gross motor limitations, these children are often not able to cycle against incremental resistance. For this group, exercise tests need to be developed that are compatible with their functional limitations. The recent development of training and exercise tests on a racerunner (i.e., a tricycle without pedals that can be propelled by stepping forward, Bolster et al. 2016) is promising for this group with more severe disability. Another feasible alternative is measuring VO2peak during field-based tests, like the shuttle run test that was adapted for children with CP (Verschuren et al. 2006). If the above-described criteria for maximal exercise can be measured to assure maximal effort, these tests, though less standardized, can also be used to determine the VO2peak (see, e.g., Dallmeijer and Brehm 2011). Children can be tested from age 6 or 7, and they need to be able to follow simple instructions. Exclusion criteria for CPET in children with CP are similar to those for typically developing peers (i.e., no contraindications for maximal exercise).
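These validity criteria lend themselves to a simple check. The sketch below merely encodes the two-out-of-three rule described above; the function name and signature are illustrative, not an established clinical tool.

```python
def is_valid_vo2peak_test(peak_hr_bpm, peak_rer, subjective_exhaustion):
    """Return True when the CPET qualifies as a valid VO2peak measurement."""
    criteria_met = [
        peak_hr_bpm > 180,            # (1) heart rate above 180 beats/min
        peak_rer > 1.00,              # (2) respiratory exchange ratio above 1.00
        bool(subjective_exhaustion),  # (3) subjective signs of exhaustion
    ]
    # Valid when at least two out of the three criteria are met.
    return sum(criteria_met) >= 2

print(is_valid_vo2peak_test(186, 0.98, True))  # True  (2 of 3 met)
print(is_valid_vo2peak_test(172, 0.97, True))  # False (1 of 3 met)
```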

Aerobic Fitness and Cerebral Palsy
Aerobic fitness levels in children with CP are lower than in typically developing children (Balemans et al. 2013b; Verschuren and Takken 2010). When measured on


a bicycle ergometer, values were 15–30% lower in children with CP (GMFCS level I–III) than in typically developing children. As expected from reduced motor control and the concomitant lower mechanical efficiency, peak power output was even more reduced (20–55%) and was much more dependent on GMFCS level than VO2peak (Balemans et al. 2013b). Treadmill testing showed a decline of 17% in VO2peak in children with GMFCS I and II compared to typically developing children (Verschuren and Takken 2010). Although values tend to be lower in children with higher GMFCS levels (i.e., more severe motor involvement), there were no significant differences between GMFCS levels. The variability within groups was, however, high, indicating that some children have adequate fitness levels, while others are highly deconditioned. The proportion of children with CP with VO2peak levels below the tenth percentile of age- and gender-based predicted values for typically developing children was high, ranging from around 45% in children with GMFCS level I to 65% in GMFCS II and 85% for those in GMFCS III or IV (unpublished observations). All these children were seeking medical advice because of walking problems or fatigue complaints during activities of daily living. This large proportion of children with low fitness levels and the large variability within the GMFCS levels emphasize the need to monitor aerobic fitness in this population, especially when fatigue-related walking problems are present. Inactivity most likely explains the low fitness levels of children with CP. It has been shown that children with CP are less active in physical activities than their typically developing peers and that activity levels decrease with larger motor involvement (Van Wely et al. 2014). Walking children with CP spend only 4–10% of their waking hours in moderate to vigorous activities, compared to 29% in typically developing children. Also, sedentary time, which is associated with an increased risk for health problems in the general population, is as high as 73–82% in children with CP, compared to 39% in typically developing children. Lower levels of physical activity are related to low aerobic fitness levels (i.e., VO2peak) in children with bilateral CP (Maltais et al. 2005; Balemans et al. 2015a).

Physical Strain and Training

Physical Strain of Walking
The combination of an increased energy cost of walking and a decreased aerobic fitness in children with CP leads to high levels of physical strain during walking and other daily activities (Dallmeijer and Brehm 2011). The physical strain of walking can be calculated by expressing the oxygen uptake during walking (in ml/kg/min) as a percentage of the maximal oxygen uptake (in ml/kg/min). In walking children with CP, the average physical strain for GMFCS levels I, II, and III has been shown to be as high as 55%, 62%, and 78% of VO2peak, respectively, compared to 40% in typically developing children (Balemans et al. 2015b). Again, the large variation within GMFCS groups (Fig. 2) suggests that this problem differs among individual children and that exercise testing is required to establish the actual level of physical strain.
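Expressed in code, the physical strain and the metabolic reserve discussed later look as follows; this is a minimal sketch with invented example values.

```python
def physical_strain_percent(vo2_walk_ml_kg_min, vo2_peak_ml_kg_min):
    # Physical strain: walking oxygen uptake as a percentage of VO2peak.
    return 100.0 * vo2_walk_ml_kg_min / vo2_peak_ml_kg_min

def metabolic_reserve(vo2_walk_ml_kg_min, vo2_peak_ml_kg_min):
    # Metabolic reserve: the margin left between walking demand and VO2peak.
    return vo2_peak_ml_kg_min - vo2_walk_ml_kg_min

# Illustrative numbers: a deconditioned child vs a typically developing child
print(round(physical_strain_percent(17.5, 22.5)))  # 78 (% of VO2peak)
print(round(physical_strain_percent(16.0, 40.0)))  # 40 (% of VO2peak)
print(metabolic_reserve(17.5, 22.5))               # 5.0 ml/kg/min left
```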

Fig. 2 Physical strain (in % VO2peak) for typically developing children (TD, n = 20) and children with CP in GMFCS levels I (n = 13), II (n = 17), and III (n = 7). Boxplots represent median and interquartile ranges; the vertical axis shows physical strain during walking (%)


From the above, it is apparent that a high physical strain may arise from an increased energy cost of walking, a decreased aerobic fitness level, or both. When an increased energy cost is the main factor causing high levels of physical strain, treatment should focus on improving the walking economy. The most common walking economy treatments are spasticity treatment (e.g., botulinum toxin treatment), treatment with orthotics (e.g., an ankle foot orthosis), or orthopedic interventions (Graham et al. 2016). When, however, a reduced fitness level is causing the high physical strain, an individualized physical training program is indicated. Distinguishing the cause of an increased physical strain by exercise testing is therefore relevant for appropriate treatment management. Recent data showed that less than 20% of the children with CP who have walking problems had no deviations in either energy cost or VO2peak, while 25% of the children had an increased energy cost in combination with a normal VO2peak, another 25% had a decreased VO2peak in combination with a normal energy cost, and around one third had both an increased energy cost and a decreased VO2peak (Balemans et al. 2015b). Walking speed is often reduced, especially in those with increased energy cost. As a consequence of the lower walking speed, energy consumption, and thus physical strain, is lower in these children than in children with normal walking speed, despite similar deviations in energy cost and VO2peak. Clearly, a high physical strain leaves a smaller metabolic reserve (i.e., the difference between actual oxygen uptake during walking and VO2peak) that may cause early fatigue during daily life activities, further limiting mobility and decreasing activity (Dallmeijer and Brehm 2011). Although it seems counterintuitive, the higher levels of physical strain apparently do not result in higher fitness levels but, in contrast, may lead to physical inactivity and a concomitant deterioration in fitness level. An individually targeted fitness training program may help to break this vicious cycle by increasing fitness levels and thereby reducing the relative effort of daily activities.
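The decision logic in this paragraph can be summarized as a small rule. The sketch below is hypothetical: the 120% energy-cost and 90% VO2peak cutoffs are illustrative assumptions only, as the chapter relies on age-based reference data rather than fixed thresholds.

```python
def treatment_focus(energy_cost_pct_pred, vo2peak_pct_pred,
                    ec_cutoff=120.0, fitness_cutoff=90.0):
    """Map exercise-test results (as % of predicted) to a treatment focus."""
    high_cost = energy_cost_pct_pred > ec_cutoff      # increased energy cost
    low_fitness = vo2peak_pct_pred < fitness_cutoff   # decreased VO2peak
    if high_cost and low_fitness:
        return "improve walking economy and train aerobic fitness"
    if high_cost:
        return "improve walking economy (spasticity/orthotic/orthopedic care)"
    if low_fitness:
        return "individualized physical training program"
    return "no deviation in energy cost or VO2peak"

print(treatment_focus(180.0, 95.0))  # economy-focused treatment
print(treatment_focus(110.0, 70.0))  # individualized training program
```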


Training Guidelines
As in the general population, exercise training can be undertaken safely by most individuals with CP and is increasingly advocated in this population (Fowler et al. 2007; Maltais et al. 2014). Training guidelines for improving aerobic fitness are suggested to be similar to those for the general population, with a frequency of at least 2–4 times per week, a minimum duration of 20 min, and a moderate intensity of about 60–75% of maximum heart rate, 40–80% of heart rate reserve, or 50–65% of VO2peak (Verschuren et al. 2016). For younger children, it is advised to incorporate an interval training scheme rather than long bouts of continuous exercise, to increase compliance and to match their daily activity pattern.
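The heart-rate-reserve range quoted above is usually computed with the Karvonen formula; the sketch below assumes illustrative resting and maximal heart rates.

```python
def target_heart_rate_zone(resting_hr, max_hr, low=0.40, high=0.80):
    # Karvonen method matching the 40-80% heart-rate-reserve guideline:
    # target HR = resting HR + fraction * (maximal HR - resting HR).
    reserve = max_hr - resting_hr
    return (resting_hr + low * reserve, resting_hr + high * reserve)

# Example: resting HR 75 bpm, maximal HR 195 bpm (invented values)
print(target_heart_rate_zone(75, 195))  # (123.0, 171.0) beats per minute
```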

Aerobic Training Effects
A number of aerobic training studies have been described for children and adolescents with CP. In general, few studies evaluated training effects on VO2peak. Studies that applied proper training guidelines regarding frequency, duration, and intensity of exercise support the effectiveness of aerobic training interventions in children with CP (see Verschuren et al. 2016). Reported increases in VO2peak ranged from 9% to 23%: 23% for an 8-week intervention in young people with CP (10–16 years, Nsenga et al. 2013), 18% for a 3-month intervention in adolescents with CP (14–18 years and GMFCS II and III, Unnithan et al. 2007), and 9% for a 3-month intervention in adolescents and young adults with CP (16–24 years, Slaman et al. 2015). These results indicate that exercise training can effectively increase aerobic fitness in children, adolescents, and young adults with CP, but evidence with regard to age and disability subgroups needs to be extended.

Practical Implications
Maintaining adequate aerobic fitness levels (i.e., VO2peak) is especially important in children who are exposed to high energy demands of walking. The increased physical strain that results from the higher energy consumption of walking is even higher in those with low VO2peak levels, which may lead to an exacerbation of the vicious cycle of inactivity and decreased fitness. In children with a normal energy cost of walking but low VO2peak, physical strain will also be higher than normal, up to 40–50% of VO2peak, but not as high as the 60–80% of VO2peak found in children with both an increased energy cost of walking and a decreased VO2peak. The latter intensity level equals vigorous exercise, which is likely to affect daily activities. Figure 2 shows that, for the majority of children with GMFCS level II or higher, the effort of walking is increased to such an extent that the physical strain is above 60%. For these children, it is essential to lower the physical strain by either increasing the VO2peak or decreasing the energy cost. The benefits of improving the VO2peak are therefore larger, and more relevant, in those with a high physical strain and highly increased energy cost of


walking. Generally, an increase of 20% in the VO2peak would result in a decrease in the physical strain of around 10–15% (depending on the oxygen uptake of walking). Obviously, in addition to improving the VO2peak, reducing the energy cost of walking is also essential in order to reduce the physical strain of walking.
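A short worked example of this arithmetic, with assumed walking VO2 values, shows where the 10–15% figure comes from:

```python
# Raising VO2peak by 20% scales the strain by 1/1.2 (sketch, invented values).
vo2_walk, vo2_peak = 15.0, 22.5                    # ml/kg/min -> strain ~67%
strain_before = 100 * vo2_walk / vo2_peak
strain_after = 100 * vo2_walk / (1.20 * vo2_peak)  # VO2peak raised by 20%
print(round(strain_before), round(strain_after))   # 67 56: ~11 points lower
```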

Specificity of Training
As the main goal of treatment is to improve activities of daily living, it is advised that aerobic training be incorporated in functional exercises that comply with movement behavior in daily life. This especially applies to training in younger children, in order to stimulate their motor development. Training with functional exercises like walking, running, stepping, and stair climbing has been shown to improve functional outcomes measured with the shuttle run test in children and adolescents with GMFCS I–II by 41% (7–20 years, Verschuren et al. 2007). Although it is not known to what extent these training effects can be attributed to fitness improvements (i.e., increased VO2peak) or to improvements in motor skills, it is likely that both components contributed to a better performance on the shuttle run test. These positive effects are without doubt relevant for daily life activities.

Conclusion
Motor impairments of children with CP lead to gait deviations and increased energy demands of walking. As a consequence, walking children with CP often develop walking-related problems, causing inactivity and low aerobic fitness levels. The combination of an increased energy cost of walking and reduced aerobic fitness brings about high levels of physical strain during walking. Apart from reducing the energy cost of walking, maintaining or improving adequate aerobic fitness levels is of utmost importance in children who walk with increased energy cost levels, as this increases their metabolic reserve and may reduce fatigue-related walking problems and inactivity. Exercise testing to identify children with high energy cost and/or low maximal aerobic fitness is therefore indicated when walking problems or physical activity-related fatigue is present. To improve aerobic fitness in deconditioned children, training of sufficient frequency, intensity, and duration is required, preferably combined with functional exercises. Treatment of walking problems in children with CP should not only focus on improving the walking pattern; individualized training interventions are indicated when aerobic fitness levels are low.

References
Balemans AC, Fragala-Pinkham MA, Lennon N, Thorpe D, Boyd RN, O'Neil ME, Bjornson K, Becher JG, Dallmeijer AJ (2013a) Systematic review of the clinimetric properties of laboratory-


and field-based aerobic and anaerobic fitness measures in children with cerebral palsy. Arch Phys Med Rehabil 94:287–301 Balemans AC, Van Wely L, de Heer SJ, Van den Brink J, de Koning JJ, Becher JG, Dallmeijer AJ (2013b) Maximal aerobic and anaerobic exercise responses in children with cerebral palsy. Med Sci Sports Exerc 45:561–568 Balemans AC, Van Wely L, Becher JG, Dallmeijer AJ (2015a) Longitudinal relationship among physical fitness, walking-related physical activity, and fatigue in children with cerebral palsy. Phys Ther 95:996–1005 Balemans AC, Bolster E, Bakels J, Blauw R, Becher JG, Dallmeijer AJ (2015b) Physical strain of walking in cerebral palsy. Dev Med Child Neurol Suppl 57:65–66 Bolster EA, Balemans AC, Brehm MA, Buizer AI, Dallmeijer AJ (2017) Energy cost during walking in association with age and body height in youth with cerebral palsy. Gait Posture 54:119–126 Bolster E, Dallmeijer AJ, de Wolf GS, Versteegt M, Schie PE (2016) Reliability and construct validity of the 6-minute racerunner test in children and youth with cerebral palsy, GMFCS levels III and IV. Phys Occup Ther Pediatr 17:1–12 Brehm MA, Balemans AC, Becher JG, Dallmeijer AJ (2014) Reliability of a progressive maximal cycle ergometer test to assess peak oxygen uptake in children with mild to moderate cerebral palsy. Phys Ther 94:121–128 Brehm MA, Becher J, Harlaar J (2007) Reproducibility evaluation of gross and net walking efficiency in children with cerebral palsy. Dev Med Child Neurol 49:45–48 Brehm MA, Harlaar J, Schwartz M (2008) Effect of ankle-foot orthoses on walking efficiency and gait in children with cerebral palsy. J Rehabil Med 40:529–534 Dallmeijer AJ, Brehm MA (2011) Physical strain of comfortable walking in children with mild cerebral palsy. Disabil Rehabil 33:1351–1357 Fowler EG, Kolobe TH, Damiano DL, Thorpe DE, Morgan DW, Brunstrom JE, Coster WJ, Henderson RC, Pitetti KH, Rimmer JH, Rose J, Stevenson RD (2007) Promotion of physical fitness and prevention of secondary conditions for children with cerebral palsy: section on pediatrics research summit proceedings. Phys Ther 87:1495–1510 Graham HK, Rosenbaum P, Paneth N, Dan B, Lin JP, Damiano DL, Becher JG, Gaebler-Spira D, Colver A, Reddihough DS, Crompton KE, Lieber RL (2016) Cerebral palsy. Nat Rev Dis Primers 2:15082 Kamp FA, Lennon N, Holmes L, Dallmeijer AJ, Henley J, Miller F (2014) Energy cost of walking in children with spastic cerebral palsy: relationship with age, body composition and mobility capacity. Gait Posture 40:209–214 Maltais DB, Pierrynowski MR, Galea VA, Bar-Or O (2005) Physical activity level is associated with the O2 cost of walking in cerebral palsy. Med Sci Sports Exerc 37:347–353 Maltais DB, Wiart L, Fowler E, Verschuren O, Damiano DL (2014) Health-related physical fitness for children with cerebral palsy. J Child Neurol 29:1091–1100 Nsenga AL, Shephard RJ, Ahmaidi S (2013) Aerobic training in children with cerebral palsy. Int J Sports Med 34:533–537 Opheim A, Jahnsen R, Olsson E, Stanghelle JK (2009) Walking function, pain, and fatigue in adults with cerebral palsy: a 7-year follow-up study. Dev Med Child Neurol 51:381–388 Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B (1997) Development and reliability of a system to classify gross motor function in children with cerebral palsy.
Dev Med Child Neurol 39:214–223 Scholtes VA, Dallmeijer AJ, Knol DL, Speth LA, Maathuis CG, Jongerius PH, Becher JG (2006) The combined effect of lower limb multilevel botulinum toxin type A and comprehensive rehabilitation on mobility in children with cerebral palsy: a randomized clinical trial. Arch Phys Med Rehabil 87:1551–1558 Schwartz MH, Koop SE, Bourke JL, Baker R (2006) A nondimensional normalization scheme for oxygen utilization data. Gait Posture 24:14–22 Slaman J, Roebroeck M, Dallmeijer A, Twisk J, Stam H, van den Berg-Emons R (2015) Can a lifestyle intervention programme improve physical behaviour among adolescents and young


adults with spastic cerebral palsy? A randomized controlled trial. Dev Med Child Neurol 57:159–166 Unnithan VB, Dowling JJ, Frost G, Bar-Or O (1996) Role of cocontraction in the O2 cost of walking in children with cerebral palsy. Med Sci Sports Exerc 28:1498–1504 Unnithan VB, Katsimanis G, Evangelinou C, Kosmas C, Kandrali I, Kellis E (2007) Effect of strength and aerobic training in children with cerebral palsy. Med Sci Sports Exerc 39:1902–1909 Van Wely L, Dallmeijer AJ, Balemans AC, Zhou C, Becher JG, Bjornson KF (2014) Walking activity of children with cerebral palsy and children developing typically: a comparison between the Netherlands and the United States. Disabil Rehabil 36:2136–2142 Verschuren O, Takken T, Ketelaar M, Gorter JW, Helders PJ (2006) Reliability and validity of data for 2 newly developed shuttle run tests in children with cerebral palsy. Phys Ther 86(8): 1107–1117 Verschuren O, Ketelaar M, Gorter JW, Helders PJ, Uiterwaal CS, Takken T (2007) Exercise training program in children and adolescents with cerebral palsy: a randomized controlled trial. Arch Pediatr Adolesc Med 161:1075–1081 Verschuren O, Takken T (2010) Aerobic capacity in children and adolescents with cerebral palsy. Res Dev Disabil 31(6):1352–1357 Verschuren O, Balemans AC (2015) Update of the core set of exercise tests for children and adolescents with cerebral palsy. Pediatr Phys Ther 27:187–189 Verschuren O, Peterson MD, Balemans ACJ, Hurvitz EA (2016) Exercise and physical activity recommendations for people with cerebral palsy. Dev Med Child Neurol 58(8):798–808

Oxygen Consumption in Cerebral Palsy
Hank White, J. J. Wallace, and Sam Augsburger

Abstract

Oxygen consumption is a measure of aerobic fitness and the body's ability to deliver oxygen for energy generation during exercise. Oxygen consumption can be measured directly or indirectly. To measure oxygen consumption directly, the oxygen inspired and the carbon dioxide exhaled are measured breath by breath by specialized equipment. For other, indirect measures, heart rate or the distance walked/run during exercise is used in regression equations to estimate oxygen consumption. Examples of such measures of oxygen consumption and energy expenditure are shuttle runs/rides, stair climbing tests, 6 min walk tests, 1 min walking tests, and mechanical energy estimation. Regardless of age, children, adolescents, and adults diagnosed with cerebral palsy have decreased physical activity and increased energy expenditure, oxygen cost, and oxygen consumption (measured with direct and indirect methods) when walking compared to able-bodied persons. Decreases in physical activity may increase the risk of cardiovascular and cardiopulmonary compromise in children and adults diagnosed with cerebral palsy, and these impairments may contribute to further decreases in physical activities. However, surgical interventions (single-event multilevel surgeries and rhizotomy) and therapy have been reported to increase walking distances and decrease the energy expended when walking for persons diagnosed with cerebral palsy.

Keywords

Cerebral palsy • Children • Oxygen consumption • Oxygen cost • Energy expenditure • Walking

H. White (*) • J.J. Wallace • S. Augsburger Motion Analysis Center, Shriners Hospitals for Children Medical Center, Lexington, KY, USA e-mail: [email protected]; [email protected]; [email protected] # Springer International Publishing Switzerland 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_41-1


Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Cerebral Palsy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Exercise Testing and Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Exercise Testing and Cerebral Palsy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Equipment for Exercise Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Measures of Oxygen Consumption (VO2) . . . . . . . . . . . . . . . . . . . . . . . . . 6
Direct Measures of VO2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Indirect Measures of VO2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Relationship between Mechanical Energy and VO2 . . . . . . . . . . . . . . . . 11
Interventions That Affect VO2 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Surgery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Therapy Effects on VO2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Introduction
The International Classification of Functioning, Disability and Health (ICF) is a classification system that provides a standard language and framework for the description of health and health-related states (Vargus-Adams and Majnemer 2014). Different domains describe changes in body function and structure, as well as changes in activity and participation. The ICF differentiates what a person can do in a standard environment (their level of capacity) versus what a person actually does in their usual environment (their level of performance). Walking, running, and other gross motor skills are considered measures of capacity, while playing, working, and activities of daily living are considered levels of performance. The ICF can help clinicians and researchers understand the relationship between a person's capacity and their level of performance. There is a cyclical relationship between impairments in walking ability, the mechanical energy and oxygen cost of walking, and performance, which may affect participation in the home and community. For example, if a person's impairment worsens, it may decrease his/her participation, and if someone's participation decreases, then a progression of impairments may occur. The goal of this chapter is to give clinicians and researchers the knowledge needed to understand how walking efficiency is measured and to provide an overview of oxygen consumption, oxygen cost, and mechanical energy when walking for children with cerebral palsy.

Cerebral Palsy
Cerebral palsy (CP) is a clinical syndrome characterized by some type of insult to the brain during development, birth, or in the first 2 years of life (Scholtes et al. 2006). The incidence of CP is approximately 2–3 children per 1,000 births


(Nene et al. 1993). The clinical presentation of CP includes a broad spectrum of impairments of the neuromusculoskeletal system (Scholtes et al. 2006). Among these impairments are decreases in motor control and increases in muscle spasticity and muscle/joint contractures. These impairments have been proposed to cause an increase in energy cost when walking (Augsburger and Tylkowski 2000; Norman et al. 2004; Stallings et al. 1996; van den Hecke et al. 2007). This can result in activity limitations such as a decreased ability to walk and perform transfers safely (Kay et al. 2004). The end result can be participation restrictions in the home, school, and community. CP is often classified based on topographical presentation. A child diagnosed with hemiplegia presents like a person who has sustained a cerebrovascular accident, with involvement of the ipsilateral arm and leg. CP spastic diplegia presents as involvement of the bilateral lower extremities with or without upper extremity involvement. Hemiplegia and diplegia are the most common presentations of CP. CP spastic quadriplegia presents with involvement of all four extremities and the trunk, with or without neck involvement (Morris 2007). Other, less common presentations of CP include hypotonia, ataxia, and mixed forms (spasticity with hypotonia, dystonia, and/or ataxia) (Morris 2007). A better classification of CP is the Gross Motor Function Classification System (GMFCS). The GMFCS is a valid and reliable five-level classification system for children diagnosed with CP based on self-initiated movements (sitting, standing, walking) (Palisano et al. 2000; Wood and Rosenbaum 2000). The criteria for classification are age based. In general, for children 6 years or older, children in GMFCS Level I demonstrate minimal impairments and mild gait and gross motor abnormalities. Children in Level II demonstrate more impairments of gait and gross motor abilities but can walk without an assistive device on level and uneven surfaces. Children classified as Level III require the use of an assistive device (walker or crutches) to ambulate. Children classified as Level IV may be able to take steps or perform standing transfers but have decreased trunk control and cannot sit without external supports. The highest level of impairment is present in children classified as Level V, who demonstrate few self-initiated movements.

Exercise Testing and Children
Cardiorespiratory health can be assessed by measuring the oxygen uptake during either a maximal or submaximal exercise test (ACSM's Guidelines for Exercise Testing and Prescription/American College of Sports Medicine. Philadelphia: Wolters Kluwer Health/Lippincott Williams and Wilkins 2014). Measuring oxygen consumption (VO2) during exercise is considered the gold standard for assessing energy expenditure and exercise capacity (Rose et al. 1990). VO2 is also commonly used to assess changes in walking efficiency after surgery or rehabilitation in adults and children with disabilities (Bar-Haim et al. 2004; Chan et al. 2008). Oxygen consumption testing can be conducted on a treadmill, cycle ergometer, or on level ground in children as young as 3–4 years of age (ACSM's Exercise


Management for Persons with Chronic Diseases and Disabilities/American College of Sports Medicine. Champaign, IL: Human Kinetics 2016). Tests can be either maximal or submaximal. Exercise testing is typically a graded test in which the work load is incrementally increased until a plateau in oxygen uptake (L/min) is achieved regardless of the increasing load. Maximal exercise tests are used to elicit the highest level of participation and can be quite demanding for subjects to complete. Alternatively, submaximal tests are designed to be easier to complete and utilize regression equations to estimate maximal exercise levels. The modes of exercise testing (running, walking, and cycling) all produce different VO2 results. In able-bodied children, treadmill tests produce a 7%–10% higher VO2 than cycling tests, and running on a treadmill will produce a 6%–10% higher VO2 maximum (VO2max) result compared to walking on a treadmill (ACSM's Exercise Management for Persons with Chronic Diseases and Disabilities/American College of Sports Medicine. Champaign, IL: Human Kinetics 2016).

Exercise Testing and Cerebral Palsy

Assessment
The American College of Sports Medicine (ACSM) has set specific guidelines for VO2 assessment in children (ACSM's Guidelines for Exercise Testing and Prescription/American College of Sports Medicine. Philadelphia: Wolters Kluwer Health/Lippincott Williams and Wilkins 2014). However, for children diagnosed with CP, the ACSM guidelines are not always appropriate (ACSM's Exercise Management for Persons with Chronic Diseases and Disabilities/American College of Sports Medicine. Champaign, IL: Human Kinetics 2016). Clinical exercise testing protocols like the Bruce or Balke require specialized equipment, are too intense, and increase the work load too steeply for children with CP (Nsenga Leunkeu et al. 2012). Cycle ergometer tests are also not recommended for children with CP, due to the decreased coordination and spasticity that make reciprocal pedaling and maintaining pedal cadence difficult (Nsenga Leunkeu et al. 2012). Testing on a treadmill can be performed in less involved children with CP, with access to a harness system that can ensure patient safety (Potter and Unnithan 2005). Therefore, for children with CP, exercise testing is typically performed on level ground (Potter and Unnithan 2005). Children with GMFCS Level I may be able to follow the ACSM guidelines for testing; however, Levels II and III may need adaptive measures (ACSM's Exercise Management for Persons with Chronic Diseases and Disabilities/American College of Sports Medicine. Champaign, IL: Human Kinetics 2016). For children with CP Levels IV and V, functional mobility testing may be more appropriate than exercise testing (ACSM's Exercise Management for Persons with Chronic Diseases and Disabilities/American College of Sports Medicine. Champaign, IL: Human Kinetics 2016). Previous literature assessing VO2 reported that children with CP fatigue more quickly (Potter and Unnithan 2005), do not tolerate exercise as well as peers (Rodda et al. 2006), are less physically active (Gorter et al. 2009), have a lower VO2max during cycling and


treadmill exercise (Potter and Unnithan 2005; Verschuren et al. 2010), have a two to three times increase in O2 cost when exercising (Bar-Haim et al. 2004; Potter and Unnithan 2005; Bowen et al. 1998a; Thomas et al. 2004) and have a lower mechanical efficiency at maximum load (Rose et al. 1990; Bar-Haim et al. 2004). These differences could be due to co-contractions of muscles (Potter and Unnithan 2005; Thomas et al. 2004), increased muscle tone, muscle spasticity, and bony abnormalities (Thomas et al. 2004).

Equipment for Exercise Testing
Measuring energy expenditure during exercise can be accomplished through direct or indirect calorimetry. Direct calorimetry is a direct measure of heat production while the subject is in an enclosed chamber called a calorimeter (Powers and Howley 2007). This method requires a large and extremely expensive device and is not always feasible for clinical use (Battley 1995). To measure energy expenditure when exercising, the chamber must be large enough to have a treadmill or bike inside, and the heat produced by the device must be accounted for (Powers and Howley 2007). Therefore, for clinical and research purposes, indirect calorimetry, the measurement of the oxygen consumed during rest or exercise, is more widely used (Powers and Howley 2007). Indirect calorimeters are less expensive, require less space, and have been proven to be just as reliable and valid as direct measures (Powers and Howley 2007; Battley 1995). The most common type of indirect calorimetry used is open-circuit spirometry (Powers and Howley 2007). Expired air is sent into a mixing chamber that uses electronic gas analyzers to measure the amount of oxygen (O2) and carbon dioxide (CO2), and the results are processed in a computer (Powers and Howley 2007). Open-circuit spirometry can use either a dilution method or a breath-by-breath analysis. With the dilution method, the subject is connected to a hood or mask that covers the entire face or head, a dilution pump samples air from this space, the data are sent to a metabolic cart, and the data are averaged over 15 s intervals (Welch et al. 2015). The breath-by-breath method analyzes the O2 and CO2 content of every breath; it can be portable, is more applicable in real-world scenarios outside the laboratory, and is also more patient friendly and less cumbersome in clinical settings (Welch et al. 2015). Both methods have been proven to be reliable and valid; however, results between methods cannot be interchanged, and data from different methods should not be used for pre- and post-comparisons (Welch et al. 2015). Another method that estimates energy expenditure is the use of joint kinematics and kinetics to estimate the mechanical energy generated and absorbed by the body when walking or running. The estimation of joint kinematics and kinetics requires the use of force plates and a three-dimensional motion capture system, both of which are expensive and require large spaces. It is thought that the use of gait mechanics to measure the energy cost of walking can give researchers and clinicians a better understanding of factors that contribute to the elevated cost of walking seen with pathology (Umberger et al. 2013). It is postulated that disruptions in the transfer


of mechanical energy in body segments, as seen in pathologic gait, require compensations by the body that can increase the metabolic cost (Umberger et al. 2013). Therefore, utilizing mechanical energy estimations can enhance the interpretation of data obtained from open-circuit spirometry.
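As an illustration of the interval averaging used with the dilution method, the sketch below bins irregularly timed breath-by-breath samples into fixed 15-s windows; it is a simplified, assumed implementation, not vendor software.

```python
def average_over_intervals(timestamps_s, vo2_ml_min, interval_s=15.0):
    """Average breath-by-breath VO2 samples over fixed time intervals."""
    bins = {}
    # Breaths arrive at irregular times, so group them by time bin first.
    for t, v in zip(timestamps_s, vo2_ml_min):
        bins.setdefault(int(t // interval_s), []).append(v)
    return [sum(vals) / len(vals) for _, vals in sorted(bins.items())]

# Four breaths in the first 15 s, three in the next (invented values)
print(average_over_intervals([1, 5, 9, 13, 16, 21, 27],
                             [300, 310, 305, 315, 320, 330, 325]))
# -> [307.5, 325.0]
```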

Measures of Oxygen Consumption (VO2)
VO2 is a measure of aerobic fitness and the body's ability to deliver oxygen for energy generation during exercise (Verschuren et al. 2010). VO2 can be measured directly or indirectly. To directly measure VO2, breath-by-breath measurements of the oxygen inspired and the carbon dioxide exhaled are made by specialized equipment (Verschuren et al. 2010). For indirect measures, heart rate or the distance walked/run during exercise is used in regression equations to estimate VO2. Examples of indirect measures of oxygen consumption and energy expenditure are shuttle runs/rides, stair climbing tests, 6 min walk tests (6-mwt), 1 min walking tests (1-mwt), and mechanical energy estimation.
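One widely used regression of this kind is the ACSM walking equation from the guidelines cited in this chapter, which estimates walking VO2 from speed and grade. The sketch below assumes speed in m/min and grade as a fraction; the published equation applies only to roughly 50–100 m/min walking.

```python
def acsm_walking_vo2(speed_m_min, grade_fraction=0.0):
    # VO2 (ml/kg/min) = resting component (3.5) + horizontal component
    # (0.1 * speed) + vertical component (1.8 * speed * grade).
    return 3.5 + 0.1 * speed_m_min + 1.8 * speed_m_min * grade_fraction

print(round(acsm_walking_vo2(80.0), 1))        # 11.5 ml/kg/min on the level
print(round(acsm_walking_vo2(80.0, 0.05), 1))  # 18.7 ml/kg/min at a 5% grade
```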

Direct Measures of VO2

Breath by Breath Oxygen Consumption (VO2) and CP
VO2 measurements when walking for children with CP (GMFCS levels I–III) compared to those of a control group of typically developing children are presented in Table 1. Walking speed was more variable for the control group (50–121.7 m/min) than for the children with CP (41.3–55.9 m/min) (Norman et al. 2004; van den Hecke et al. 2007; Rose et al. 1989; Unnithan et al. 1996). The values for VO2 in children with CP when walking ranged from 14.6 to 23.4 ml/kg/min (average 18.58 ml/kg/min), which is higher than that of the typically developing group, 6.3–25.1 ml/kg/min (average 13.65 ml/kg/min) (Norman et al. 2004; van den Hecke et al. 2007; Rose et al. 1989; Unnithan et al. 1996). When walking at the same speed (3 km/h), children with CP use 54% of their VO2max compared to only 23% of VO2max for typically developing children (Potter and Unnithan 2005). However, children with CP can improve cardiorespiratory fitness through training: a 9% increase in VO2max and a 7% increase in walking distance have been reported for children with CP after a 9-week (twice-weekly) exercise program (Gorter et al. 2009).

Oxygen Cost and CP
Oxygen cost is the volume of oxygen in milliliters measured per kilogram of body weight per meter walked and is reported in ml/kg/m (ACSM's Guidelines for Exercise Testing and Prescription/American College of Sports Medicine. Philadelphia: Wolters Kluwer Health/Lippincott Williams and Wilkins 2014). Oxygen cost is a measure of the oxygen required to walk one meter and allows for comparison of people walking at different speeds. Compared to adults, children can have a 40%–70% greater oxygen cost when walking at their maximum speed (DeJaeger et al. 2001).
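Because oxygen cost is simply oxygen uptake divided by walking speed, it can be computed directly from the values in Table 1. The sketch below is illustrative; subtracting resting VO2 first gives the net cost reported later in Table 2.

```python
def oxygen_cost(vo2_ml_kg_min, speed_m_min, resting_vo2_ml_kg_min=0.0):
    # O2 cost (ml/kg/m): oxygen uptake per minute divided by meters per
    # minute; pass the resting VO2 to obtain the net rather than gross cost.
    return (vo2_ml_kg_min - resting_vo2_ml_kg_min) / speed_m_min

# Rose et al. (1989) CP row in Table 1: 23.4 ml/kg/min at 55.9 m/min
print(round(oxygen_cost(23.4, 55.9), 2))       # 0.42 ml/kg/m gross
print(round(oxygen_cost(23.4, 55.9, 4.8), 2))  # 0.33 ml/kg/m net
```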


Table 1 Oxygen consumption in children with cerebral palsy and in normal controls

| Study | Subjects diagnosis | # of subjects | Age (years) | Walking speed (m/min) | VO2 (ml/kg/min) | Resting VO2 (ml/kg/min) |
| Rose et al. (1989) | CP | 13 | 11.2 (7–16) | 55.9 (20) | 23.4 (7) | 4.8 (0.9) |
| Rose et al. (1989) | Normal control | 18 | 12.5 (7–17) | 121.7 (14) | 25.1 (5) | 4.9 (1) |
| Unnithan et al. (1996) | CP | 8 | 12.7 (2.8) | 50 | 16.6 (6.5) | |
| Unnithan et al. (1996) | Normal control | 15 | 13.6 (2.1) | 50 | 10.2 (1.2) | |
| Norman et al. (2004) | CP | 9 | 12.8 (2.9) | 41.3 (13.4) | 19.7 (11.3) | 5.1 (3.2) |
| Norman et al. (2004) | Normal control | 10 | 11.7 (2.7) | 66.7 (9.6) | 6.3 (1.9) | 2.9 (1.1) |
| Van den Hecke et al. (2007) | CP | 20 | 8.1 (1.6) | 41.67 (6.17) | 14.6 (3.2) | |
| Van den Hecke et al. (2007) | Normal control | 6 | 9.9 (0.55) | 75.5 (4) | 13 (2) | |

However, with increasing age, oxygen cost has been shown to decrease in typically developing populations (DeJaeger et al. 2001). In addition, a linear relationship exists between the inverse of body surface area and oxygen cost (Bowen et al. 1998b). These findings demonstrate that, as children grow, their oxygen cost decreases with both age and increasing body size. When assessing oxygen cost in children over time or pre/post intervention, changes due to growth should therefore be taken into account.

The oxygen cost of children with CP during walking at different speeds and across different GMFCS levels is presented in Table 2. In typically developing children, only a small increase in oxygen cost is seen when walking at a self-selected speed compared to resting oxygen cost. However, children with CP demonstrate significant increases in oxygen cost when walking at self-selected speeds (Norman et al. 2004). Between-day variability of VO2 measurements was found to be similar between typically developing children and children with CP (13% for children with CP and 14% for typically developing children) (Bowen et al. 1998a). For children diagnosed with CP, oxygen cost when walking has been reported to be two to three times greater than that of able-bodied children (Thomas et al. 2004). Children with diplegic CP have double the exercise oxygen cost of children with hemiplegia, 0.56(0.18) ml/kg/m and 0.21(0.01) ml/kg/m, respectively (Rose et al. 1990). At self-selected walking speeds, children with CP have a net oxygen cost of 0.18 ml/kg/m (Kerr et al. 2008).
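As a simple illustration of how the quantities discussed above relate, gross and net oxygen cost can be derived from walking VO2, resting VO2, and walking speed. The sketch below uses hypothetical values of the same order of magnitude as those reported in the tables; it is not data from any cited study.

```python
def oxygen_cost(vo2_walk_mlkgmin, speed_m_per_min, vo2_rest_mlkgmin=None):
    """Return gross (and optionally net) oxygen cost in ml/kg/m."""
    gross = vo2_walk_mlkgmin / speed_m_per_min
    if vo2_rest_mlkgmin is None:
        return gross
    net = (vo2_walk_mlkgmin - vo2_rest_mlkgmin) / speed_m_per_min
    return gross, net

# Hypothetical child with CP: walking VO2 18.6 ml/kg/min at 50 m/min, resting VO2 4.8 ml/kg/min
gross, net = oxygen_cost(18.6, 50.0, 4.8)
print(round(gross, 2), "ml/kg/m gross,", round(net, 2), "ml/kg/m net")
```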

Table 2 Oxygen cost when walking for children with CP

| Study | Subjects diagnosis | # of subjects | Age (years) | Walking speed (m/min) | Exercising O2 cost (ml/kg/m) | Exercising NET O2 cost (ml/kg/m) |
| Rose et al. (1990) | CP (3 hemiplegics, 10 diplegics) | 13 | 11.2 (7–16) | 46.0 (11.4) | 0.43 (0.22) | |
| Rose et al. (1990) | Hemiplegics | 3 | | | 0.21 (0.01) | |
| Rose et al. (1990) | Diplegics | 10 | | | 0.56 (0.18) | |
| Keefer et al. (2004) | CP | 13 | 11.2 (3) | 40.2 | 0.28 (0.21–0.36) | 0.15 (0.07–0.22) |
| Keefer et al. (2004) | CP | 13 | 11.2 (3) | 53.4 | 0.25 (0.19–0.31) | 0.15 (0.11–0.22) |
| Keefer et al. (2004) | CP | 13 | 11.2 (3) | 67.2 | 0.25 (0.20–0.29) | 0.16 (0.12–0.23) |
| Norman et al. (2004) | CP | 10 | 12.8 (2.9) | 41.3 (13.4) | 0.55 (0.38) | |
| Kerr et al. (2008) | CP (all) | 184 | | Self-selected | | 0.18 (0.1) |
| Kerr et al. (2008) | Hemiplegics | 94 | | Self-selected | | 0.14 (0.04) |
| Kerr et al. (2008) | Diplegics | 84 | | Self-selected | | 0.24 (0.12) |
| Kerr et al. (2008) | Level I | 57 | | Self-selected | | 0.15 (0.04) |
| Kerr et al. (2008) | Level II | 91 | | Self-selected | | 0.17 (0.07) |
| Kerr et al. (2008) | Level III | 22 | | Self-selected | | 0.31 (0.12) |
| Kerr et al. (2008) | Level IV | 14 | | Self-selected | | 0.36 (0.27) |
| Oeffinger et al. (2004) | Level I | 179 | | Self-selected | 0.37 (0.11–1.4) | |
| Oeffinger et al. (2004) | Level II | 134 | | Self-selected | 0.47 (0.06–1.1) | |
| Oeffinger et al. (2004) | Level III | 106 | | Self-selected | 0.78 (0.23–2.5) | |


However, when stratified by GMFCS level, net oxygen cost rises as GMFCS level rises: Level I 0.15(0.04) ml/kg/m, Level II 0.17(0.07) ml/kg/m, Level III 0.31(0.12) ml/kg/m, and Level IV 0.36(0.27) ml/kg/m. The minimum clinically important difference (MCID) is the magnitude of change in a measure that is required for a patient to perceive the change as beneficial (Cook 2008). The MCID is based on the effect size of the measure. The MCID for a large (0.8) effect size for oxygen cost (ml O2/kg/m) has been reported to be 0.06, 0.17, and 0.09 for GMFCS levels I, II, and III, respectively (Oeffinger et al. 2008).

GMFCS Levels, GMFM Relationships with Oxygen Cost and VO2

Energy cost, oxygen cost, and oxygen consumption when walking increase with increasing GMFCS level (Kamp et al. 2014). These differences are believed to be due in part to the increased severity of impairments (muscle tone/spasticity, co-contractures, and increased gait abnormalities) (Johnston et al. 2004). Children with CP at GMFCS level I demonstrated a mean oxygen cost similar to that of able-bodied children (0.25 ml/kg/m) (Kamp et al. 2014). However, large inter-individual differences in energy cost were noted within GMFCS levels (Kamp et al. 2014).

The gross motor function measure (GMFM) is a standardized objective measure of gross mobility for children diagnosed with CP (Russell et al. 1993). A higher score on the GMFM was associated with a lower energy cost when walking (Kamp et al. 2014). The percent score of section D of the GMFM had the strongest inverse association with energy cost when walking. In 15 young adults (mean age 29 years) diagnosed with CP (GMFCS levels I and II), oxygen cost when walking had a strong inverse association with GMFM scores (sections D and E; r = -0.57 and -0.66) (Maltais et al. 2012), reflecting that the GMFM and energy cost both capture impairments of the musculoskeletal system.

One study assessed the relationship between oxygen cost when walking and activity limitations and participation restrictions for 184 children diagnosed with CP stratified across GMFCS levels (level I = 57; II = 91; III = 22; IV = 14) (Kerr et al. 2008). Net oxygen cost was measured with the Cosmed K4b2 device (Cosmed, Rome, Italy) while subjects walked at their self-selected walking speed. The net oxygen cost data were skewed, so the data were transformed to log net oxygen cost. The Pediatric Evaluation of Disability Inventory (PEDI) questionnaire assesses a child's functional abilities and is similar to the PODCI. Moderate correlations were found between log net oxygen cost and GMFM scores (r = -0.61) and between log net oxygen cost and the mobility section of the PEDI (r = -0.28). The Lifestyle Assessment Questionnaire for Cerebral Palsy (LAQ-CP) assesses the effect the child's disability has on the child and family; no relationship was found between log net oxygen cost and the LAQ-CP.
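A minimal sketch of the kind of analysis described above: log-transforming a skewed net oxygen cost variable before correlating it with GMFM scores. The paired values are invented purely to illustrate the computation and are not data from Kerr et al. (2008).

```python
import numpy as np

# Hypothetical (illustrative) paired observations: GMFM percent score and
# net oxygen cost (ml/kg/m); net cost is right-skewed, so log-transform first.
gmfm = np.array([45.0, 55.0, 62.0, 70.0, 78.0, 85.0, 90.0, 95.0])
net_cost = np.array([0.62, 0.48, 0.40, 0.30, 0.24, 0.20, 0.17, 0.15])

log_net_cost = np.log(net_cost)

# Pearson correlation between GMFM score and log net oxygen cost
r = np.corrcoef(gmfm, log_net_cost)[0, 1]
print(round(r, 2))  # strongly negative, mirroring the inverse relationships reported
```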

Indirect Measures of VO2

Mechanical Energy

Until the mid-1990s, the assessment of movement and energy expenditure was limited to cardiopulmonary indices, oxygen consumption, and gait kinematic analyses (Jones


and McLaughlin 1993). These techniques were limited by "their inability to control work-load with enough precision to allow comparisons over time or to estimate the relative inefficiency of a disabled individual compared with the able-bodied population" (Jones and McLaughlin 1993). With the advent of full-body gait kinematic and kinetic assessments, this list expanded to include the calculated mechanical energy expenditure of the lower extremities during gait. Mechanical energy expenditure is calculated by summing the integrals of the rectified power curves of the lower-extremity kinetic profiles (Augsburger and Tylkowski 2000; Van de Walle et al. 2012a). This approximation does not include the energy expenditure of the upper extremities and the trunk, or energy spent and lost in the form of heat, but it serves as a reasonable approximation of the mechanical work performed by the subject during gait. It also does not include the energy spent by co-contractions or spastic muscles; it is simply a measure of the actual mechanical energy output of the lower extremities. It can, however, be compared with oxygen consumption to give an estimation of the efficiency of gait, and it is an indirect measure of co-contractions and/or spastic muscles that do not result in mechanical output.
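A minimal sketch of the mechanical energy calculation described above: each joint's net power curve is rectified and integrated over the gait cycle, and the integrals are summed across the lower-extremity joints. The power curves below are synthetic placeholders, not measured data.

```python
import numpy as np

def mechanical_energy(joint_powers, dt):
    """Sum of the integrals of the rectified joint power curves.

    joint_powers : dict of 1-D arrays of net joint power (W) over one gait cycle,
                   e.g. {"hip": ..., "knee": ..., "ankle": ...}
    dt           : sampling interval (s)
    Returns total mechanical energy expenditure (J) for the cycle.
    """
    return sum(np.trapz(np.abs(p), dx=dt) for p in joint_powers.values())

# Illustrative synthetic power curves for one gait cycle sampled at 100 Hz
t = np.linspace(0.0, 1.0, 101)
powers = {
    "ankle": 60 * np.sin(2 * np.pi * t) ** 3,   # large late-stance burst
    "knee": -25 * np.sin(2 * np.pi * t),        # mostly absorption
    "hip": 20 * np.sin(2 * np.pi * t + 0.5),
}
print(round(mechanical_energy(powers, dt=0.01), 1), "J per gait cycle")
```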

Relationship Between Mechanical Energy and VO2

For children diagnosed with CP, the magnitude of impairments (spasticity, weakness, co-contraction of muscles) can be approximated by comparing a patient's volumetric oxygen consumption (VO2) and gait mechanical energy (Eg) with those of able-bodied subjects, graphically represented by plotting VO2 versus velocity squared (V2) and by comparing VO2 with Eg (Augsburger and Tylkowski 2000). A linear relationship (R2 0.938–0.999) between VO2 and velocity squared has previously been reported (Augsburger and Tylkowski 2000); the slope of this curve is an indication of an individual's differences in metabolic rate. In addition, Eg versus V2 also has a linear relationship (R2 = 0.961), and these two curves can be combined to produce a graphic representation of VO2 versus Eg.

Both able-bodied persons who walk with a simulated crouched gait pattern and patients diagnosed with CP who naturally walk with a crouched gait pattern (knee flexion 25°–35°) demonstrate an increased energy cost when walking (Augsburger and Tylkowski 2000). When plotting VO2 versus V2, both groups demonstrate similar increases in VO2. However, when plotting VO2 versus Eg, differences were noted between the able-bodied subjects simulating crouch and the subjects with CP who naturally walked with a crouched gait pattern. Specifically, the able-bodied subjects demonstrated increases in both VO2 and Eg, whereas the subjects with CP demonstrated increases in VO2 without increases in Eg. Because some subjects with CP demonstrated increased VO2 with decreased Eg and others demonstrated increases in both VO2 and Eg, it was proposed that these differences were due not just to walking in a crouched posture but to increased muscle tone/spasticity and increased co-contraction of the lower extremity musculature (Augsburger and Tylkowski 2000).
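The two linear relationships described above (VO2 versus velocity squared, and Eg versus velocity squared) can be combined by simple regression. The sketch below fits both lines and derives a VO2-versus-Eg slope; the trial values are hypothetical and are not taken from Augsburger and Tylkowski (2000).

```python
import numpy as np

# Illustrative walking trials: speed (m/s), oxygen uptake (ml/kg/min),
# and gait mechanical energy Eg (J/kg per minute); hypothetical values.
v = np.array([0.6, 0.8, 1.0, 1.2, 1.4])
vo2 = np.array([9.5, 11.8, 14.6, 18.2, 22.4])
eg = np.array([18.0, 24.0, 31.0, 40.0, 50.0])

v2 = v ** 2
slope_vo2, intercept_vo2 = np.polyfit(v2, vo2, 1)   # VO2 versus V^2
slope_eg, intercept_eg = np.polyfit(v2, eg, 1)      # Eg versus V^2

# Combining the two fits gives a VO2-versus-Eg slope (an efficiency-like index)
vo2_per_eg = slope_vo2 / slope_eg
print(round(slope_vo2, 2), round(slope_eg, 2), round(vo2_per_eg, 3))
```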


This increase in energy cost when walking could be due to decreased efficiency of the muscles or to increased total work performed by the muscles. The increased muscle work could be due to increased step frequency, muscle co-contraction, kinematic abnormalities, or increased muscle tone/spasticity (Augsburger and Tylkowski 2000; Norman et al. 2004; Stallings et al. 1996; van den Hecke et al. 2007). The mechanical work and metabolic cost of walking were assessed in 20 children with CP, spastic hemiplegia, and 6 able-bodied children (van den Hecke et al. 2007). Total work was calculated from the external work needed to move the center of mass of the body (calculated from ground reaction forces) and the internal work needed to move the body segments relative to the center of mass, based on a summation of the rotational and translational kinetic energy of each segment. The results of the study indicated that the increased cost was due to increased mechanical work performed by the muscles and not to decreased efficiency (van den Hecke et al. 2007). Therefore, children with CP use a larger proportion of their aerobic capacity, resulting in decreased endurance when walking.

The net joint moments represent a summation of the mechanical energy absorbed, generated, and transferred between body segments when walking (Umberger et al. 2013). Disruption of this mechanical energy transfer is believed to be the cause of the increased oxygen cost of walking for children diagnosed with CP and other disorders (Umberger et al. 2013). A seven-segment dynamic model of the whole body was developed by comparing the center-of-mass acceleration from force plate data with joint moment and power data calculated from inverse dynamics. Results indicated that the ankle joint moments generated and transferred the largest magnitude of energy during late single-limb stance and weight release, while the hip joint moments transferred similar magnitudes of energy to the ipsilateral limb, the contralateral limb, and the head, arms, and trunk (HAT) from 30% to 70% of the gait cycle. The authors concluded that lower extremity extensor moments primarily cause energy flow into the HAT and flexor moments cause energy flow out of the HAT (Umberger et al. 2013). It is the disruption of the transfer of mechanical energy between the ipsilateral limb, the contralateral limb, and the HAT segment that results in less efficient gait. This disruption of energy transfer could result in an increased metabolic cost of walking because more muscle energy would be required.

This inefficient transfer of energy is also supported by a prospective study comparing children with CP, spastic diplegia (GMFCS levels I and II) (n = 18), to able-bodied children (n = 25) (Van de Walle et al. 2012b). Mechanical work calculated from integration of joint powers for children with CP was found to be 1.5 times higher than for able-bodied children (Van de Walle et al. 2012b). The increase in work was primarily due to increased movements of the trunk and head and not to increased movements of the arms. The integrated power approach first obtains a summation of positive and negative net joint power for the upper extremities (shoulder, elbow, wrist), the lower extremities (ankle, knee, hip), and the neck. These net joint powers are summed to provide a net joint work for the entire body (Van de Walle et al. 2012a). The integrated power approach has been validated against oxygen consumption for children with CP (Van de Walle et al. 2012a).
The total net joint work and net oxygen cost when walking for children with CP were significantly higher than those of able-bodied


children and able-bodied adults. The total net joint work, which included calculations for the head, trunk, and arms and not just the lower extremities, was significantly correlated with the oxygen cost of walking (Van de Walle et al. 2012a).

What about children who cannot walk the 3–10 min required to obtain a steady state of oxygen consumption, or who refuse to wear the equipment required for breath-by-breath oxygen consumption testing? Additionally, most clinicians do not have access to the equipment needed to measure oxygen cost directly. Therefore, other indirect measures of energy cost have been reported for children with CP (Keefer et al. 2004; McDowell et al. 2005). These indirect measures of VO2 can be obtained using variables such as heart rate or distance covered during submaximal exercise (Verschuren et al. 2010).

Shuttle Run/Ride

Heart rate during a shuttle run test can be used to indirectly measure aerobic capacity in children with CP (GMFCS levels I–II) (Verschuren et al. 2010). This study provided percentile curves for comparison across different ages and heights; as age and height increased, so did the number of shuttles performed (Verschuren et al. 2010). Similarly, indirect assessment of VO2 using heart rate during a shuttle ride test has been reported for 23 children with CP who use manual wheelchairs (Verschuren et al. 2013). The shuttle ride test was compared to VO2 measured directly while using an arm ergometer (Verschuren et al. 2013). The shuttle ride test was reliable (ICC = 0.99) for all VO2 variables and produced VO2max values of 26(5) ml/kg/min, similar to the 25.3(5.7) ml/kg/min measured with an arm ergometer.

Stair Climbing

Stair climbing can be used to measure mechanical efficiency in children with CP, using body weight and step height (kg·m) to quantify the work done when climbing stairs. Typically developing children performed significantly more stair ascents (34.8[2.7] vs. 6.4[7.7]), had approximately 2–3 ml/kg/min higher VO2, and had greater gross mechanical efficiency (20% vs. 4%) when climbing stairs compared to children with CP (GMFCS II–IV) (Bar-Haim et al. 2004). Stair climbing also detected changes after exercise training, with children performing on average 1.3 more stair ascents and increasing gross mechanical efficiency by 2% (Bar-Haim et al. 2004).

6-Minute Walk Test

For children with CP who cannot tolerate treadmill or cycle testing, walking on level ground during the 6-min walk test (6-mwt) can be used. The 6-mwt is a submaximal, indirect measure that uses distance walked as a predictor of aerobic capacity (Maher et al. 2008; Thompson et al. 2008). This test may be preferable for children with CP (GMFCS I–III) as it reflects activities of daily living and the ability to walk community distances (Thomas et al. 2004). In addition, a walking test on level ground is less expensive, simpler, and safer than traditional measures that require testing equipment (Nsenga Leunkeu et al. 2012).


The 6-mwt has a reported ICC value of 0.98 overall for children 4–18 years of age with CP, GMFCS levels I–III (Maher et al. 2008; Thompson et al. 2008). Reported distances walked, stratified by GMFCS level, were 486.6(84.4) m for level I, 312.9(77) m for level II, and 240.2(121.1) m for level III. The validity of the 6-mwt, with simultaneous direct collection of VO2, has been examined in children with CP (GMFCS levels I and II) against direct measurement during cycle ergometry (Nsenga Leunkeu et al. 2012). In this study, VO2 and peak HR for the 6-mwt were 33.1(7.1) ml/kg/min and 156.6(22.4) bpm, and for the cycle test 32.3(6.3) ml/kg/min and 148.4(25.1) bpm. The 6-mwt was therefore found to be a valid and reliable measure of aerobic capacity for children with CP (Nsenga Leunkeu et al. 2012).

One-Minute Walk Test

An alternative to the 6-min walk test is the 1-min walk test, which measures the distance walked in 1 min when a child walks at their maximum speed. The distance walked in 1 min was strongly correlated with GMFM scores (section D r = 0.910, section E r = 0.872) and is considered a measure of walking endurance (McDowell et al. 2005). Additionally, the average distance walked decreased with increasing GMFCS level (level I = 100[12] m, level II = 83[17] m, level III = 56[17] m, level IV = 19[7] m). A large prospective multicenter study also found a decrease in walking distance with increasing GMFCS level and reported minimum clinically important differences of 5.1–9.0 m for GMFCS levels I and II and 3.8–6.3 m for GMFCS level III (Hassani et al. 2014). Additionally, the 600-Yard Walk-Run Test is a standardized physical fitness test that is reported to be highly correlated (r = 0.80) with VO2 in children with intellectual disabilities (Kerr et al. 2008). For children with CP, the time to complete the 600-Yard test increased with increasing GMFCS level (Mattsson and Andersson 1997).

Energy Expenditure Index (EEI)

Another indirect measure of VO2 uses changes in heart rate (HR) to compute the energy expenditure index (EEI), expressed as

EEI = (exercise HR - resting HR) / walking speed (beats/m)

(Keefer et al. 2004). This measure is also known as the Physiological Cost Index (PCI), defined as

PCI = (final HR - resting HR) / walking speed (beats/m)

(Raja et al. 2007). Similar to oxygen cost, EEI and PCI are measures of the economy of walking, with high values indicating poor walking economy (Rose et al. 1990; Provost et al. 2007). EEI and PCI use the HR response during walking/exercising to assess energy cost and relate it to walking velocity (Provost et al. 2007). EEI demonstrates a good correlation (r = 0.61) with oxygen consumption for children with cerebral palsy when walking at self-selected speeds (Norman et al. 2004). Furthermore, PCI has been calculated over specified distances (50, 100, and 150 m) at self-selected pace in children with CP (Raja et al. 2007). The PCI when walking 50 m is reproducible (ICC 0.80–0.88) for able-bodied children and children diagnosed with CP. The PCI for able-bodied children (n = 100) increased from 0.10 beats/m to 0.11 beats/m when


walking on uneven ground and increased from 0.58 beats/m to 0.86 beats/m for children with CP (n = 100). Children with CP (spastic diplegia, GMFCS level not reported) demonstrate a threefold increase in EEI compared to their peers, and EEI is lower in hemiplegic CP than in diplegic CP (Rose et al. 1990). This estimation of VO2 from HR is possible because of the linear relationship between HR and VO2 during exercise (Rose et al. 1990). Heart rate while walking at self-selected speed demonstrates good repeatability (ICC = 0.81–0.96), with a minimal detectable change of 9.2–14.2 beats per minute for able-bodied adults (Darter et al. 2013).

However, variable results have been reported when comparing EEI and VO2 for children with CP walking at different speeds. One study reported a linear relationship (r = 0.84) between gross VO2 and HR for children with CP (GMFCS not reported) walking at different speeds (Rose et al. 1991). Another reported no relationship between gross EEI and gross VO2, although a low linear relationship between net EEI and net VO2 (r = 0.50–0.64) was found across different walking speeds (GMFCS I–II) (Keefer et al. 2004). Furthermore, only 38% of subjects with CP (GMFCS I–II) demonstrated decreases in net VO2 and EEI with increases in walking speed; the remainder demonstrated variable changes in net VO2 and EEI (Keefer et al. 2004). Because similar changes in EEI and net VO2 were not demonstrated by all participants, the authors concluded that EEI is not a valid estimate of VO2 (Keefer et al. 2004). In another study, 10 children diagnosed with CP (GMFCS I–III) and 15 able-bodied children were assessed with a portable metabolic system while walking at their self-selected speed (Norman et al. 2004). A stronger correlation was found between EEI and the oxygen consumption index for the children with CP (r = 0.61) than for the able-bodied children (r = 0.40) (Norman et al. 2004). A possible reason for the differing results is that heart rate can be affected by numerous factors, including stress, anxiety, anticipation of exercise, other impairments, and medications used by participants.
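A minimal sketch of the EEI/PCI calculation defined above, using hypothetical heart rates and walking speeds rather than values from any of the cited studies.

```python
def energy_expenditure_index(hr_walking, hr_resting, speed_m_per_min):
    """EEI (beats per metre) = (walking HR - resting HR) / walking speed."""
    return (hr_walking - hr_resting) / speed_m_per_min

# Hypothetical child with CP: HR rises from 88 to 132 bpm while walking at 45 m/min
eei_cp = energy_expenditure_index(132, 88, 45.0)

# Hypothetical typically developing child: HR rises from 85 to 100 bpm at 70 m/min
eei_td = energy_expenditure_index(100, 85, 70.0)

print(round(eei_cp, 2), "vs", round(eei_td, 2), "beats/m")
```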

Interventions That Affect VO2 Measures

Surgery

Single-Event Multilevel Surgery (SEMLS) Effects on VO2

Single-event multilevel surgery (SEMLS) is often performed to improve gait for children diagnosed with CP (Marconi et al. 2014). Instrumented gait analysis using three-dimensional motion capture is recommended for developing an appropriate treatment plan for ambulatory children diagnosed with CP (Marconi et al. 2014; Thomason et al. 2012). Instrumented gait analysis provides detailed information on joint motions, moments, and powers, resulting in a summary of the biomechanics of walking; however, it does not provide a measure of the energy cost of walking (Marconi et al. 2014).


A systematic review of the SEMLS literature for children with CP (GMFCS I–IV) reported that the overall kinematics, kinetics, and energy efficiency of walking improved after SEMLS (McGinley et al. 2012). Additionally, energy cost, oxygen cost, and oxygen consumption when walking increase with increasing GMFCS level (Kamp et al. 2014). In this study, children with CP at GMFCS level I demonstrated a mean oxygen cost similar to that of able-bodied children (0.25 ml/kg/m), although large inter-individual differences in energy cost were noted within each GMFCS level (Kamp et al. 2014).

Energy cost when walking is greater in children diagnosed with CP than in typically developing children, which is believed to be due to increased mechanical work and exaggerated displacement of the center of mass when walking (Marconi et al. 2014). In a small sample (n = 10) of children with CP (GMFCS I–III) assessed before and after SEMLS, improvements in hip and knee motions, moments, and powers were seen (Marconi et al. 2014). However, the children's mechanical work (calculated from ground reaction forces, movement of the center of mass, and the rotational and translational energy of each segment) was not significantly different before and after surgery. Despite no change in mechanical work, oxygen cost when walking did decrease significantly (Marconi et al. 2014); the authors therefore proposed that the decrease in energy cost was due to a decrease in the energy needed to maintain an upright posture when walking (Marconi et al. 2014).

The oxygen cost of walking in patients who underwent femoral derotational osteotomies as part of SEMLS has been compared to that of subjects who underwent SEMLS without femoral derotational osteotomies (McMulkin et al. 2015). The results showed significant decreases in net oxygen cost (0.03 ml O2/kg/m) for both groups at GMFCS levels I and II but a non-significant decrease (0.05 ml O2/kg/m) at GMFCS level III; however, the sample size for level III participants was 10, compared to 70 participants classified as GMFCS levels I and II (McMulkin et al. 2015). In another study, the PCI in 35 children with CP (GMFCS levels not reported) decreased from 1.13 beats/m to 0.83 beats/m after SEMLS (p < 0.01) (Raja et al. 2007).

Rhizotomy Effects on VO2

For children diagnosed with CP, muscle spasticity is proposed to be due to damage to the motor cortex, which results in decreased cortical input to the corticospinal tract (Valle et al. 2007). A decrease in descending input to the spinal interneuron pool can result in increased activity of the gamma and alpha motor neurons (Valle et al. 2007; Steinbok 2007; Verrotti et al. 2006); the end result of this increased activity is muscle spasticity. Input to the spinal interneuron pool via the afferent nerves in the dorsal roots has a net excitatory effect on the efferent output via the alpha motor neurons (Steinbok 2007). Therefore, when a dorsal rhizotomy is performed (cutting 50%–70% of the dorsal sensory roots L1–S2), the result is a decrease in the excitability of the alpha motor neurons and a decrease in muscle spasticity (Steinbok 2007). The results of a dorsal rhizotomy are a permanent reduction in muscle spasticity with improvements in gait and other gross motor skills (Carraro et al. 2014). Except for the exclusion of patients with dystonia, to date there are still no uniform criteria for


patient selection for dorsal rhizotomy (Carraro et al. 2014). Two studies with small sample sizes reported that children diagnosed with CP (GMFCS I–III) demonstrated significant decreases in stance-phase knee flexion (crouch) and trends toward decreases in heart rate and oxygen consumption when walking 1 year after undergoing a dorsal rhizotomy (Chan et al. 2008; Carraro et al. 2014).

Therapy Effects on VO2

Children classified as GMFCS levels III and IV must use an assistive device (walker or crutches) to ambulate. The two most common types of walkers are forward (walker in front of the child) and reverse (walker behind the child). Reverse walkers are intended to encourage a more upright trunk posture and a less crouched gait pattern for children with CP (Park et al. 2001). One study assessed the differences between anterior and posterior walkers in ten children diagnosed with CP (GMFCS level III) who used each type of walker (randomly assigned) for 1 month and then used the other type. A decrease in average oxygen cost and oxygen consumption was found when using a posterior walker compared to an anterior walker (Park et al. 2001); the children demonstrated a less crouched and more energy-efficient gait pattern with the posterior walker. Conversely, an older study assessed ten children with CP (GMFCS level III) using both types of walker (randomly assigned) on the same day and reported no difference in oxygen cost between the two walker types (Mattsson and Andersson 1997).

Regardless of age, the physical activity of persons with CP is reportedly lower than that of able-bodied persons. A decrease in physical activity may increase the risk of cardiovascular and cardiopulmonary impairments; therefore, it is important to understand why persons with CP demonstrate decreases in physical activity compared to their peers (Fowler et al. 2007). Previous research has demonstrated that children with CP have decreased aerobic capacity and increased energy cost when walking compared to typically developing children (Rose et al. 1990). A decrease in physical activity can lead to decreases in aerobic capacity, which can result in the cyclical problem of decreased aerobic capacity leading to decreased participation in physical activities.

Short-term benefits from cardiopulmonary and strength training have been documented for persons with CP. A randomized controlled trial assessed the effects of cardiopulmonary and strength training for adolescents and young adults diagnosed with CP (GMFCS I–IV) (Slaman et al. 2014). The control group continued regular care (approximately 2 h of traditional therapy), while the treatment group received specialized cardiopulmonary and strength training. Body composition, muscle strength, and cardiopulmonary fitness were assessed at four intervals: prior to randomization and 3, 6, and 12 months after the initial evaluation. The treatment group demonstrated 10%–30% increases in cardiopulmonary fitness at 6 months, while the control group was unchanged. Additionally, decreases were seen in skinfold thickness, blood pressure, and total cholesterol for the treatment group after 1 year (Slaman et al. 2014).


A 32% decrease in the EEI after body-weight-supported treadmill training has been reported (Provost et al. 2007), and a 9% decrease in EEI was found after a 9-week circuit training and aerobic endurance training program (Gorter et al. 2009). Additionally, another study found that the distance walked during the 6-mwt improved significantly after an 8-week walking exercise program (Nsenga Leunkeu et al. 2012). Two additional studies reported improvements in VO2 with aerobic exercise for children with CP using stationary bikes or arm ergometers (Fowler et al. 2007; Slaman et al. 2013).

One prospective study assessed adults (mean age 36 ± 6 years) with CP (GMFCS levels I–III), measuring oxygen consumption while walking at self-selected speed, peak oxygen consumption during a progressive stress test (cycle ergometer), and daily walking time (activity monitor worn for 48 h on weekdays) (Slaman et al. 2013). There was no relationship between daily walking time and peak oxygen consumption or oxygen consumption when walking; however, there was a significant negative relationship between physical strain while walking and total daily walking time. Walking speed for participants with CP was 32% lower, and the physical strain of walking was twice that of able-bodied subjects. Therefore, adults with CP used a larger proportion of their metabolic reserve for walking than able-bodied persons, resulting in less walking; total daily walking time averaged 1 h and 24 min (Slaman et al. 2013).

Conclusions

Regardless of age, children, adolescents, and adults diagnosed with CP have decreased physical activity and increased energy cost, oxygen cost, and oxygen consumption (measured with direct and indirect methods) when walking compared to able-bodied persons. Decreases in physical activity may increase the risk of cardiovascular and cardiopulmonary compromise in children and adults diagnosed with CP, and these impairments may in turn contribute to further decreases in physical activity. Capacity and participation can thus become circular problems, with capacity affecting participation and vice versa. However, surgical interventions (SEMLS and rhizotomy) and therapy have been reported to increase walking distances and decrease the energy expended when walking for persons diagnosed with CP.

References

Moore GE, Durstine JL, Painter PL, American College of Sports Medicine (2016) ACSM's exercise management for persons with chronic diseases and disabilities. Human Kinetics, Champaign

Thompson WR, Gordon NF, Pescatello LS, American College of Sports Medicine (2014) ACSM's guidelines for exercise testing and prescription. Wolters Kluwer Health/Lippincott Williams and Wilkins, Philadelphia, p 480

Augsburger S, Tylkowski C (2000) A comparison of volumetric oxygen consumption to gait mechanical energy in normal and pathological gait. In: Harris GF, Smith PA (eds) A new


millennium in clinical care and motion analysis technology. Institute of Electical and Electronics Engineers, Piscataway, pp 109–115 Bar-Haim S, Belokopytov M, Harries N, Frank A (2004) A stair-climbing test for ambulatory assessment of children with cerebral palsy. Gait Posture 20:183–188 Battley E (1995) The advantages and disadvantages of direct and indirect calorimetry. Thermochem Acta 250:337–352 Bowen TR, Lennon N, Castagno P et al (1998a) Variability of energy-consumption measures in children with cerebral palsy. J Pediatr Orthop 18:738–742 Bowen TR, Cooley SR, Castagno PW et al (1998b) A method for normalization of oxygen cost and consumption in normal children while walking. J Pediatr Orthop 18:589–593 Carraro E, Zeme S, Ticcinelli V et al (2014) Multidimensional outcome measure of selective dorsal rhizotomy in spastic cerebral palsy. Eur J Paediatr Neurol 18:704–713 Chan SH, Yam KY, Yiu-Lau BP et al (2008) Selective dorsal rhizotomy in Hong Kong: multidimensional outcome measures. Pediatr Neurol 39:22–32 Cook CE, (2008) Clinimetrics Coroner: The Minimal Clinically Important Change Score (MCID): A Necessary Pretense. J Man Manip Ther 16(4): E82–E83 Darter BJ, Rodriguez KM, Wilken JM (2013) Test-retest reliability and minimum detectable change using the K4b2: oxygen consumption, gait efficiency, and heart rate for healthy adults during submaximal walking. Res Q Exerc Sport 84:223–231 DeJaeger D, Willems PA, Heglund NC (2001) The energy cost of walking in children. Pflugers Arch 441:538–543 Fowler EG, Kolobe TH, Damiano DL et al (2007) Promotion of physical fitness and prevention of secondary conditions for children with cerebral palsy: section on pediatrics research summit proceedings. Phys Ther 87:1495–1510 Gorter H, Holty L, Rameckers EE et al (2009) Changes in endurance and walking ability through functional physical training in children with cerebral palsy. Pediatr Phys Ther 21:31–37 Hassani S, Krzak JJ, Johnson B et al (2014) One-Minute Walk and modified Timed Up and Go tests in children with cerebral palsy: performance and minimum clinically important differences. Dev Med Child Neurol 56:482–489 Johnston TE, Moore SE, Quinn LT, Smith BT (2004) Energy cost of walking in children with cerebral palsy: relation to the Gross Motor Function Classification System. Dev Med Child Neurol 46:34–38 Jones J, McLaughlin JF (1993) Mechanical efficiency of children with spastic cerebral palsy. Dev Med Child Neurol 35:614–620 Kamp FA, Lennon N, Holmes L et al (2014) Energy cost of walking in children with spastic cerebral palsy: relationship with age, body composition and mobility capacity. Gait Posture 40:209–214 Kay RM, Rethlefsen SA, Kelly JP, Wren TA (2004) Predictive value of the Duncan-Ely test in distal rectus femoris transfer. J Pediatr Orthop 24:59–62 Keefer DJ, Tseh W, Caputo JL et al (2004) Comparison of direct and indirect measures of walking energy expenditure in children with hemiplegic cerebral palsy. Dev Med Child Neurol 46:320–324 Kerr C, Parkes J, Stevenson M et al (2008) Energy efficiency in gait, activity, participation, and health status in children with cerebral palsy. Dev Med Child Neurol 50:204–210 Maher CA, Williams MT, Olds TS (2008) The six-minute walk test for children with cerebral palsy. Int J Rehabil Res 31:185–188 Maltais DB, Robitaille NM, Dumas F et al (2012) Measuring steady-state oxygen uptake during the 6-min walk test in adults with cerebral palsy: feasibility and construct validity. 
Int J Rehabil Res 35:181–183 Marconi V, Hachez H, Renders A et al (2014) Mechanical work and energy consumption in children with cerebral palsy after single-event multilevel surgery. Gait Posture 40:633–639 Mattsson E, Andersson C (1997) Oxygen cost, walking speed, and perceived exertion in children with cerebral palsy when walking with anterior and posterior walkers. Dev Med Child Neurol 39:671–676


McDowell BC, Kerr C, Parkes J, Cosgrove A (2005) Validity of a 1 minute walk test for children with cerebral palsy. Dev Med Child Neurol 47:744–748 McGinley JL, Dobson F, Ganeshalingam R et al (2012) Single-event multilevel surgery for children with cerebral palsy: a systematic review. Dev Med Child Neurol 54:117–128 McMulkin ML, Gordon AB, Caskey PM et al (2015) Outcomes of orthopaedic surgery with and without an external femoral derotational osteotomy in children with cerebral palsy. J Pediatr Orthop Gait and Posture 41:608–612 Morris C (2007) Definition and classification of cerebral palsy: a historical perspective. Dev Med Child Neurol Suppl 109:3–7 Nene AV, Evans GA, Patrick JH (1993) Simultaneous multiple operations for spastic diplegia. Outcome and functional assessment of walking in 18 patients. J Bone Joint Surg Br 75:488–494 Norman JF, Bossman S, Gardner P, Moen C (2004) Comparison of the energy expenditure index and oxygen consumption index during self-paced walking in children with spastic diplegia cerebral palsy and children without physical disabilities. Pediatr Phys Ther 16:206–211 Nsenga Leunkeu A, Shephard RJ, Ahmaidi S (2012) Six-minute walk test in children with cerebral palsy gross motor function classification system levels I and II: reproducibility, validity, and training effects. Arch Phys Med Rehabil 93:2333–2339 Oeffinger DJ, Tylkowski CM, Rayens MK, Davis RF, Gorton GE, D’Astous J, Nicholson DE, Damiano DL, Abel MF, BAgley AM, Luan J (2004) Developmental Medicine and Child Neurology 46:311–319 Oeffinger D, Bagley A, Rogers S, Gorton G, Kryscio R, Abel M, Damiano D, Barnes D, Tylkowski C (2008) Outcome tools used for ambulatory children with cerebral palsy: responsiveness and minimum clinically important differences. Dev Med Child Neurol 50(12): 918–925 Palisano RJ, Hanna SE, Rosenbaum PL et al (2000) Validation of a model of gross motor function for children with cerebral palsy. Phys Ther 80:974–985 Park ES, Park CI, Kim JY (2001) Comparison of anterior and posterior walkers with respect to gait parameters and energy expenditure of children with spastic diplegic cerebral palsy. Yonsei Med J 42:180–184 Potter CR, Unnithan VB (2005) Interpretation and implementation of oxygen uptake kinetics studies in children with spastic cerebral palsy. Dev Med Child Neurol 47:353–357 Powers S, Howley E (2007) Exercise physiology: theory and application to fitness and performance. McGraw Hill, New York Provost B, Dieruf K, Burtner PA et al (2007) Endurance and gait in children with cerebral palsy after intensive body weight-supported treadmill training. Pediatr Phys Ther 19:2–10 Raja K, Joseph B, Benjamin S et al (2007) Physiological cost index in cerebral palsy: its role in evaluating the efficiency of ambulation. J Pediatr Orthop 27:130–136 Rodda JM, Graham HK, Nattrass GR et al (2006) Correction of severe crouch gait in patients with spastic diplegia with use of multilevel orthopaedic surgery. J Bone Joint Surg Am 88:2653–2664 Rose J, Gamble JG, Medeiros J et al (1989) Energy cost of walking in normal children and in those with cerebral palsy: comparison of heart rate and oxygen uptake. J Pediatr Orthop 9:276–279 Rose J, Gamble JG, Burgos A et al (1990) Energy expenditure index of walking for normal children and for children with cerebral palsy. Dev Med Child Neurol 32:333–340 Rose J, Gamble JG, Lee J et al (1991) The energy expenditure index: a method to quantitate and compare walking energy expenditure for children and adolescents. 
J Pediatr Orthop 11:571–578 Russell D, Rosenbaum P, Gowland C et al (1993) Gross motor function measure manual. McMaster University, Hamilton Scholtes VA, Becher JG, Beelen A, Lankhorst GJ (2006) Clinical assessment of spasticity in children with cerebral palsy: a critical review of available instruments. Dev Med Child Neurol 48:64–73 Slaman J, Bussmann J, van der Slot WM et al (2013) Physical strain of walking relates to activity level in adults with cerebral palsy. Arch Phys Med Rehabil 94:896–901


Slaman J, Roebroeck M, van der Slot W et al (2014) Can a lifestyle intervention improve physical fitness in adolescents and young adults with spastic cerebral palsy? A randomized controlled trial. Arch Phys Med Rehabil 95:1646–1655 Stallings VA, Zemel BS, Davies JC et al (1996) Energy expenditure of children and adolescents with severe disabilities: a cerebral palsy model. Am J Clin Nutr 64:627–634 Steinbok P (2007) Selective dorsal rhizotomy for spastic cerebral palsy: a review. Childs Nerv Syst 23:981–990 Thomas SS, Buckon CE, Piatt JH et al (2004) A 2-year follow-up of outcomes following orthopedic surgery or selective dorsal rhizotomy in children with spastic diplegia. J Pediatr Orthop B 13:358–366 Thomason P, Rodda J, Sangeux M et al (2012) Management of children with ambulatory cerebral palsy: an evidence-based review. Commentary by Hugh Williamson Gait Laboratory staff. J Pediatr Orthop 32(Suppl 2):S182–S186 Thompson P, Beath T, Bell J et al (2008) Test-retest reliability of the 10-metre fast walk test and 6-minute walk test in ambulatory school-aged children with cerebral palsy. Dev Med Child Neurol 50:370–376 Umberger BR, Augsburger S, Resig J et al (2013) Generation, absorption, and transfer of mechanical energy during walking in children. Med Eng Phys 35:644–651 Unnithan VB, Dowling JJ, Frost G, Bar-Or O (1996) Role of cocontraction in the O2 cost of walking in children with cerebral palsy. Med Sci Sports Exerc 28:1498–1504 Valle AC, Dionisio K, Pitskel NB et al (2007) Low and high frequency repetitive transcranial magnetic stimulation for the treatment of spasticity. Dev Med Child Neurol 49:534–538 Van de Walle P, Hallemans A, Schwartz M et al (2012a) Mechanical energy estimation during walking: validity and sensitivity in typical gait and in children with cerebral palsy. Gait Posture 35:231–237 Van de Walle P, Hallemans A, Truijen S et al (2012b) Increased mechanical cost of walking in children with diplegia: the role of the passenger unit cannot be neglected. Res Dev Disabil 33:1996–2003 van den Hecke A, Malghem C, Renders A et al (2007) Mechanical work, energetic cost, and gait efficiency in children with cerebral palsy. J Pediatr Orthop 27:643–647 Vargus-Adams JN, Majnemer A (2014) International Classification of Functioning, Disability and Health (ICF) as a framework for change: revolutionizing rehabilitation. J Child Neurol 29:1030–1035 Verrotti A, Greco R, Spalice A et al (2006) Pharmacotherapy of spasticity in children with cerebral palsy. Pediatr Neurol 34:1–6 Verschuren O, Bloemen M, Kruitwagen C, Takken T (2010) Reference values for aerobic fitness in children, adolescents, and young adults who have cerebral palsy and are ambulatory. Phys Ther 90:1148–1156 Verschuren O, Zwinkels M, Ketelaar M et al (2013) Reproducibility and validity of the 10-meter shuttle ride test in wheelchair-using children and adolescents with cerebral palsy. Phys Ther 93:967–974 Welch WA, Strath SJ, Swartz AM (2015) Congruent validity and reliability of two metabolic systems to measure resting metabolic rate. Int J Sports Med 36:414–418 Wood E, Rosenbaum P (2000) The gross motor function classification system for cerebral palsy: a study of reliability and stability over time. Dev Med Child Neurol 42:292–296

The Use of Kinematics for Pulmonary Volume Assessment

Carlo Massaroni

Abstract

Since the movements of the lung are transmitted to the chest wall during breathing, interest in technologies for the noninvasive monitoring of thoracic kinematics to study breathing biomechanics is growing. The analysis of chest wall kinematics can be used to (i) measure human chest wall movements, (ii) study the mechanics of breathing, and (iii) evaluate respiratory volumes during breathing. Unlike the spirometer and other equipment, tools based on tracking chest wall deformation allow patients to perform breathing exercises without any kind of constraint, allow the unaltered breathing mechanics to be investigated, and allow the behavior of each compartment of the chest wall to be studied separately. The most well-established technology for assessing the breathing volume of the chest wall and of its compartments (i.e., pulmonary and abdominal rib cage and abdomen) is optoelectronic plethysmography (OEP). OEP is a motion capture system designed and validated to track a number of photo-reflective markers placed on the human chest wall. By reconstructing the kinematics of the chest, the OEP algorithm allows the computation of the subject's respiratory volumes and other breathing-related features. Since 1990, many research groups have developed and tested noninvasive optical technologies to assess respiratory pattern parameters, to measure asynchronies within the chest wall, to investigate patients' respiratory strategies and the volumes moved, and to distinguish different respiratory diseases. The increasing use of OEP in clinical evaluation contexts in the respiratory field is evident from the growing number of articles published in the last few years.

C. Massaroni (*) Unit of Measurements and Biomedical Instrumentation, Campus Bio-Medico di Roma University, Rome, Italy e-mail: [email protected] # Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_42-1



Its noninvasiveness and the possibility of using other equipment (e.g., pressure sensors, electromyography, electrocardiogram) and devices (e.g., cycle ergometer, treadmill) at the same time allow a wide range of patients with different physiological and clinical conditions to be studied (including noncollaborative ones) and provide new perspectives on the evaluation of ventilatory parameters.

Keywords

Breathing • Respiratory volumes • Breathing mechanics • Chest wall kinematics • Lung volumes

Contents

Introduction
State of the Art
  Theory of OEP
  Measurement Principles
  Validity, Accuracy, Reliability, and Applications of the Measurements
  Clinical Applications
Limitation and Drawbacks
Conclusions/Summary
Cross-Reference
References

Introduction

In clinical practice and scientific research, the use of optical and optoelectronic technologies for evaluating the biomechanics of the chest wall and analyzing breathing volumes is growing. The indirect measurement of ventilation through external measurement of chest wall surface kinematics has some advantages over the classical tools used to measure pulmonary ventilation (i.e., a spirometer or a pneumotachograph). First, the optoelectronic technologies are less invasive than flow measurement, which may appear to be an easy procedure but can be affected by several issues: the required mouthpieces, face masks, and nose clips can themselves increase the calculated tidal volume. Second, gas composition, temperature, humidity, and pressure generally influence the recording of lung volume collected with pneumotachographs. Third, patient cooperation has to be considered: technologies based on integrating flow over time cannot be used, or are extremely difficult to use, in children or uncooperative adults, as well as in some research scenarios (e.g., the intensive care unit).

In the 1960s, the scientific community began to study the respiratory system as a model, basing its analysis on mechanical models of the different parts composing the system (i.e., the airways, lungs, respiratory muscles, thoraco-abdominal wall, and everything that surrounds the lungs). In that decade, Mead et al. published a series of scientific papers describing the possibility of measuring lung volume variations from thoracic and abdominal wall displacement (Mead 1963;


Mead et al. 1967; Konno and Mead 1967). Mead demonstrated a relationship between chest wall volume variations and chest wall kinematics (Barnas et al. 1987). These studies paved the way for hundreds of studies on respiratory mechanics in the 1980s, with the work of Mead, Macklem, Peslin, and others contributing to the understanding of respiratory mechanics and the mechanisms of pulmonary ventilation, and to knowledge of how the ventilatory pump acts on the respiratory structures (with their elastic and resistive properties) to ventilate the lungs. Until the 1990s, such studies were performed using resistive and variable-inductance displacement sensors or magnetometers rigidly apposed to the chest wall surface. The real technological breakthrough occurred in 1990 with Pedotti, who was the first to use a system based on motion analysis technology, which offered the possibility of measuring and monitoring the kinematics of a number of points marked by photo-reflective markers. The thoraco-abdominal surface, the enclosed volume, and volume changes during breathing were thus measured for the first time (Ferrigno et al. 1994).

State of the Art

Theory of OEP

From a physiological point of view, the chest wall is modeled as a three-compartment system, composed of the pulmonary rib cage (RCp), the abdominal rib cage (RCa), and the abdomen (AB), as highlighted in Fig. 1. The three-compartment model makes it possible to consider (i) that RCp and RCa are exposed to different pressures during inspiration, (ii) that the diaphragm acts directly only on RCa, and (iii) that nondiaphragmatic inspiratory muscles act largely on RCp and not on RCa. Regarding the abdomen, the AB volume change is defined as the volume swept by the abdominal wall (Konno and Mead 1967), and it is the result of the combined action of the diaphragm and the expiratory abdominal muscles. While in healthy subjects the three compartments generally move synchronously, in different diseases various asynchronous movements between the three chest wall compartments can be found. The total chest wall volume or global volume (VCW) is generally considered to be the sum of VRCp, VRCa, and VAB.

RCp is defined as the area extending from the clavicles to a line around the thorax at the level of the xiphoid process (corresponding to the area of apposition of the diaphragm to the rib cage at the end-expiratory volume in the sitting posture, confirmed by percussion); RCa as the area extending from this line to the costal margin anteriorly, down from the xiphisternum, and to the level of the lowest point of the lower costal margin posteriorly; and AB as the area extending caudally from the lower rib cage to the level of the anterior superior iliac crest.


Fig. 1 3D human chest wall obtained after triangular meshing. Markers are shown as black points; blue, green, and orange surfaces represent the pulmonary rib cage (RCp), the abdominal rib cage (RCa), and the abdomen (AB), respectively

Measurement Principles

The OEP working principle is based upon the principles of 3D motion capture systems. Six to eight TV cameras are placed in a dedicated room around a circular perimeter surrounding the subject. Camera placement must ensure that a minimum of two cameras capture each marker during movement. IR-reflective passive markers are affixed to the skin of the subject with double-sided tape; for chest wall capture, 6 mm and 9 mm markers are usually used. Marker placement is a large source of potential error: investigators must have excellent palpation skills to locate markers on the body anatomy. Optimally, the same examiner sets up each subject to ensure consistent placement. The definition of the marker set prior to data collection determines the number of markers used. The full marker set, the most widely employed in breathing analysis, uses 89 markers to define the three compartments. However, other marker sets can be used to compartmentalize the chest wall in newborns and to monitor volumes in postures other than standing, such as supine or lateral.

Before starting data collection, a calibration procedure is needed to minimize systematic errors that may occur due to effects such as lens distortion, lack of flatness of the imaging plane, warm-up effects, and unequal pixel spacing. Motion capture systems customarily use manufacturer-provided calibration cubes and/or calibration wands. Once the calibration procedures have been completed, the parameters are stored for the transformation of image coordinates into 3D marker coordinates; it is therefore crucial that camera positions hold steady throughout the data collection. With multiple cameras, after calibration, OEP reconstruction is the systematic integration of the flat image from each camera into a 3D coordinate system. OEP uses the direct linear transformation (DLT) as the reconstruction technique (Chen et al. 1994). Markers are illuminated by stroboscopic light by means of a group of IR-emitting diodes positioned circularly around each camera lens. The synchronization between



Fig. 2 Data collection carried out using OEP. The workflow shows the steps for obtaining the compartmental chest wall volumes and volume changes. A typical breathing pattern during quiet breathing is also reported in the figure

cameras is made possible by a phase-locked loop, which operates at frequencies higher than 60 Hz (Fig. 2). Markers are automatically recognized by a dedicated computer algorithm using a pattern recognition technique for real-time object identification according to shape and size, not exclusively light intensity: the marker centroid is calculated with a resolution of about 1:65,536 of the field of view (Ferrigno and Pedotti 1985). A dedicated motion analyzer synchronizes input and output information to and from the cameras, and ad hoc software in the motion analyzer then computes the 3D trajectory of each marker. To compute the chest volumes, geometrical models are developed by defining a closed surface, connecting each triplet of markers to form a triangle. The volume contained within each closed surface can then be calculated. Figure 2 shows the placement and the relative displacement of the markers of an 89-marker set along the X, Y, and Z axes. Several geometrical models have been developed and validated to calculate the volume in both standing and supine positions: chest wall models based on 32, 54, 86, and 89 markers have been developed and validated in the literature (Ferrigno et al. 1994; Aliverti et al. 2001; Cala et al. 1996; Massaroni et al. 2017). The model choice depends on the number of markers


Fig. 3 89-marker protocol setup in the configuration generally used in seated and standing assessments: (a) front view of the chest wall (42 markers), (b) and (c) lateral views of the chest wall (5 lateral markers per side), (d) back view of the chest wall (37 markers)

placed on the skin as well as on the number of compartments to be monitored during the collection (e.g., only rib cage and abdomen; pulmonary and abdominal rib cage; lower and upper abdomen separately). The higher the number of markers on the skin, the finer the triangular mesh and the better the compartmentalization of the chest wall. In the 89-marker setup, markers are placed in seven horizontal rows between the clavicles and the anterior superior iliac spines. Along the horizontal rows, markers are arranged anteriorly and posteriorly in five vertical rows, with an additional bilateral row in the midaxillary line. The anatomical landmarks for the horizontal rows are the clavicular line, manubriosternal joint, nipples, xiphoid process, lower costal margin, umbilicus, and anterior superior iliac spine. Landmarks for the vertical rows are the midlines, the anterior and posterior axillary lines, the midpoints between the midline and the anterior and posterior axillary lines, and the midaxillary lines. An extra marker is added bilaterally at the midpoint between the xiphoid and the most lateral portion of the tenth rib to provide better detail of the costal margin; two further markers are added over the lung-apposed rib cage and in the corresponding posterior position (Fig. 3). Sitting, supine, and prone evaluations are carried out with a 52-marker protocol, without the posterior or anterior markers, respectively.

In newborns, the marker protocol is slightly different because of the limited chest wall surface: 24 hemispheric reflective markers can be positioned on the anterior thoracic-abdominal surface from the clavicles to the anterior superior iliac spines, arranged in six horizontal and five vertical lines to define the rib cage and abdomen compartments. In infants, 52 markers can be used, with seven circumferential horizontal rows between the clavicles and the anterior superior iliac spines; RCp is described by four levels of markers and RCa by two.

Volume calculation is based on the Gauss theorem. For each triangle identified by three markers, the surface (S) and the


direction of the normal vector ($\vec{n}$) are calculated, and the volume contained within this surface can be computed using the Gauss theorem as in Eq. 1:

$$\oint_S \vec{F} \cdot \vec{n} \, \mathrm{d}S = \int_V \mathrm{d}V = V \qquad (1)$$

where $\vec{F}$ is an arbitrary vector field with unit divergence ($\nabla \cdot \vec{F} = 1$, so that the surface integral equals the enclosed volume), S is a closed surface, V is the volume enclosed by S, and $\vec{n}$ is the unit normal vector on S. This procedure allows the computation of the volume enclosed by the thoraco-abdominal surface; a minimal numerical sketch of this computation is given below.

The OEP chest wall model is a three-compartment model. It accounts for the facts (i) that RCp and RCa are exposed to different pressures during inspiration, (ii) that the diaphragm acts directly only on RCa, and (iii) that nondiaphragmatic inspiratory muscles act largely on RCp and not on RCa. Regarding the abdomen, the AB volume change is defined as the volume moved by the abdominal wall (Romagnoli et al. 2008; Konno and Mead 1967), and it results from the joint action of the diaphragm and the expiratory abdominal muscles. While in healthy subjects the three compartments generally move synchronously, in various diseases different patterns of asynchronous movement between the three chest wall compartments can be found. Total chest wall volume (VCW) is the sum of VRCp, VRCa, and VAB.

Chest wall kinematics is one of the outcomes measured by OEP. The compartmental analysis is useful to assess the contribution of each single compartment, or of each hemithorax, to the respiratory pattern. Results from the breath-by-breath analysis of compartmental volumes can be further processed to assess whether the thoracic-abdominal movement of the chest wall is synchronous (Fig. 4). Asynchrony is defined as the difference in the timing of expansion or retraction between the compartments. When this difference is so great that the movements of the compartments become opposite, paradoxical movement occurs.

Konno and Mead described one of the first models of chest wall movement, assuming that during spirometry breathing the respiratory system is closed and hence has a single degree of freedom. In this case, volume variations of the rib cage must be equal and opposite to volume variations of the abdomen, and the two compartments must move in phase. At a given lung volume, there is a single linear relationship between the rib cage and the abdomen (the isovolume line). During quiet breathing, the coordination of the diaphragm, intercostal, and scalene muscles allows the thoracic-abdominal system to move with one degree of freedom along its "relaxation line," obtained by plotting abdominal versus rib cage movement during passive lung inflation. Breathing along this line represents the most efficient way to exploit the respiratory system (Konno and Mead 1967). Following this model, several methods were designed to describe the synchrony of the thoracic-abdominal movement. The most commonly used variables are the phase angle and phase shift, the inspiratory phase ratio (PhRIB), the total phase ratio (PhRTB), the expiratory phase ratio (PhREB), and the cross-correlation function (CCF).
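As a concrete illustration of the volume computation in Eq. 1, the sketch below evaluates the enclosed volume of a triangulated marker mesh. Choosing $\vec{F} = \vec{r}/3$ turns the surface integral into a sum of signed tetrahedron volumes, one per outward-oriented triangle; the function name and array layout are illustrative assumptions, not part of any OEP software.

```python
import numpy as np

def enclosed_volume(vertices, triangles):
    """Volume enclosed by a closed, outward-oriented triangulated surface.

    vertices  : (N, 3) array of 3D marker coordinates
    triangles : (M, 3) integer array of vertex indices per triangle
    """
    v0 = vertices[triangles[:, 0]]
    v1 = vertices[triangles[:, 1]]
    v2 = vertices[triangles[:, 2]]
    # Divergence theorem with F = r/3: each face contributes the signed
    # volume of the tetrahedron it forms with the origin.
    signed = np.einsum('ij,ij->i', v0, np.cross(v1, v2)) / 6.0
    return abs(signed.sum())
```

Applied frame by frame to the chest wall mesh, or to the triangles of a single compartment closed by suitable internal surfaces, this yields the VCW and compartmental volume traces of Fig. 2.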


Fig. 4 (a) Plot of rib cage (RC) versus abdominal (AB) excursion illustrating the calculation of the phase angle Φ. s = maximal AB excursion, m = horizontal width of the RC-AB loop halfway between maximal and minimal RC excursion. (b) Time plot of AB and RC excursions showing the phase shift. (c) Konno-Mead loop for subjects presenting synchronous chest wall motion, (d) asynchronous motion, and (e) paradoxical chest wall motion

Another kinematic measure uses phase angle analysis as a quantitative evaluation of the phase shift between the rib cage (RC, the sum of RCp and RCa) and the AB, reflecting the delay between the excursions of the two compartments of the CW. It is measured in degrees (°), ranging from 0° to 180°, where 0° represents perfect synchrony and 180° represents paradoxical movement. The phase angle is calculated through equations extracted from the Konno-Mead loop (Konno and Mead 1967), or Lissajous figure, in which the movement of one compartment during one respiratory cycle is plotted against the excursion of a second compartment in an X-Y graph. For angles lower than 90°, Eq. 2 is used:

$$\sin \Phi = m/s \qquad (2)$$

while for angles between 90° and 180°, Eq. 3 applies:

$$\Phi = 180^{\circ} - \mu, \quad \text{where } \sin \mu = m/s \qquad (3)$$


The variable m is the width of the loop at the midpoint of the excursion of the compartment represented on the Y axis, and s is the maximal excursion of the compartment shown on the X axis. The advantage of this kind of analysis is that data gathered throughout the whole respiratory cycle are evaluated. Phase angle analysis can be used when the rib cage trace has an almost sinusoidal shape; non-sinusoidal and/or figure-eight loops should be removed from the analysis as they may bias the results. For loops with Φ greater than 20°, the direction of the loop can be identified: a clockwise loop indicates that the rib cage precedes the abdomen, and a counterclockwise loop indicates that the abdomen precedes the rib cage. In healthy subjects, the movement of thorax and abdomen during breathing is almost synchronous, so the loop collapses toward a straight line. As the movement becomes asynchronous, the line opens into a loop that can range from elliptical to circular as the asynchrony worsens. Under physiological conditions in quiet breathing, the abdominal compartment leads the rib cage.

Other kinematic measures include the inspiratory phase ratio (PhRIB), the expiratory phase ratio (PhREB), and the total phase ratio (PhRTB). These values represent the percentage of time of the respiratory cycle in which the RC and AB compartments move asynchronously: 0% represents synchrony, whereas 100% indicates paradoxical movement. These variables quantify the asynchrony at each point of the respiratory cycle, and neither sinusoidal curves nor Konno-Mead loops are required to perform the analysis (Aliverti et al. 2009). The cross-correlation function represents the delay in seconds between the compartments: when the movement is synchronous the delay is 0 s, and the larger the cross-correlation delay, the greater the asynchrony between the compartments of the CW (Millard 1999). The paradoxical inspiratory time is the parameter used to evaluate the asynchrony of the CW compartments in patients with COPD during exercise on a cycle ergometer; it is defined as the fraction of inspiratory time, in percentage, in which the volume of RCa decreases (Parreira et al. 2012).

Breathing volumes, volume changes, and compartmental percentage contributions are further measures of chest wall function. Volume variations of the chest wall, of its three compartments, and of each hemithorax are calculated as the difference between the end-inspiratory and end-expiratory volumes of the same compartment. Different volume variables, measured in liters, can be assessed through OEP: tidal volume of the CW and of its compartments, end-expiratory volume of the CW and of its compartments, and end-inspiratory volume of the CW and of its compartments. Moreover, the volume changes are calculated as the percentage (%) contribution of each compartment to the chest wall tidal volume. If maximal inspirations are performed repeatedly during exercise, changes in chest volume can also be expressed relative to total lung capacity (TLC), and it is possible to detect a restriction in vital capacity when the end-inspiratory volume approaches TLC. From the analysis of the volumes, it is also possible to indirectly estimate the following lung volumes: expiratory reserve volume (ERV), inspiratory reserve volume (IRV), and forced expiratory volume in the first second (FEV1).
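A minimal Python sketch of the phase angle and cross-correlation delay described above follows. It assumes densely sampled, roughly sinusoidal RC and AB excursions covering whole breathing cycles; the 5 % band used to measure the loop width m is an illustrative choice, not a published standard.

```python
import numpy as np

def phase_angle(rc, ab):
    """Phase angle (degrees) from the Konno-Mead loop (RC on Y, AB on X)."""
    s = ab.max() - ab.min()                    # maximal AB excursion
    mid = (rc.max() + rc.min()) / 2.0
    band = np.abs(rc - mid) < 0.05 * (rc.max() - rc.min())
    m = ab[band].max() - ab[band].min()        # loop width at mid-RC level
    mu = np.degrees(np.arcsin(np.clip(m / s, 0.0, 1.0)))
    # The loop orientation resolves the arcsin ambiguity: positively
    # correlated signals give angles below 90 degrees (Eq. 2), negatively
    # correlated signals give 180 degrees minus mu (Eq. 3).
    return mu if np.corrcoef(rc, ab)[0, 1] >= 0 else 180.0 - mu

def ccf_delay(rc, ab, fs):
    """Cross-correlation delay in seconds (positive means RC lags AB)."""
    rc0, ab0 = rc - rc.mean(), ab - ab.mean()
    xc = np.correlate(rc0, ab0, mode='full')
    return (np.argmax(xc) - (len(ab) - 1)) / fs
```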
Within the respiratory cycle, kinematic analysis makes it possible to evaluate the total time of the respiratory cycle, the inspiratory and expiratory times, the ratio


between inspiratory time and total cycle time, the respiratory rate, and minute ventilation (the product of respiratory rate and tidal volume). The time variables measured in seconds by means of the optical technology are inspiratory time (Ti), expiratory time (Te), and total time of the respiratory cycle (Ttot). Besides these, the following variables can be calculated: inspiratory time in relation to the total time (Ti/Ttot), respiratory rate (RR) in breaths per minute, minute ventilation (VE) in liters per minute, and mean inspiratory and expiratory flows in liters per second (Parreira et al. 2012). Table 1 reports several parameters that can be estimated from the kinematics of the chest wall, and their clinical significance, as reported in Wilhelm et al. (2003).
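These timing variables can be extracted breath by breath from the chest wall volume trace. The sketch below, relying on SciPy's peak detection, is one plausible implementation under assumed inputs (a low-noise, pre-filtered volume signal in liters); it is not the algorithm of any specific OEP workstation.

```python
import numpy as np
from scipy.signal import find_peaks

def breathing_timing(vcw, fs):
    """Breath-by-breath timing from a chest wall volume trace.

    vcw : chest wall volume (L); fs : sampling rate (Hz)
    """
    peaks, _ = find_peaks(vcw)       # end-inspiratory points
    troughs, _ = find_peaks(-vcw)    # end-expiratory points
    ti, te, vt = [], [], []
    for t0, t1 in zip(troughs[:-1], troughs[1:]):
        pk = peaks[(peaks > t0) & (peaks < t1)]
        if len(pk) != 1:
            continue                 # skip irregular or missed cycles
        p = pk[0]
        ti.append((p - t0) / fs)     # inspiratory time Ti
        te.append((t1 - p) / fs)     # expiratory time Te
        vt.append(vcw[p] - vcw[t0])  # tidal volume of this cycle
    ttot = np.mean(ti) + np.mean(te)
    rr = 60.0 / ttot                 # respiratory rate (breaths/min)
    return {'Ti': np.mean(ti), 'Te': np.mean(te),
            'Ti/Ttot': np.mean(ti) / ttot,
            'RR': rr, 'VE': rr * np.mean(vt)}  # VE in L/min
```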

Validity, Accuracy, Reliability, and Applications of the Measurements

An important requirement for a breathing measurement system based on optoelectronic technology is noninvasiveness, i.e., minimal interference with the subject's performance of a natural movement. OEP performance has been widely evaluated using ad hoc calibrator systems and by comparison with reference instruments (i.e., flowmeters and pneumotachometers). OEP can detect linear marker displacements larger than 30 μm, corresponding to a volume threshold of around 8.92 mL. OEP accuracy improves as the number of cameras increases when spherical markers are placed on the skin. Moreover, the accuracy of the volume estimation appears not to be influenced by the movement of the thorax, its magnitude, or the breathing rate. OEP validity in measuring lung volume changes has been evaluated in healthy seated and standing subjects in different experimental settings (e.g., quiet breathing, incremental exercise), showing close agreement between OEP and spirometry.

[Fig. 5 panels plot muscle activity against stride (%) for a healthy subject and patients grouped by Cobb angle: < 20°, 20–40°, and > 40°]

Fig. 5 Typical trace of electromyographic activity of quadratus lumborum, erector spinae, gluteus medius, rectus femoris, and semitendinosus for a scoliosis patient from each scoliosis group compared to a normal subject, expressed as a function of normalized stride (in %). The horizontal black bars represent the phasic activity of the muscles for the right side of the normal subject and the convex side of patients. The horizontal gray bars represent the phasic activity of the muscles for the left side of the normal subject and the concave side of patients (Reproduced from Mahaudens et al. 2009 with permission of Springer)

muscles raises the energetic cost of walking in scoliosis and decreases the efficiency of gait, as measured by total muscle work divided by the net energy cost (Mahaudens et al. 2010). It is unknown whether this prolonged activation is a result of neurological dysfunction or is simply a compensatory mechanism to maintain stability during walking. If the latter, the abnormalities in muscle activity would be considered a


result of the deformity rather than part of its cause. However, if this were the case, the duration of muscle activity would likely be related to the severity of the scoliotic curvature, and the effect would be exacerbated in more extreme curves. Instead, the prolonged activation is observed similarly in all levels of curve severity, even those with only minor scoliosis (Mahaudens et al. 2009). This finding reduces the likelihood that the gait pathology is entirely a consequence of the spine and pelvis deformities, again suggesting a neuromuscular contribution.

Relationship of Gait Parameters to Curve Severity

Many studies have investigated the relationship of the gait pathology to the severity and type of scoliotic curve. With patients walking at a fixed speed, Mahaudens et al. (2009) found no significant relationship of any of the aforementioned gait abnormalities to the degree of curvature; however, they did observe trends suggesting that transverse pelvic motion decreases with greater curve severity. In another large study of patients with thoracolumbar curves, in which patients were permitted to walk at a self-selected speed, several gait variables did appear to be related to curve severity: knee flexion at initial contact increased with the degree of deformity, while knee range of motion decreased. Additionally, reductions in cadence and pelvic range of motion were more pronounced in patients with more severe curves (Syczewska et al. 2012). Evidence of the relationship between kinetic abnormalities and curve severity is mixed as well. Schizas et al. (1998) determined there was no association between ground reaction force asymmetry and degree of spinal deformity. However, in a study that considered multiple types of curvature, Park et al. (2016) established relationships between ground reaction force asymmetry and the severity of spinal curvature and pelvic tilt. Amid conflicting findings, the relationship of the gait pathology to curve severity continues to be of great interest for clinicians: if the gait pathology is related to curve severity, then correction of the deformity, such as by surgical or therapeutic intervention, may influence gait outcomes.

Response to Treatment

In theory, it seems likely that interventions for scoliosis would affect performance in walking, though one might argue these effects could be either beneficial or deleterious. The most common treatments, orthotic bracing and surgical fusion of the spine, are partially aimed at restoring structural symmetry. However, both bracing and surgery impose a rigidity on the torso, which could exacerbate the preexisting stiffness during gait. The following section discusses the effect of both types of treatment on motion of the spine and distal segments during walking.


Fig. 6 Common bracing options for adolescents with idiopathic scoliosis: Boston brace (left) and Wilmington brace (right) (Courtesy of Dr. Peter Gabos, Nemours/ Alfred I. duPont Hospital for Children, Wilmington, DE)

Response to Bracing

Bracing treatment is a common nonsurgical intervention for idiopathic scoliosis, but it rarely results in any curve improvement. The treatment regimen typically requires the patient to wear a customized orthosis between 12 and 20 h a day. Various types of braces exist, some spanning the cervical spine to the sacrum, some shorter, some rigid, and some more flexible (Fig. 6). In the short term, both rigid and flexible braces appear to reduce motion of the hip and pelvis during gait (Wong et al. 2008). In contrast, after long-term orthotic treatment (6 months) and a substantial period between removing the brace and gait testing, treatment effects show increased pelvis and hip motion in the frontal plane (Mahaudens et al. 2014). As previously established, untreated scoliosis patients demonstrate restricted pelvis and hip motion compared to their healthy peers, and thus an increase in these parameters represents progress toward a more normal pattern of walking. Additionally, a decrease in the abnormally high duration of activity of the erector spinae throughout the gait cycle is observed following long-term orthotic treatment (Mahaudens et al. 2014). In untreated patients, the excessive activity is theorized to provide a stiffening effect for balance, which may no longer be necessary following prolonged brace-wearing. Still, the energy cost of walking, which is elevated in scoliosis, does not appear significantly reduced after long-term bracing treatment.

Response to Surgery

Surgical treatment in scoliosis is typically reserved for severe curves (those exceeding 50°) (Weinstein et al. 2003). The most common technique, spinal fusion, involves the insertion of metal screws into the vertebral pedicles and the attachment of a rod spanning the length of the curve. This results in an immediate straightening and


Fig. 7 Example of a 14-year-old girl with a combined right thoracic and left lumbar (Lenke 2C-[R]) curve operated on with an all pedicle screw construct. (a) Preoperative and (b) last follow-up frontal radiographs; (c) preoperative and (d) last follow-up sagittal radiographs (Courtesy of Dr. Peter Gabos, Nemours/Alfred I. duPont Hospital for Children, Wilmington, DE)

de-rotation of the spine (Fig. 7), with patients typically returning to full activity by 6 months (Lehman et al. 2015). Fusion surgery imposes a restriction of spinal range of motion in all three planes, the extent of which depends on the number of vertebrae involved in the fusion (Danielsson et al. 2006; Engsberg et al. 2003). While the reduction in spinal range of motion could theoretically exacerbate the stiffness observed in the gait of untreated scoliosis patients, it is believed that the structural correction can reduce energy demands, thereby increasing muscle efficiency. Results vary by the type of scoliotic curve. Surgical correction of thoracic curves seems to have little effect on most gait variables; the main result is a reduction of transverse plane shoulder motion (Mahaudens et al. 2010), which essentially corrects the asymmetrical forward rotation of the trunk and shoulders described by Kramers-de Quervain et al. (2004). For thoracolumbar and lumbar curves, frontal plane pelvis and hip motion increases postoperatively. These results are similar to the long-term effects of bracing and represent a normalization of the motion of these segments during gait (Mahaudens et al. 2010). Additionally, there is a reduction in lateral center of mass displacement (i.e., sway) during walking, demonstrating potential evidence of better dynamic stability postsurgery (Paul et al. 2014). Fusion surgery does not appear to effect significant changes in muscular work or muscle activation timing. There may be a trend toward reduction of energy cost; however, the differences are not significant. Even postsurgery, adolescents with


Fig. 8 Total energy cost (J kg⁻¹ m⁻¹). The mean (bars) ± SD (vertical error bars) are drawn for the presurgery condition (white bar) and the postsurgery condition (gray-lined bar). The black bar represents the mean of norms (Adapted from Mahaudens et al. 2010 with permission of Springer)


idiopathic scoliosis still walk with increased energy cost when compared to their healthy peers (Mahaudens et al. 2010; Fig. 8). Overall, while some abnormalities remain, treatment for scoliosis generally results in modified kinematics that better resemble motion in healthy walking, specifically increased motion of the hip and pelvis. Additionally, bracing treatment reduces excessive muscle activity, and surgical treatment appears to slightly improve efficiency of walking by reducing overall energy cost.

Summary and Conclusion

Gait performance in scoliosis has been heavily researched, with most investigations reporting some abnormalities compared to healthy walking. Primary observations include reduced trunk and pelvic motion within the frontal and transverse planes, often reported clinically as a "stiffness" in ambulation. Various reports of asymmetry exist, the most consistent being an uneven progression of the trunk in the transverse plane, with excessive forward rotation of the right shoulder throughout the gait cycle. There is some evidence of postural control deficits and of asymmetries in lower limb kinematics and joint moments; however, these results vary throughout the literature. Muscle activity appears increased in duration throughout the gait cycle, and the energy cost of walking is higher than in healthy adolescents.

Treatment for scoliosis, both orthotic and surgical, has a positive effect on gait variables. Shoulder, pelvis, and hip motion improve toward a more normal pattern, and center of mass displacement is reduced. Furthermore, while overall levels remain higher than normal, treatment also alleviates some of the energy cost of walking for patients with scoliosis.

Despite the extensive research, conclusions about how gait abnormalities relate to the origin or progression of scoliosis remain vague and often conflicting. Impaired gait does appear to be associated with idiopathic scoliosis. Still, the ideas that the gait pathology contributes to curve progression and that impaired gait and the spinal


deformity are both secondary to some underlying neurological disorder are still largely based in theory. Continued research into motor control and somatosensory function may provide more insight into neurological relationships between the deformity and gait performance. In the meantime, gait analysis can still provide a valuable assessment of global function and evaluation of therapeutic and surgical outcomes in scoliosis.

References

Ascani E et al (1986) Natural history of untreated idiopathic scoliosis after skeletal maturity. Spine 11(8):784–789
Asher MA, Burton DC (2006) Adolescent idiopathic scoliosis: natural history and long term treatment effects. Scoliosis [Online] 1. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1475645/. Accessed 12 Feb 2015
Barrack RL et al (1984) Proprioception in idiopathic scoliosis. Spine (Phila Pa 1976) 9(7):681–685
Chen P-Q et al (1998) The postural stability control and gait pattern of idiopathic scoliosis adolescents. Clin Biomech 13(1):S52–S58
Danielsson AJ, Romberg K, Nachemson AL (2006) Spinal range of motion, muscle endurance, and back pain and function at least 20 years after fusion or brace treatment for adolescent idiopathic scoliosis: a case-control study. Spine (Phila Pa 1976) 31(3):275–283
Engsberg JR et al (2003) Prospective comparison of gait and trunk range of motion in adolescents with idiopathic thoracic scoliosis undergoing anterior or posterior spinal fusion. Spine (Phila Pa 1976) 28(17):1993–2000
Giakas G et al (1996) Comparison of gait patterns between healthy and scoliotic patients using time and frequency domain analysis of ground reaction forces. Spine (Phila Pa 1976) 21(19):2235–2242
Grivas TB et al (2010) Brace technology thematic series: the dynamic derotation brace. Scoliosis [Online] 1. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20858270. Accessed 18 Feb 2015
Gum JL et al (2007) Transverse plane pelvic rotation in adolescent idiopathic scoliosis: primary or compensatory? Eur Spine J 16(10):1579–1586
Herzog W et al (1989) Asymmetries in ground reaction force patterns in normal human gait. Med Sci Sports Exerc 21(1):110–114
Konieczny MR, Senyurt H, Krauspe R (2013) Epidemiology of adolescent idiopathic scoliosis. J Child Orthop 7:3–9
Kramers-de Quervain IA et al (2004) Gait analysis in patients with idiopathic scoliosis. Eur Spine J 13(5):449–456
Lehman RA et al (2015) Return to sports after surgery to correct adolescent idiopathic scoliosis: a survey of the Spinal Deformity Study Group. Spine J 15(5):951–958
Mahaudens P, Thonnard JL, Detrembleur C (2005) Influence of structural pelvic disorders during standing and walking in adolescents with idiopathic scoliosis. Spine J 5(4):427–433
Mahaudens P et al (2009) Gait in adolescent idiopathic scoliosis: kinematics and electromyographic analysis. Eur Spine J 18(4):512–521
Mahaudens P et al (2010) Gait in thoracolumbar/lumbar adolescent idiopathic scoliosis: effect of surgery on gait mechanisms. Eur Spine J 19(7):1179–1188
Mahaudens P et al (2014) Effect of long-term orthotic treatment on gait biomechanics in adolescent idiopathic scoliosis. Spine J 14(8):1510–1519
Mallau S et al (2007) Locomotor skills and balance strategies in adolescents idiopathic scoliosis. Spine (Phila Pa 1976) 32(1):E14–E22
Mayo NE et al (1994) The Ste-Justine adolescent idiopathic scoliosis cohort study. Part III: back pain. Spine (Phila Pa 1976) 19(14):1573–1581


Park HJ et al (2015) Analysis of coordination between thoracic and pelvic kinematic movements during gait in adolescents with idiopathic scoliosis. Eur Spine J 25:385–393
Park YS et al (2016) Association of spinal deformity and pelvic tilt with gait asymmetry in adolescent idiopathic scoliosis patients: investigation of ground reaction force. Clin Biomech 36:52–57
Paul JC et al (2014) Gait stability improvement after fusion surgery for adolescent idiopathic scoliosis is influenced by corrective measures in coronal and sagittal planes. Gait Posture 40(4):510–515
Perry J, Burnfield JM, Cabico LM (2010) Gait analysis: normal and pathological function, 2nd edn. SLACK, Thorofare
Prince F et al (2010) Comparison of locomotor pattern between idiopathic scoliosis patients and control subjects. Scoliosis [Online] 1. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2938665/. Accessed 1 Mar 2015
Saji M, Upadhyay S, Leong J (1995) Increased femoral neck-shaft angles in adolescent idiopathic scoliosis. Spine (Phila Pa 1976) 20(3):303–311
Schizas CG et al (1998) Gait asymmetries in patients with idiopathic scoliosis using vertical forces measurement only. Eur Spine J 7(2):95–98
Schlösser TPC et al (2014) How "idiopathic" is adolescent idiopathic scoliosis? A systematic review on associated abnormalities. PLoS One [Online] 9(5). Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4018432/. Accessed 14 Jan 2015
Schwender JD, Denis F (2000) Coronal plane imbalance in adolescent idiopathic scoliosis with left lumbar curves exceeding 40 degrees: the role of the lumbosacral hemicurve. Spine (Phila Pa 1976) 25(18):2358–2363
Syczewska M et al (2012) Influence of the structural deformity of the spine on the gait pathology in scoliotic patients. Gait Posture 35(2):209–213
Weinstein SL et al (2003) Health and function of patients with untreated idiopathic scoliosis: a 50-year natural history study. JAMA 289(5):559–567
Wong MS et al (2008) The effect of rigid versus flexible spinal orthosis on the gait pattern of patients with adolescent idiopathic scoliosis. Gait Posture 27(2):189–195
Yang JH, Suh SW et al (2013) Asymmetrical gait in adolescents with idiopathic scoliosis. Eur Spine J 22(11):2407–2413

Concussion Assessment During Gait

Robert D. Catena and Kasee J. Hildenbrand

Contents
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
What is a Concussion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Current Clinical Considerations in Concussion Management . . . . . . . . . . . . . . . . . 4
Steady-State Gait Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Functional Gait Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Dual-Task Gait Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Abstract

The acute signs and symptoms (SS) of a concussion can vary widely between individuals. Clinicians currently use a variety of measures to diagnose and manage both physical and cognitive SS associated with concussion. Balance is typically assessed using quick sideline measures in sports; however, researchers have found through more thorough assessments of dynamic balance during gait that SS may persist beyond those detected through typical assessment techniques. An appropriate gait assessment of concussion must be adequately complex to distinguish persistent balance deficits, but not so complex that healthy individuals would be challenged to maintain balance. A steady-state gait assessment may indicate conservative gait adaptations but will seldom yield distinct signs of continued dysfunction following concussion. Obstacle avoidance tasks demonstrate conservative gait adaptations long after other SS have resolved. Concussion


typically results in balance deficits in divided attention dual-task paradigms even after a return to normal daily activities. Refinements of gait paradigms to be more specific and clinically useful define future advances in concussion assessment during gait.

Keywords

Concussion assessment • Balance • Steady-state gait • Obstacle avoidance • Dual task • Balance deficits

R.D. Catena (*)
Gait and Posture Biomechanics Lab, Washington State University, Pullman, WA, USA
e-mail: [email protected]
K.J. Hildenbrand
Athletic Training Program, Washington State University, Pullman, WA, USA
e-mail: [email protected]
© Springer International Publishing AG 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_69-1

State of the Art

The state of concussion management and research has exploded over the last decade, with many different disciplines using their unique lens to examine the issue. Clinicians may take the patient-centered perspective that focuses on specific symptoms, daily activities, and quality of life. Researchers use broad group comparisons to determine causation, correlations, diagnosis, and rehabilitative techniques. Each clinical and research discipline currently has its own criteria, which can make cross-discipline comparisons difficult. Historically, gait researchers have found significant differences when examining subjects with "mild" versus "severe" concussions, while clinical professionals have now moved away from the mild/severe dichotomization. Comparison of research between groups, and between older and newer studies, is difficult because groups occasionally change the criteria by which they grade severity, measure symptoms, treat, and clear for activity. Gait analysis has the benefit of detecting changes from a concussion that persist after other criteria have been used to return individuals to participation. Analysis of gait also allows treatment for patients to improve overall balance, movement efficiency, and overall quality of life. Currently many gait assessment techniques involve expensive and complex equipment, but strides are being made toward more widely available measures that can be implemented in a diagnostic or rehabilitative manner.

What is a Concussion?

The Zurich Consensus Statement defines concussion as "a complex pathophysiological process affecting the brain, induced by traumatic biomechanical forces" (McCrory et al. 2013). The consensus statement goes further to indicate several common features that incorporate clinical, pathological, and biomechanical injury constructs, which can be helpful in discussing the nature of the concussion:

1. Concussion may be caused either by a direct blow to the head, face, or neck or a blow elsewhere on the body with an "impulsive" force transmitted to the head.
2. Concussion typically results in the rapid onset of short-lived impairment of neurologic function that resolves spontaneously.


3. Concussion may result in neuropathological changes, but the acute clinical symptoms largely reflect a functional disturbance rather than a structural injury.
4. Concussion results in a graded set of clinical symptoms that may or may not involve loss of consciousness. Resolution of the clinical and cognitive symptoms typically follows a sequential course. In a small percentage of cases, however, post-concussive symptoms may be prolonged.
5. No abnormality is seen on standard structural neuroimaging studies in concussion.

The acute clinical symptoms of a concussion vary widely between individuals, complicating diagnosis. This variation between individuals is one of the most difficult problems clinicians and researchers face. Concussions differ from other injuries because the symptoms are often vague and no clear indicators exist to determine whether an injury occurred. Common indicators for a concussion are physical signs (loss of consciousness), behavioral changes (irritability), cognitive impairment (slowed reaction time), or sleep disturbances (drowsiness) (McCrory et al. 2009). Concussion as defined above, along with the common symptoms, is the information medical professionals must rely on to diagnose and guide treatment.

Immediately after a head impact occurs, there is a release of the excitatory amino acid neurotransmitters glutamate and aspartate (Grady 2010). These molecules lead to a loss of cell membrane integrity in the brain, which increases the concentration of sodium ions and decreases the concentration of potassium ions within cells. These changes in ion concentration affect the brain cells' pH and lead to an increase in calcium ion concentration. These disturbances in ion concentration lead to cellular damage resulting in the death of the affected brain cells (Grady 2010). Upon cell death, these brain cells release cytokines. Cytokines are responsible for the body's inflammatory response, and they upregulate inflammation upon release. This increase in inflammation is what is observed in the brain following a concussion injury and is responsible for added damage to the affected brain region (Grady 2010).

The brain's response to a concussion can be thought of in two separate parts. Initially, there is cellular damage resulting from the changes in ion concentration, such as sodium and potassium; these changes are responsible for the acute symptoms of concussion such as headaches and dizziness (Grady 2010). The second part is the inflammatory response caused by the release of cytokines from brain cells after a concussion. The timeline of this inflammation is the reason concussion symptoms often worsen 6–24 h post-injury (Grady 2010). The concussions that cause these chemical reactions are commonly referred to as mild traumatic brain injuries (mTBIs). Mild traumatic brain injury is technically not considered synonymous with concussion by many experts, but mTBI is often seen in the literature as a replacement for concussion. Research on concussions and mTBI has rapidly increased over the last decade due to specific findings that concussions may lead to degenerative brain diseases and cognitive impairments later in life.

Sports-related concussions (SRC) within the athletic population have been estimated at between 1.1 and 1.9 million annually, while an estimated 22.5 % to


52.7 % of concussions are likely not reported to a healthcare professional (Bryan et al. 2016). Evaluating the incidence of concussions is additionally challenging, as many athletes choose not to report their symptoms because of the restrictions from activity that follow a diagnosed concussion. Challenges in determining specific incidence rates also stem from changes in the previous definition of SRC and from difficulty in establishing appropriate reporting of SRC to coaches and healthcare professionals. Theye and Mueller (2004) state that 20 % of head injuries (>300,000) are sports-related concussions. One study showed that emergency department (ED) visits for SRC rose over 200 % from 1997 to 2007, indicating a considerable increase in incidence (Schatz and Moser 2011). This growth can be attributed both to an increase in sports participation and to progress in knowledge of the signs and symptoms of the condition, resulting in improved reporting (Register-Mihalik et al. 2013). Another factor in the increase in reporting is the growth in awareness of concussions in the media, resulting in greater appreciation of their severity by the healthcare community.

Current Clinical Considerations in Concussion Management

Certain licensed medical professionals, such as athletic trainers in North America and medical doctors, can both diagnose concussion and determine qualification for return to activity after head trauma. One of the most widely used systems to diagnose concussions is the SCAT3, or Sport Concussion Assessment Tool – 3rd Edition (McCrory et al. 2013). The SCAT3 includes a system of questions that the afflicted person must answer. The person's score at the end of the test is then compared to a baseline score, or to normative values if a baseline was not assessed. The baseline assessment should be conducted before the athlete experiences a concussion, since some athletes may already experience some symptoms naturally, such as poor balance. The fewer symptoms the athlete has, the lower the overall score on the SCAT3. Returning an athlete to activity requires a multifaceted approach, with several measures of symptoms, cognitive function, balance, and ocular function all returning to baseline or "normed" values. The SCAT3 evaluates subjective symptoms as well, such as mood and nausea. Cognitive information is included, such as the current month, day, and year, and the SCAT3 evaluates the athlete's memory of sequences of words and numbers. Balance is also evaluated using the BESS test, or Balance Error Scoring System, in which the athlete must balance in double-leg, single-leg, and tandem stances. Another cognitive assessment tool commonly used to evaluate concussions is ImPACT, or Immediate Post-Concussion Assessment and Cognitive Testing (Covassin et al. 2009). Many universities and some high schools pay for access to this system, since it can track athletes' records. Athletes should take the ImPACT as a baseline before the season begins to obtain a score, which is compared to normative ranges. Then, after an athlete sustains a concussion and his or her symptoms have subsided, the athlete retakes the ImPACT to obtain an objective score of his or her cognitive function.


Even with these diagnostic tools, clinicians are expected to follow a strict progression when returning afflicted athletes to sport participation (McCrory et al. 2013). The return-to-play progression may not begin until the athlete's symptoms have returned to baseline and subsided. At this point, an athlete can begin to move through stages of light exercise, then progress to sport-specific activities, noncontact training, integration into practice, and finally return to play. However, 24 h must pass between each stage, and if the athlete experiences any recurrence of symptoms, he or she must return to the previous stage. The extreme care taken with traumatic brain injuries may seem extensive, but these strict regimens are necessary: if athletes return to sport participation with unresolved symptoms, the results can be irreversible, as with second impact syndrome (Cantu 1998). This syndrome occurs when an individual suffering postconcussive symptoms participates in activity and receives a second head trauma. At this point, major brain swelling occurs, and death is a possible outcome, with a 50 % mortality rate. The cautious approach to returning an individual to activity is also used to minimize symptoms that linger past the point when normal resolution should occur. Some medical professionals have labeled this lingering of symptoms postconcussion syndrome, though other professionals dispute the diagnosis. The important point for this discussion is that balance impairment is often a symptom with lasting issues after other symptoms have resolved. Typically, balance is clinically assessed quickly using a modified version of the BESS test or the Romberg test (standing on one leg with arms out in a T and eyes closed). Gait can be used to assess balance as well but is not part of a typical concussion management program, especially within high school or college athletic programs. Specialized clinics for patients with longer-lasting symptoms may use gait to detect differences beyond simple balance tests, and gait analysis can sometimes detect differences well beyond when an athlete may have returned to participation.

Steady-State Gait Assessment

The most common form of gait assessment is a steady-state gait analysis. Gait requires a series of coordinated interjoint and interlimb movements to move the center of mass while also balancing that center of mass appropriately to avoid a fall. Strength, reaction time, and coordination are some of the important motor components required during both of these tasks. Since balance incorporates many neuromuscular and neuropsychological components that can be affected by brain injury (Fig. 1), balance assessment is one of the most commonly performed and effective motor assessments following concussion. Balance deficits following concussion were widely recognized in the 1990s, when about half of all surveyed concussed individuals (of all severity levels) reported dysfunction due to their injury in a 5-year post-TBI survey and physician assessment study appraising balance impairment (Hillier et al. 1997). This, along with some


Fig. 1 A conceptual model of the postural control system (Reproduced from Maki and McIlroy 1996)

high-profile cases of concussions in children and famous athletes, paved the way for concussion balance research through the next decade. As indicated in Fig. 1 (Maki and McIlroy 1996), balance control can be modulated by a number of different factors. Concussion affects balance at the central nervous system level but can be measured through gait assessment of the mechanical output. Research findings are mixed regarding effects on any particular sensory system in this balance pathway. Some research indicates no particular deficit in any one sensory input to balance following traumatic brain injury, but still a level of imbalance corresponding with injury severity (Mrazik et al. 2000). Others suggest vestibular dysfunction resulting from concussion (Aligene and Lin 2013; Alsalaheen et al. 2010; Corwin et al. 2015; Fife and Kalra 2015; Murray et al. 2014). While sensory organization tests are the typical method for isolating the effect of vestibular dysfunction on balance, such dysfunction can be evident in, and must be accounted for in, clinical tests involving balance such as gait assessment. During gait, unilateral vestibular dysfunction may be most easily detectable as a kinematic asymmetry. Others have used more intricate measures of center of mass motion to detect dysfunction (Deshpande and Patla 2005).

Stability, posture, and balance are sometimes considered synonymous. For our purposes, we will differentiate these terms to clearly define the research from here on. "Posture" is an instantaneous pose of the body and the many joint positions that create that pose. "Balance" is the instantaneous measure of the propensity to fall, through measures of the body center of mass with respect to a fulcrum (center of pressure) or base of support (the area contained within the feet). "Stability" represents the consistency of a repeated cyclical action, and in the case of postural stability, that


is the consistency of body motions at the joint level, or of the whole body through a measure of the center of mass (or, more commonly, the center of pressure) over a time period. Such measures are often referred to as "nonlinear analyses." The interpretation of consistency versus randomness in healthy versus deficient human motion is being explored by a number of research groups, so the clinical applications of these nonlinear measurements are yet to be fully understood. We suggest that the reader consult additional sources, such as Cavanaugh et al. (2005), about how nonlinear measurement techniques can provide clinically relevant information for different populations.

Standing balance tests represent some of the earliest motor tests conducted in concussed populations to detect persistent balance deficits. More severe forms of brain injury will manifest symptoms in basic standing tests. Balance tests have indicated increased postural sway following concussion compared to healthy balance in clinical (Geurts et al. 1996; McCrea et al. 2003) and functional testing (Zhang et al. 2002). These tests will typically only detect immediate symptoms; however, individuals with a history of concussive events can present persistent short-term (Gao et al. 2011; Quatman-Yates et al. 2015) and long-term (De Beaumont et al. 2011, 2013; Sosnoff et al. 2011) postural stability irregularities compared to healthy individuals. It typically takes more demanding motor tests to detect balance symptoms in more mildly concussed individuals.

In conducting a gait assessment, it is proper to first consider the goal. One goal may be simply to detect lingering motor (or other) performance symptoms indicating that the concussed individual is not completely healthy (De Beaumont et al. 2011). Alternatively, the goal may be to detect particular lingering symptoms that could affect performance in a functional activity. Some forms of gait assessment may satisfy both of these goals, but with reduced power. The more mildly concussed the group or individual, the more precise the gait assessment needs to be. Second, the gait assessment must be developed considering the sensitivity and specificity needed to distinguish lingering symptoms for a particular level of concussion severity and duration since the concussive event. Since the persistence of symptoms is in question, the assessment must be robust enough to examine a range of likely symptomology: it should include an appropriate range of difficulties and consider how performance may be influenced by assessment duration, time, and environment.

Balance is a measure of the center of mass with respect to the base of support. Most tests of balance in a clinical setting are either standing balance tests, which can be scored accurately without a direct measure of the center of mass, or involve costly tools that better predict center of mass motion. The challenge with a balance measure during gait is that a simpler measure, such as the center of pressure typically used in standing tests, does not predict center of mass motion as accurately in this more dynamic task. The benefit of measuring balance during gait is that it provides the best indication of the likelihood of a fall or other loss-of-balance injuries. Subsequent injury following concussion can take the form of physical injury to the body (Brooks et al. 2016) and of additional concussive events that compound the deleterious effects (CDC 1997).


Gait additionally engages the components of balance in Fig. 1 differently, and to an amplified level, compared to standing balance tests. Volitional movements and feedforward corrections are more prevalent in gait than in standing; thus, there is more reliance on the cerebral cortex, the very component concussion affects. Coordination of the musculoskeletal linkage and the processing of visual and somatosensory information are also more dynamic in gait than in standing. Gait is a more unbalancing task than standing due to these factors and therefore may be a more appropriate test of balance for mildly concussed individuals, or for individuals further into their rehabilitation from a severe concussion.

Even though balance is a direct measure of the center of mass with respect to the base of support, there is no perfect way to measure center of mass location in the human body during dynamic tasks. With a full-body motion capture system and previously established anthropometric models of segment inertial parameters, the center of mass location can be estimated during gait. Anthropometry does not provide perfectly accurate information; however, the young adult concussed population (less so adolescent concussed individuals) is covered by plenty of anthropometry studies that validate its use, and there are methods to optimize the data to better fit a specific individual (Pavol et al. 2002). A motion capture system can also track the base of support, so that balance can be quantified as the motion of the center of mass with respect to the base of support. Without a motion capture system for identifying center of mass location, force plates can be used to estimate center of mass motion: the projection of the center of mass onto the ground (center of gravity) can be used for several measures of balance and can be estimated by double integration of force plate data (Zatsiorsky and King 1998); a simplified sketch of this approach is given below. The problem with using the center of gravity is that it does not account for the three-dimensional motion of the center of mass, which can affect balance. Center of pressure alone (collected by force plate or pressure mat) has also been used to estimate gait balance. The center of gravity is encompassed and controlled by the center of pressure and ground reaction force during steady-state gait, so the center of pressure will always overestimate the actual motion of the center of mass. Outside of this level of equipment sophistication, spatiotemporal measures of movement have been used in the past as measures of balance in gait, but they are several levels away from being true measures of balance.

Research into balance assessment during gait following concussion began with several groups in the early 2000s. Mayo Clinic researchers used clinical assessments, standing posturography (with the sensory organization test), and gait analysis to compare assessment techniques following traumatic brain injury (TBI) (Basford et al. 2003; Kaufman et al. 2006). They demonstrated a high correlation between physical impairments described by the TBI participants and reduced scores on the sensory organization test designed to isolate vestibular input. Physical disability scores also correlated with reduced center of mass motion in the anterior/posterior direction, and the functional disability index correlated with increased center of mass motion in the mediolateral direction. TBI participants also showed reduced gait A/P motion and increased gait M/L motion compared to healthy controls.
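The force plate route mentioned above can be sketched as follows. This is a simplified illustration rather than the specific procedure of Zatsiorsky and King (1998): it assumes body mass, one horizontal ground reaction force channel, and a center of pressure trace are available, and it pins the integration constants with the common approximations of zero mean velocity and a mean center of gravity equal to the mean center of pressure.

```python
import numpy as np

def cog_from_forceplate(f_ml, cop_ml, mass, fs):
    """Estimate mediolateral center-of-gravity motion from force plate data.

    f_ml   : mediolateral ground reaction force (N)
    cop_ml : mediolateral center of pressure (m)
    mass   : body mass (kg); fs : sampling rate (Hz)
    """
    acc = f_ml / mass                  # Newton's second law: a = F / m
    vel = np.cumsum(acc) / fs          # first integration -> velocity
    vel -= vel.mean()                  # assume zero mean velocity over the trial
    pos = np.cumsum(vel) / fs          # second integration -> position
    pos += cop_ml.mean() - pos.mean()  # anchor the trajectory to the mean COP
    return pos
```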


This analysis indicates the significance of gait analysis in assessing perceived impairments in a more severe TBI group. For the more mildly concussed adult, a simple gait test does not clearly discern persistent balance deficits. Within 48 h of a concussive event, adults will display a slowed gait but no indication of balance deficits (Parker et al. 2005). By a week post-concussion, single-task gait is indistinguishable from that of healthy adults, just like typical neuropsychological tests (Parker et al. 2006). Concussed adolescents, however, are more likely to present with balance deficits in single-task gait (Howell et al. 2013b). It is important to note that high-impact athletes who frequently encounter "subconcussive" blows to the head could present gait balance deficits even without a medical diagnosis of concussion (Parker et al. 2008). To accurately detect persistent balance symptoms in a clinical setting, it is important to consider the baseline performance of the patient. In research, baseline information is ideal; at minimum, the previous research indicates that athletic participation is important to consider in group comparisons.

Functional Gait Assessment

Physical obstacles that must be negotiated during daily gait include curbs, stairs, unstable surfaces, traffic, and a variety of other hazards. Our ability to reorient our attention to these particular obstacles factors into gait performance (Catena et al. 2009b). Concussed individuals demonstrate deficits in spatially orienting attention in both auditory and visual tasks (Breton et al. 1991; Cremona-Meteyard et al. 1992; Daffner et al. 2000; Halterman et al. 2006). In this process, we must disengage, shift, and reengage attention (Posner 1980) through unique neuronal pathways. Broad posterior parietal lobe damage has been linked to disengagement of attention (Posner et al. 1984). The superior parietal gyrus is linked to shifting attention (Vandenberghe et al. 2001). The intraparietal sulcus is involved in shifting and refocusing attention (Yantis et al. 2002), along with the superior colliculus and lateral pulvinar when distractions are present (Posner and Petersen 1990).

Obstacle avoidance tasks are a typical functional activity added to gait assessment to increase the balance complexity and make the task more indicative of everyday hazards. The complexity of the obstacle crossing task can be modulated to the expected ability of the population by making the task more or less physically demanding (Chou et al. 2004) or more perceptually demanding (Baker and Cinelli 2014). Compared to clinical assessment techniques of balance, such as the Berg Balance Test, an obstacle crossing task was better at distinguishing TBI individuals from healthy controls: slower gait velocities, increased obstacle clearance, and decreased stride lengths indicated that the TBI group adopted a cautious gait during obstacle crossing (McFadyen et al. 2003). Others have shown similar results from obstacle crossing tasks (Fait et al. 2013; Martini et al. 2011; Vallee et al. 2006). Measures of whole body center of mass motion of TBI patients during the crossing of several different obstacle heights show that participants with TBI and healthy controls have similar gait patterns during unobstructed walking, indicating that


normal level walking may not be as sensitive in detecting long-term changes in dynamic balance (Chou et al. 2004). On the other hand, obstructed walking resulted in slowed gait velocities and shorter stride lengths (indicating more cautious gait) and increased mediolateral swaying motion (indicating a lack of balance control) 2 years after a TBI (Chou et al. 2004). The balance effects of an obstacle crossing during gait are essentially equivalent between more mildly concussed individuals within 48 h of injury and healthy individuals (Catena et al. 2007a), as both groups tend to be taxed with the challenge of maintaining balance during obstacle crossing depending on the obstacle height. However, the potential for a trip is higher after a recent concussion (indicated by lower foot clearances and higher trip rates) for individuals that also have deficits in spatially orienting attention (Catena et al. 2009b). Similar to long-term performance in TBI individuals, mildly concussed individuals adopt a more conservative obstacle crossing strategy (indicated both in balance and obstacle clearance measures) as concussion symptoms subside several weeks following injury compared to their 48 h performance (Catena et al. 2009a) and compared to healthy individuals (Catena et al. 2009a; Sambasivan et al. 2015). On the other hand, balance deficits are more likely to be elicited by gait tasks that require a cognitive reaction to a suddenly presented perturbation (Powers et al. 2014).

Dual-Task Gait Assessment

While broad neuropsychological tests have become the standard method for assessing persistent concussive symptoms for a clinical return-to-activity decision (Resch et al. 2013), it is important to note that widely used neuropsychological clinical tests do not measure all cognitive deficits to the same degree (Choe and Giza 2015) and that broad tests of cognition do not always present the same findings, as specific tests focus on particular cognitive components (Keightley et al. 2009). Nevertheless, there is potential to refine current cognitive testing into an even better measure of persistent symptoms. In doing so, the chance that a patient is involved in another deleterious concussive event is reduced. One way to improve current testing is to focus on enhancing testing methodologies that correlate with (and may potentially predict) motor performance, because concussion-induced motor deficiencies could result in subsequent injuries (Brooks et al. 2016; Herman et al. 2015).

Unlike many neurological pathologies, concussions do not present any consistently localized cognitive symptoms. Instead, axonal injury is diffuse, and so are cognitive symptoms. Because cognitive processing is distributed throughout the brain, it is difficult to predict which components will be affected by a single biomechanical force with a particular direction, magnitude, and point of application. Cognitive deficits could include (but are not limited to, nor necessarily present in any specific individual) executive dysfunction, slower reaction times, decreased focus, reduced working memory, reduced attention capacity, and an inability to shift attention.

Gait is not an automated task in which no attentional resources are needed. Decreased attention capacity has been alluded to as a major determinant of reduced gait performance following a concussion (Catena et al. 2011). No matter which general theory of divided attention one subscribes to, there is an abundance of evidence to suggest an interaction between gait performance and cognition through attention in healthy individuals (de Bruin and Schmidt 2010; Hegeman et al. 2012; Lajoie et al. 1993; Szturm et al. 2013). Accomplishing a dual task, with reasonable success in both simultaneously performed tasks, is even less automatic when challenged by a deficit that can affect performance of either task (Brown et al. 1999; Vaportzis et al. 2015; Yardley et al. 2001). Concussion, directly affecting cognition and neurophysiology, challenges an individual to complete both simple and more complex dual-task scenarios (Bernstein 2002; De Monte et al. 2005; Tapper et al. 2016; Vilkki et al. 1996). Dual-task gait research has provided even more scientific evidence of divided attention deficits.

Executive control over cognitive processes allows individuals to achieve goals by planning, focusing, and coordinating actions. Executive dysfunction has been consistently reported following concussion (Hart et al. 2005; Howell et al. 2013a; Moore et al. 2016; Serino et al. 2006; Tapper et al. 2016). In particular, the effect of sustained attention [primarily controlled in the right frontal areas (Posner and Petersen 1990; Sturm et al. 1999; Wang et al. 2005; Wilkins et al. 1987)] on gait performance is important for populations at risk of fall, injury, or re-injury due to a fall, and particularly so in gait when balance has been compromised. Sustained attention does not seem to be deficient shortly after concussion when balance is compromised (Halterman et al. 2006; van Donkelaar et al. 2005, 2006), but there is some research indicating a positive relationship between concussion and lapses in attention long after a reported concussion occurred (Killgore et al. 2016; Pontifex et al. 2012). In sustaining attention, executive control involves resolving conflicting information. Concussed individuals experience conflict resolution deficits up to a month post-injury (Chan 2002; Chan et al. 2003; Halterman et al. 2006; Larson et al. 2011; Moore et al. 2014). The anterior cingulate cortex seems to be primarily responsible for conflict resolution in such tasks (Posner and Rothbart 1998; Swick and Jovanovic 2002), and more specifically its mid-dorsal region (Swick and Jovanovic 2002). The dorsal prefrontal cortex has also been linked to the actual response selection in more difficult tasks (MacDonald et al. 2000).

Including a cognitive component in balance tests is one way to provide increased task complexity and tease out milder symptoms or concussion symptoms of longer duration. Distractions clearly play a role when assessing balance deficits following concussion (Rahn et al. 2015). Cognitive performance is correlated with balance performance following concussion (Alsalaheen et al. 2016). And through attention, cognitive and balance performance interact to diminish the performance of either, or both, following a concussion (Catena et al. 2007a). Immediate balance deficits are commonly observed in a dual-task paradigm (Sosnoff et al. 2008). Month-long dual-task balance deficits are occasionally evidenced in the literature as well (Dorman et al. 2015). Cognition may also interact with other motor
components, or even with specific motor components used in balance, following concussion (Brown et al. 2015).

Gait with a simultaneously performed cognitive task (dual task) has become the functional paradigm of interest in concussion research over the last decade. These types of paradigms have been described as most similar to real-world conditions (Cock et al. 2003; Weerdesteyn et al. 2003), when we are often performing cognitive processing along with gait. In performing a dual-task analysis, the cognitive task can be modulated to involve or exclude particular sensory and cognitive tracts, along with modulation of task complexity to fit the population of interest. This is on top of the modulation that can be made to the gait task, as described for obstacle crossing above. As such, there are a wide variety of paradigms described throughout the research literature, similar to the wide variety of dual-task situations faced daily. When picking the appropriate dual-task paradigm, consider that concussion is a diffuse injury that can affect multiple areas of the brain.

Differences between TBI patients and healthy individuals are mixed when combining obstacle avoidance with a cognitive secondary task compared with single-task obstacle crossing (Chiu et al. 2013; Martini et al. 2011; McFadyen et al. 2009; Vallee et al. 2006). Cognitive tasks may interfere with motor performance through peripheral sensory distraction or through central nervous system attention division in dual-task paradigms. Peripheral sensory distractions, for example when visual cognition tracts are tasked simultaneously with an inherently visual motor task like obstacle crossing, should typically be avoided, as they are a challenge regardless of injury or injury severity (Fait et al. 2013). Along with picking the correct mode of cognitive task, it is important to consider the appropriate level of cognitive complexity. While a combined visual cognition and obstacle crossing task may be too complex even for healthy individuals, some tasks, like a simple reaction time test during gait, may be too simple for mild concussion patients (Catena et al. 2007b). A continuous choice mental task has typically been shown to be best at discriminating between symptomatic and asymptomatic individuals. A dual-task paradigm that includes steady-state gait and a variety of continuous mental tasks results in few changes for healthy individuals but reduced performance on cognitive tasks and increased spatial-temporal gait variability for severe TBI patients, similar to individuals after stroke and subarachnoid hemorrhage (Haggard et al. 2000). Following mild TBI, spatial-temporal variables only seem to be sensitive to conservative control of dual-task gait immediately after injury (Parker et al. 2005) and do not seem to be sensitive enough to detect any lingering single-task gait performance deficits several weeks after concussion (Howell et al. 2013b; Parker et al. 2006; Sambasivan et al. 2015). However, continued balance deficits measured by center of mass motion can still be detected several weeks following a concussion using a dual-task paradigm (Parker et al. 2006). There is evidence to suggest that balance deficits are even more apparent following an adolescent concussion (Howell et al. 2015a) and can be prolonged by a return to activity too soon (Howell et al. 2015b, c; Parker et al. 2008).
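Because these dual-task findings rest on comparing the same gait measures with and without a concurrent cognitive load, they are often summarized as a dual-task cost alongside stride-to-stride variability. Below is a minimal sketch of both computations, assuming per-stride step lengths (or step times) have already been extracted; the sample numbers are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Stride-to-stride variability as CV (%) of a spatiotemporal measure."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

def dual_task_cost(single_task, dual_task):
    """Percent decline from single- to dual-task performance.

    Positive values indicate worse performance under the added cognitive
    load (e.g., slower velocity or shorter steps).
    """
    single = np.mean(single_task)
    dual = np.mean(dual_task)
    return 100.0 * (single - dual) / single

# Hypothetical per-stride step lengths (m) for one participant.
single = [0.72, 0.74, 0.71, 0.73, 0.72]
dual   = [0.66, 0.61, 0.69, 0.58, 0.64]

print("Single-task CV: %.1f%%" % coefficient_of_variation(single))
print("Dual-task CV:   %.1f%%" % coefficient_of_variation(dual))
print("Dual-task cost: %.1f%%" % dual_task_cost(single, dual))
```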

Future Directions

Published research is skewed toward positive results, so it is not clear how likely a concussed individual is to present with deficits in a gait assessment. There is an immediate need for epidemiological research to uncover how likely concussions are to cause gait deficits. This research could also account for other standard assessment techniques and, in doing so, better inform clinicians about a comprehensive assessment for concussion management.

Ideally, clinicians could accurately measure all cognitive and motor deficits following a concussion both quickly and cost-effectively. Unlike cognitive testing, gait assessment is neither quick nor cheap. Individual-specific results are required in a clinical setting, so baseline information and multiple post-concussion testing sessions are crucial for diagnostic comparisons. Future research needs to advance gait assessment techniques so that they can be implemented in a quick and cost-effective manner. Mobile technology, through inertial measurement units (IMUs) or force platforms, could provide a route toward more cost-effective and precise measurement as technology advances. Research is also exploring augmented reality, which may refine our ability to provide more realistic dual-task scenarios during gait in the lab or in the clinic. Simultaneously, researchers need to direct efforts toward correlating attention components with gait performance to refine dual-task paradigms. Concussions are a neurophysiological phenomenon currently tested via cognitive assessment, but motor deficits are often as crucial to a return to normal activity as cognitive deficits because they result in subsequent injuries (Brooks et al. 2016; Herman et al. 2015). If research advances in predicting gait performance from cognitive assessment results, more costly and time-consuming motor tests may be avoided altogether, but as of now, the perfect cognitive test to make this motor performance prediction has yet to be created.

A team approach is crucial in both developing new measurement techniques and refining current ones. The partnership between a clinician and a gait researcher in writing this chapter highlighted to us the importance of both being equally involved in future collaborative research into concussion assessment during gait. One such difficulty is inconsistent terminology across fields and roles. Clinicians have abandoned the idea of "grading" concussions or labeling them as mild versus severe, while researchers designing experiments may continue to divide subjects into those suffering from "mild" and "severe" concussions. This difference in terminology can make translation of research results to clinical practice more difficult. Work must continue with both sides collaborating in a common language, with the patients' health and well-being as the focus.
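As an illustration of the kind of processing that body-worn sensors could support, the sketch below estimates step timing, and its variability, from the vertical acceleration of a single trunk-mounted IMU. The peak-detection threshold and minimum step interval are illustrative assumptions, not a validated algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def step_times_from_accel(vertical_accel, fs, min_step_interval=0.4):
    """Estimate step timing from the vertical acceleration of a trunk IMU.

    vertical_accel : 1-D array of vertical acceleration (m/s^2).
    fs             : sampling rate (Hz).
    Returns the step times (s) and the coefficient of variation of step
    time, one of the spatiotemporal variability measures discussed above.
    """
    signal = vertical_accel - np.mean(vertical_accel)    # remove gravity/offset
    refractory = max(1, int(min_step_interval * fs))     # suppress double peaks
    peaks, _ = find_peaks(signal, height=np.std(signal), distance=refractory)
    step_times = np.diff(peaks) / fs
    cv = 100.0 * step_times.std(ddof=1) / step_times.mean()
    return step_times, cv
```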

References

Aligene K, Lin E (2013) Vestibular and balance treatment of the concussed athlete. NeuroRehabilitation 32:543–553. doi:10.3233/nre-130876

Alsalaheen BA et al (2010) Vestibular rehabilitation for dizziness and balance disorders after concussion. J Neuro Phys Ther 34:87–93. doi:10.1097/NPT.0b013e3181dde568 Alsalaheen BA, Whitney SL, Marchetti GF, Furman JM, Kontos AP, Collins MW, Sparto PJ (2016) Relationship between cognitive assessment and balance measures in adolescents referred for vestibular physical therapy after concussion. Clin J Sport Med 26:46–52. doi:10.1097/ jsm.0000000000000185 Baker CS, Cinelli ME (2014) Visuomotor deficits during locomotion in previously concussed athletes 30 or more days following return to play. Physiol Reports 2 doi: 10.14814/phy2.12252 Basford JR et al (2003) An assessment of gait and balance deficits after traumatic brain injury. Arch Phys Med Rehabil 84:343–349 Bernstein DM (2002) Information processing difficulty long after self-reported concussion. J Int Neuropsychol Soc 8:673–682 Breton F, Pincemaille Y, Tarriere C, Renault B (1991) Event-related potential assessment of attention and the orienting reaction in boxers before and after a fight. Biol Psychol 31:57–71 Brooks MA, Peterson K, Biese K, Sanfilippo J, Heiderscheit BC, Bell DR (2016) Concussion increases odds of sustaining a lower extremity musculoskeletal injury after return to play among collegiate athletes. Am J Sports Med 44:742–747. doi:10.1177/0363546515622387 Brown LA, Shumway-Cook A, Woollacott MH (1999) Attentional demands and postural recovery: the effects of aging. J Gerontol 54A:M165–M171 Brown JA, Dalecki M, Hughes C, Macpherson AK, Sergio LE (2015) Cognitive-motor integration deficits in young adult athletes following concussion. BMC Sports Sci Med Rehabil 7:25. doi:10.1186/s13102-015-0019-4 Bryan MA, Rowhani-Rahbar A, Comstock RD et al (2016) Sports-and recreation-related concussions in US youth. Pediatrics 138(1), e20154635 Cantu RC (1998) Second-impact syndrome. Clin Sport Med 17(1):37–44 Catena RD, van Donkelaar P, Chou LS (2007a) Altered balance control following concussion is better detected with an attention test during gait. Gait Posture 25:406–411 Catena RD, van Donkelaar P, Chou LS (2007b) Cognitive task effects on gait stability following concussion. Exp Brain Res 176:23–31. doi:10.1007/s00221-006-0596-2 Catena RD, van Donkelaar P, Chou L-S (2009a) Different gait tasks distinguish immediate vs. longterm effects of concussion on balance control. J Neuroeng Rehabil 6:25–25 Catena RD, van Donkelaar P, Halterman CI, Chou LS (2009b) Spatial orientation of attention and obstacle avoidance following concussion. Exp Brain Res 194:67–77 Catena RD, van Donkelaar P, Chou LS (2011) The effects of attention capacity on dynamic balance control following concussion. J Neuroeng Rehabil 8:8. doi:10.1186/1743-0003-8-8 Cavanaugh JT, Guskiewicz KM, Stergiou N (2005) A nonlinear dynamic approach for evaluating postural control: new directions for the management of sport-related cerebral concussion. Sports Med 35:935–950 CDC (1997) Sports-related recurrent brain injuries – United States. Morb Mortal Wkly Rep 46:224–227 Chan RC (2002) Attentional deficits in patients with persisting postconcussive complaints: a general deficit or specific component deficit? J Clin Exp Neuropsychol 24:1081–1093 Chan RC, Hoosain R, Lee TM, Fan YW, Fong D (2003) Are there sub-types of attentional deficits in patients with persisting post-concussive symptoms? A cluster analytical study. 
Brain Injury 17:131–148 Chiu SL, Osternig L, Chou LS (2013) Concussion induces gait inter-joint coordination variability under conditions of divided attention and obstacle crossing. Gait Posture 38:717–722. doi:10.1016/j.gaitpost.2013.03.010 Choe MC, Giza CC (2015) Diagnosis and management of acute concussion. Semin Neurol 35:29–41. doi:10.1055/s-0035-1544243 Chou LS, Kaufman KR, Walker-Rabatin AE, Brey RH, Basford JR (2004) Dynamic instability during obstacle crossing following traumatic brain injury. Gait Posture 20:245–254. doi:10.1016/j.gaitpost.2003.09.007

Cock J, Fordham C, Cockburn J, Haggard P (2003) Who knows best? Awareness of divided attention difficulty in a neurological rehabilitation setting. Brain Inj 17:561–574 Corwin DJ, Wiebe DJ, Zonfrillo MR, Grady MF, Robinson RL, Goodman AM, Master CL (2015) Vestibular deficits following youth concussion. J Pediatr 166:1221–1225. doi:10.1016/j. jpeds.2015.01.039 Covassin T, Elbin RJ III, Stiller-Ostrowski JL et al (2009) Immediate post-concussion assessment and cognitive testing (ImPACT) practices of sports medicine professionals. J Athl Train 44 (6):639–644 Cremona-Meteyard SL, Clark CR, Wright MJ, Geffen GM (1992) Covert orientation of visual attention after closed head injury. Neuropsychologia 30:123–132 Daffner KR et al (2000) The central role of the prefrontal cortex in directing attention to novel events. Brain 123:927–939 De Beaumont L et al (2011) Persistent motor system abnormalities in formerly concussed. J Athl Train 46:234–240 De Beaumont L, Tremblay S, Henry LC, Poirier J, Lassonde M, Theoret H (2013) Motor system alterations in retired former athletes: the role of aging and concussion history. BMC Neurol 13:109. doi:10.1186/1471-2377-13-109 de Bruin ED, Schmidt A (2010) Walking behaviour of healthy elderly: attention should be paid. Behavior Brain Funct 6:59. doi:10.1186/1744-9081-6-59 De Monte VE, Geffen GM, May CR, McFarland K, Heath P, Neralic M (2005) The acute effects of mild traumatic brain injury on finger tapping with and without word repetition. J Clin Exp Neuropsychol 27:224–239. doi:10.1080/13803390490515766 Deshpande N, Patla AE (2005) Dynamic visual-vestibular integration during goal directed human locomotion. Exp Brain Res 166:237–247. doi:10.1007/s00221-005-2364-0 Dorman JC, Valentine VD, Munce TA, Tjarks BJ, Thompson PA, Bergeron MF (2015) Tracking postural stability of young concussion patients using dual-task interference. J Sci Med Sport 18:2–7. doi:10.1016/j.jsams.2013.11.010 Fait P, Swaine B, Cantin JF, Leblond J, McFadyen BJ (2013) Altered integrated locomotor and cognitive function in elite athletes 30 days postconcussion: a preliminary study. J Head Trauma Rehabil 28:293–301. doi:10.1097/HTR.0b013e3182407ace Fife TD, Kalra D (2015) Persistent vertigo and dizziness after mild traumatic brain injury. Ann N Y Acad Sci 1343:97–105. doi:10.1111/nyas.12678 Gao J, Hu J, Buckley T, White K, Hass C (2011) Shannon and Renyi entropies to classify effects of mild traumatic brain injury on postural sway. PLoS One 6, e24446. doi:10.1371/journal. pone.0024446 Geurts AC, Ribbers GM, Knoop JA, Limbeek JV (1996) Identification of static and dynamic postural instability following traumatic brain injury. Arch Phys Med Rehabil 77:639–644 Grady M (2010) Concussion in the adolescent athlete. Curr Probl Pediatr Adolesc Health Care 40 (7):154–169 Haggard P, Cockburn J, Cock J, Fordham C, Wade D (2000) Interference between gait and cognitive tasks in a rehabilitating neurological population. J Neurol Neurosurg Psychiatry 69:479–486 Halterman CI, Langan J, Drew A, Rodriguez E, Osternig LR, Chou LS, van Donkelaar P (2006) Tracking the recovery of visuospatial attention deficits in mild traumatic brain injury. Brain 129:747–753 Hart T, Whyte J, Kim J, Vaccaro M (2005) Executive function and self-awareness of “real-world” behavior and attention deficits following traumatic brain injury. J Head Trauma Rehabil 20:333–347 Hegeman J, Weerdesteyn V, van den Bemt B, Nienhuis B, van Limbeek J, Duysens J (2012) Dualtasking interferes with obstacle avoidance reactions in healthy seniors. 
Gait Posture 36:236–240. doi:10.1016/j.gaitpost.2012.02.024

Herman DC, Zaremski JL, Vincent HK, Vincent KR (2015) Effect of neurocognition and concussion on musculoskeletal injury risk. Curr Sports Med Rep 14:194–199. doi:10.1249/ jsr.0000000000000157 Hillier SL, Sharpe MH, Metzer J (1997) Outcomes 5 years post-traumatic brain injury (with further reference to neurophysical impairment and disability). Brain Inj 11:661–675 Howell D, Osternig L, Van Donkelaar P, Mayr U, Chou LS (2013a) Effects of concussion on attention and executive function in adolescents. Med Sci Sports Exerc 45:1030–1037. doi:10.1249/MSS.0b013e3182814595 Howell DR, Osternig LR, Chou LS (2013b) Dual-task effect on gait balance control in adolescents with concussion. Arch Phys Med Rehabil 94:1513–1520. doi:10.1016/j.apmr.2013.04.015 Howell DR, Osternig LR, Chou LS, Chou LS (2015a) Adolescents demonstrate greater gait balance control deficits after concussion than young adults. Am J Sports Med 43:625–632. doi:10.1177/ 0363546514560994 Howell DR, Osternig LR, Chou LS (2015b) Return to activity after concussion affects dual-task gait balance control recovery. Med Sci Sports Exerc 47:673–680. doi:10.1249/ mss.0000000000000462 Howell DR, Osternig LR, Christie AD, Chou LS (2015c) Return to physical activity timing and dual-task gait stability are associated 2 months following concussion. J Head Trauma Rehabil 31 (4):262–268. doi:10.1097/htr.0000000000000176 Kaufman KR, Brey RH, Chou LS, Rabatin A, Brown AW, Basford JR (2006) Comparison of subjective and objective measurements of balance disorders following traumatic brain injury. Med Eng Phys 28:234–239. doi:10.1016/j.medengphy.2005.05.005 Keightley M et al (2009) Paediatric sports-related mild traumatic brain injury. BMJ Case Rep. doi:10.1136/bcr.06.2008.0148 Killgore WD, Singh P, Kipman M, Pisner D, Fridman A, Weber M (2016) Gray matter volume and executive functioning correlate with time since injury following mild traumatic brain injury. Neurosci Lett 612:238–244. doi:10.1016/j.neulet.2015.12.033 Lajoie Y, Teasdale N, Bard C, Fleury M (1993) Attentional demands for static and dynamic equilibrium. Exp Brain Res 97:139–144 Larson MJ, Farrer TJ, Clayson PE (2011) Cognitive control in mild traumatic brain injury: conflict monitoring and conflict adaptation. Int J Psychophysiol 82:69–78. doi:10.1016/j. ijpsycho.2011.02.018 MacDonald AW 3rd, Cohen JD, Stenger VA, Carter CS (2000) Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science 288:1835–1838 Maki BE, McIlroy WE (1996) Postural control in the older adult. Clin Geriatr Med 12:635–658 Martini DN, Sabin MJ, DePesa SA, Leal EW, Negrete TN, Sosnoff JJ, Broglio SP (2011) The chronic effects of concussion on gait. Arch Phys Med Rehabil 92:585–589. doi:10.1016/j. apmr.2010.11.029 McCrea M et al (2003) Acute effects and recovery time following concussion in collegiate football players: the NCAA Concussion Study. JAMA 290:2556–2563 McCrory P, Meeuwisse W, Johnston K (2009) Consensus statement on concussion in sport 3rd international 3rd international conference on concussion in sport held in Zurich, November 2008. Clin J Sport Med 19(3):185–200 McCrory P, Meeuwisse WH, Aubry M et al (2013) Consensus statement on concussion in sport: the 4th international conference on concussion in sport held in Zurich, November 2012. 
Brit J of Sport Med 47(5):250–258 McFadyen BJ, Swaine B, Dumas D, Durand A (2003) Residual effects of a traumatic brain injury on locomotor capacity: a first study of spatiotemporal patterns during unobstructed and obstructed walking. J Head Trauma Rehabil 18:512–525 McFadyen BJ, Cantin JF, Swaine B, Duchesneau G, Doyon J, Dumas D, Fait P (2009) Modalityspecific, multitask locomotor deficits persist despite good recovery after a traumatic brain injury. Arch Phys Med Rehabil 90:1596–1606. doi:10.1016/j.apmr.2009.03.010

Moore RD, Hillman CH, Broglio SP (2014) The persistent influence of concussive injuries on cognitive control and neuroelectric function. J Athl Train 49:24–35. doi:10.4085/1062-605049.1.01 Moore DR, Pindus DM, Raine LB, Drollette ES, Scudder MR, Ellemberg D, Hillman CH (2016) The persistent influence of concussion on attention, executive control and neuroelectric function in preadolescent children. Int J Psychophysiol 99:85–95. doi:10.1016/j.ijpsycho.2015.11.010 Mrazik M, Ferrara MS, Peterson CL, Elliott RE, Courson RW, Clanton MD, Hynd GW (2000) Injury severity and neuropsychological and balance outcomes of four college athletes. Brain Inj 14:921–931 Murray NG, Ambati VN, Contreras MM, Salvatore AP, Reed-Jones RJ (2014) Assessment of oculomotor control and balance post-concussion: a preliminary study for a novel approach to concussion management. Brain Inj 28:496–503. doi:10.3109/02699052.2014.887144 Parker TM, Osternig LR, Lee HJ, Donkelaar P, Chou LS (2005) The effect of divided attention on gait stability following concussion. Clin Biomech (Bristol, Avon) 20:389–395. doi:10.1016/j. clinbiomech.2004.12.004 Parker TM, Osternig LR, Van Donkelaar P, Chou LS (2006) Gait stability following concussion. Med Sci Sports Exerc 38:1032–1040. doi:10.1249/01.mss.0000222828.56982.a4 Parker TM, Osternig LR, van Donkelaar P, Chou LS (2008) Balance control during gait in athletes and non-athletes following concussion. Med Eng Phys 30:959–967. doi:10.1016/j. medengphy.2007.12.006 Pavol MJ, Owings TM, Grabiner MD (2002) Body segment inertial parameter estimation for the general population of older adults. J Biomech 35:707–712 Pontifex MB, Broglio SP, Drollette ES, Scudder MR, Johnson CR, O'Connor PM, Hillman CH (2012) The relation of mild traumatic brain injury to chronic lapses of attention. Res Q Exerc Sport 83:553–559. doi:10.1080/02701367.2012.10599252 Posner MI (1980) Orienting of attention. Q J Exp Psychol 32:3–25 Posner MI, Petersen SE (1990) The attention system of the human brain. Annu Rev Neurosci 13:25–42 Posner MI, Rothbart MK (1998) Attention, self-regulation and consciousness. Phil Trans R Soc Lond B Biol Sci 353:1915–1927 Posner MI, Walker JA, Friedrich FJ, Rafal RD (1984) Effects of parietal injury on covert orienting of attention. J Neurosci 4:1863–1874 Powers KC, Kalmar JM, Cinelli ME (2014) Dynamic stability and steering control following a sport-induced concussion. Gait Posture 39:728–732. doi:10.1016/j.gaitpost.2013.10.005 Quatman-Yates CC, Bonnette S, Hugentobler JA, Mede B, Kiefer AW, Kurowski BG, Riley MA (2015) Postconcussion postural sway variability changes in youth: the benefit of structural variability analyses. Pediatr Phys Ther 27:316–327. doi:10.1097/pep.0000000000000193 Rahn C, Munkasy BA, Barry Joyner A, Buckley TA (2015) Sideline performance of the balance error scoring system during a live sporting event clinical. J Sport Med 25:248–253. doi:10.1097/ jsm.0000000000000141 Register-Mihalik JK, Guskiewicz KM, McLeod TC et al (2013) Knowledge, attitude, and concussion-reporting behaviors among high school athletes: a preliminary study. J Athl Train 48(5):645–653 Resch J et al (2013) ImPact test-retest reliability: reliably unreliable? J Athl Train 48:506–511. doi:10.4085/1062-6050-48.3.09 Sambasivan K, Grilli L, Gagnon L (2015) Balance and mobility in clinically recovered children and adolescents after a mild traumatic brain injury. J Pediatr Rehabil Med 8:335–344. doi:10.3233/ prm-150351 Schatz P, Moser R (2011) Current issues in pediatric sports concussion. 
Clin Neuropsychol 25 (6):1042–1057 Serino A, Ciaramelli E, Di Santantonio A, Malagù S, Servadei F, Làdavas E (2006) Central executive system impairment in traumatic brain injury. Brain Inj 20:23–32

Sosnoff JJ, Broglio SP, Ferrara MS (2008) Cognitive and motor function are associated following mild traumatic brain injury. Exp Brain Res 187:563–571. doi:10.1007/s00221-008-1324-x Sosnoff JJ, Broglio SP, Shin S, Ferrara MS (2011) Previous mild traumatic brain injury and postural-control dynamics. J Athl Train 46:85–91. doi:10.4085/1062-6050-46.1.85 Sturm W et al (1999) Functional anatomy of intrinsic alertness: evidence for a fronto-parietalthalamic-brainstem network in the right hemisphere. Neuropsychologia 37:797–805 Swick D, Jovanovic J (2002) Anterior cingulate cortex and the stroop task: neuropsychological evidence for topographic specificity. Neuropsychologia 40:1240–1253 Szturm T, Maharjan P, Marotta JJ, Shay B, Shrestha S, Sakhalkar V (2013) The interacting effect of cognitive and motor task demands on performance of gait, balance and cognition in young adults. Gait Posture 38:596–602. doi:10.1016/j.gaitpost.2013.02.004 Tapper A, Gonzalez D, Roy E, Niechwiej-Szwedo E (2016) Executive function deficits in team sport athletes with a history of concussion revealed by a visual-auditory dual task paradigm. J Sports Sci 1–10 doi:10.1080/02640414.2016.1161214 Theye F, Mueller KA (2004) “Heads up”: concussions in high school sports. Clin Med Res 2(3):165–171 Vallee M, McFadyen BJ, Swaine B, Doyon J, Cantin JF, Dumas D (2006) Effects of environmental demands on locomotion after traumatic brain injury. Arch Phys Med Rehabil 87:806–813. doi:10.1016/j.apmr.2006.02.031 van Donkelaar P, Langan J, Rodriguez E, Drew A, Halterman C, Osternig LR, Chou L-S (2005) Attentional deficits in concussion. Brain Inj 19:1031–1039 van Donkelaar P, Osternig LR, Chou L-S (2006) Attentional and biomechanical deficits interact after mild traumatic brain injury. Exerc Sport Sci Rev 34:77–82 Vandenberghe R, Gitelman DR, Parrish TB, Mesulam MM (2001) Functional specificity of superior parietal mediation of spatial shifting. Neuroimage 14:661–673 Vaportzis E, Georgiou-Karistianis N, Churchyard A, Stout JC (2015) Effects of task difficulty during dual-task circle tracing in Huntington's disease. J Neurol 262:268–276. doi:10.1007/ s00415-014-7563-9 Vilkki J, Virtanen S, Surma-Aho O, Servo A (1996) Dual task performance after focal cerebral lesions and closed head injuries. Neuropsychologia 34:1051–1056 Wang J, Rao H, Wetmore GS, Furlan PM, Korczykowski M, Dinges DF, Detre JA (2005) Perfusion functional MRI reveals cerebral blood flow pattern under psychological stress. Proc Natl Acad Sci U S A 102:17804–17809 Weerdesteyn V, Schillings AM, van Galen GP, Duysens J (2003) Distraction affects the performance of obstacle avoidance during walking. J Mot Behav 35:53–63 Wilkins AJ, Shallice T, McCarthy R (1987) Frontal lesions and sustained attention. Neuropsychologia 25:359–365 Yantis S, Schwarzbach J, Serences JT, Carlson RL, Steinmetz MA, Pekar JJ, Courtney SM (2002) Transient neural activity in human parietal cortex during spatial attention shifts. Nat Neurosci 5:995–1002 Yardley L, Gardner M, Bronstein A, Davies R, Buckwell D, Luxon L (2001) Interference between postural control and mental task performance in patients with vestibular disorder and healthy controls. J Neurol Neurosurg Psychiatry 71:48–52 Zatsiorsky VM, King DL (1998) An algorithm for determining gravity line location from posturographic recordings. J Biomech 31:161–164 Zhang L, Abreu BC, Gonzales V, Huddleston N, Ottenbacher KJ (2002) The effect of predictable and unpredictable motor tasks on postural control after traumatic brain injury. 
NeuroRehabilitation 17:225–230

Functional Dystonias
Jessica Pruente and Deborah Gaebler-Spira

Abstract

Dystonia is a movement disorder characterized by involuntary muscle contractions resulting in twisting movements and abnormal postures. This movement disorder can cause significant impairments during functional tasks including gait, mobility, and reaching. Dystonia must be distinguished from the other hypertonic movement disorders, spasticity and rigidity, to guide treatment and management options. Several clinical measurement scales have been developed to identify dystonia and rate its severity; these can be easily adapted for use in motion analysis labs. Additionally, motion analysis kinetics, kinematics, and surface EMG are increasingly used for monitoring dystonia. This chapter will discuss the common etiologies of dystonia, clinical scales used for diagnosis and for assessing treatment efficacy, and the role of instrumented gait analysis, kinetics, and kinematics in the evaluation of dystonia.

Keywords

Barry-Albright Dystonia Scale • Co-contraction • Dystonia • Gait analysis • Hyperkinetic • Hypertonia assessment tool • Involuntary movements • Kinematics • Motion analysis • Movement disorder • Overflow muscle activation • Spasms

Electronic supplementary material: Supplementary material is available in the online version of this chapter at ▶ 10.1007/978-3-319-30808-1_70-1. Videos can also be accessed at http://www.springerimages.com/videos/.

J. Pruente (*) • D. Gaebler-Spira
Shirley Ryan Ability Lab, Chicago, IL, USA
e-mail: [email protected]; [email protected]

# Springer International Publishing AG 2016 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_70-1


Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Pathology and Functional Effects of Dystonia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Etiologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Clinical Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Kinetics/Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Summarizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Introduction

Dystonia is a movement disorder characterized by involuntary muscle contractions resulting in twisting movements and abnormal postures. Dystonia is often initiated or worsened by volitional activity and is associated with overflow muscle activation. Dystonia is highly variable but can impair a person's function in walking, hand manipulation, and speech due to reduced quality and speed of voluntary movement (van der Kamp et al. 1989; Agostino et al. 1992; Inzelberg et al. 1995; Curra et al. 2000; Gregori et al. 2008). Utilization of motion analysis for gait or arm function is particularly helpful because dystonia is characterized as a movement disorder and also impacts muscle tone (Sanger 2015). The motion analysis laboratory captures data from various perspectives; this fosters integration of a methodical clinical examination, video review of gait, kinetics and kinematics, motion trajectories, muscle activation patterns, and co-contractions.

In this chapter, we will review the definition of dystonia, the clinical tests discriminating dystonia from spasticity, severity rating scales for dystonia, etiologies, the concepts of co-contraction and dynamic motor control, and the kinetic and kinematic characteristics of functional dystonias. Though dystonia can be easily and commonly recognized during the review of motion analysis, little literature exists to provide a conclusive approach to reporting dystonia during motion analysis. Recommendations for including clinical scales, close observation of the video, and known gait characteristics of dystonia will be discussed. Motion analysis provides a unique, quantifiable understanding of both motor control and the impact of dystonia on function.

State of the Art

Information obtained through motion analysis provides insights into the pathology and functional effects of dystonia. In children with cerebral palsy, motion analysis is frequently used for planning interventions based on gait deviations or functional improvements for hand use. Though no published motion analysis criteria for
dystonia exist, common findings reported in children and adults include a high variability in step length and base of support. Surface electromyography data likewise support a diagnosis of dystonia through co-contraction, overflow muscle activity, and increased muscle activity during volitional tasks. Further advancements in the use of formal motion analysis are needed to improve confidence in diagnosis and in treatment, whether surgical or medical, for dystonia.

Pathology and Functional Effects of Dystonia

Definitions

Dystonic syndromes are some of the more commonly observed movement disorders, with an estimated prevalence of 2–50 per million for early-onset dystonia and 30–7,320 per million for late-onset dystonias (Carecchio et al. 2015). Secondary dystonia has been increasingly recognized in children with cerebral palsy. Since proper treatment of dystonia is available, it is crucial to be able to identify the movement disorder and the confounding hypertonic muscle abnormalities that frequently coexist in children with cerebral palsy.

Dystonia is defined as a movement disorder in which involuntary sustained or intermittent muscle contractions cause twisting and repetitive movements, abnormal postures, or both. Dystonia falls into the category of hypertonic and hyperkinetic movement disorders (see Fig. 1). It is frequently worsened with voluntary activity and may be reduced or absent at rest. Various organizational taxonomies have been proposed to categorize movement disorders. One such schema breaks disorders into those of tone, inhibition, execution, and planning of movements (Sanger 2003). Dystonia is therefore both a disorder of tone and of execution. Dystonia can have many clinical expressions, including dystonic spasms, tremor, repetitive movements, abnormal fixed postures, and hypertonia.

An important distinction is delineating the difference between dystonia, spasticity, and rigidity. Spasticity refers to hypertonia in which resistance to movement increases with the speed of stretch and varies with the direction of movement. Rigidity is resistance to movement that does not depend upon movement speed or direction. Dystonia is more complex to define and includes resistance to joint movement that does not depend on speed and co-contraction of agonist and antagonist muscle groups, and it is worsened by voluntary activity. Overflow movements may also suggest the presence of dystonia.

Dystonia quite often presents in similar patterns despite a wide range of diagnoses and etiologies. One example is neck or back extension, variable scoliosis or kyphosis, ulnar wrist deviation and flexion, and finger flexion or extension (Sanger 2004). In the lower extremity, dystonic posturing often includes knee extension, plantarflexion, and inversion of the foot. Common dystonic syndromes include hand cramps, blepharospasm, torticollis, opisthotonus, and more generalized dystonias involving multiple extremities and the torso.

[Fig. 1 diagram content: movement symptoms grouped under the headings Hypertonic, Hyperkinetic, and Negative, listing spasticity, dystonia, rigidity, chorea, athetosis, myoclonus, tremor, bradykinesia, tics, stereotypies, weakness, selective motor control, ataxia, dyspraxia, and balance]

Fig. 1 Positive and negative symptoms of hypertonia. Dystonia is associated with both hypertonic and hyperkinetic movements

Treatment and management of these various hyperkinetic and hypertonic movement disorders depend very much upon the specific type. Certain medications, such as trihexyphenidyl, and surgical procedures, such as deep brain stimulation, have better success rates with dystonia.

Etiologies

Early-onset dystonias refer to presentation prior to age 26. The dystonic movement disorders can be further divided into subsets of inherited and acquired dystonias. DYT1, also known as Oppenheim's dystonia, is the most common of the inherited early-onset dystonias, with a worldwide frequency of 1:160,000 cases (Carecchio et al. 2015; de Carvalho Aguiar and Ozelius 2002). A GAG deletion in the TOR1A gene was identified as the cause of DYT1 dystonia. DYT1 is an autosomal dominant trait with reduced penetrance of about 30%. Clinical presentation typically occurs by age 12 with involvement of a single extremity; this generalizes to the remaining extremities and the trunk in about 50% of cases within a few years. Treatment remains symptomatic with oral medications and, in select refractory cases, with deep brain stimulation. DYT2 is an autosomal recessive trait of unknown genetic etiology and is also an early-onset dystonia. DYT6 dystonia is another autosomal dominant inherited form of dystonia, involving a variety of different mutations in the THAP1 gene. DYT6 dystonia also presents in childhood through adolescence.
Clinical presentation typically involves oromandibular, craniocervical, or laryngeal dystonias.

The second major subset of dystonias is the secondary or acquired dystonias. The most common etiology is related to acquired brain injuries. Dystonia is second only to post-traumatic tremor among the movement disorder sequelae of severe traumatic brain injury (Krauss and Jankovic 2002; Sanger 2015). It is thought to be related to involvement of the basal ganglia, caudate, and putamen, and there is some evidence to suggest cerebellar and thalamic involvement (Skogseid 2014). Traumatic and hypoxic brain injuries can lead to acquired dystonias in adults. The PAID syndrome, or paroxysmal autonomic instability and dystonia syndrome, occurs after hypoxic injuries, such as those sustained in cardiac arrest. In children, the most common cause of acquired dystonia is cerebral palsy, and in fact, dyskinetic cerebral palsy is the second largest CP type (Monbaliu et al. 2016). Dystonia may be related to hypoxic ischemic injuries or prematurity in these cases. Other etiologies include autoimmune disorders such as anti-N-methyl-D-aspartate receptor (NMDAR) encephalitis and autoimmune basal ganglia encephalitis (van Egmond et al. 2015). Dystonia can also be induced by a variety of drugs and toxins including levodopa, dopamine antagonists, selective serotonin reuptake inhibitors, buspirone, cocaine, monoamine oxidase inhibitors, carbon monoxide, manganese, and cyanide (Phukan et al. 2011). Inborn errors of metabolism, including organic acidurias, GLUT1 deficiency, and lysosomal storage disorders, may also cause dystonic movement disorders if not treated early.

A third category of dystonia refers to the dystonia-plus syndromes. Dystonia-plus syndromes are dystonic syndromes that also include other neurological features, most commonly parkinsonism or myoclonus. Examples include dopa-responsive dystonia, rapid-onset dystonia-parkinsonism, and myoclonus-dystonia syndrome. These probably represent just 5% of childhood-onset dystonias.

Clinical Scales

Appropriate diagnosis of dystonia can be difficult to achieve owing to the presence of coexisting movement disorders and spasticity in many of those affected. Several clinical scales are in use today to aid in the measurement of dystonia. Clinical scales were first utilized for primary dystonias. The Burke–Fahn–Marsden (BFM) scale, published in 1985, was the first scale utilized for rating generalized dystonia, hemidystonia, and segmental dystonia (Krystkowiak et al. 2007). The BFM dystonia scale evaluates the presence of dystonia in nine body regions: eyes, mouth, neck, trunk, and the right and left arms and legs. This scale identifies both provoking factors and severity factors in each region and includes a separate disability rating. Scores range from 0 to 120, with higher scores indicating more severe dystonia. The scale has demonstrated good inter- and intra-rater reliability and remains in use in both clinical and research settings to track dystonia over time and in response to treatments.
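To make the 0–120 movement score concrete, the sketch below tallies a provoking factor and a severity factor (each rated 0–4) across nine regions. The regional weighting of 0.5 for the eyes, mouth, and neck, and the inclusion of a speech/swallowing region, follow the commonly described BFM scoring convention rather than anything stated above, and should be checked against the original scale before any clinical use.

```python
# Hypothetical ratings for one patient: (provoking 0-4, severity 0-4) per region.
ratings = {
    "eyes": (2, 1), "mouth": (1, 1), "speech_swallow": (0, 0),
    "neck": (3, 2), "trunk": (2, 2),
    "right_arm": (4, 3), "left_arm": (1, 1),
    "right_leg": (3, 2), "left_leg": (2, 1),
}

# Regional weights assumed from the commonly described BFM convention.
weights = {region: 0.5 if region in ("eyes", "mouth", "neck") else 1.0
           for region in ratings}

# Region score = weight * provoking * severity; the total is the movement score.
total = sum(weights[r] * provoking * severity
            for r, (provoking, severity) in ratings.items())

# Maximum possible: 3 regions * 0.5 * 16 + 6 regions * 1.0 * 16 = 120.
print("BFM movement score: %.1f / 120" % total)
```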

As secondary dystonias and available treatments have evolved, the need to discriminate dystonia from other hypertonic syndromes led to the development of additional measurement scales. The Barry–Albright Dystonia Scale was developed to improve reliability and measurement of secondary dystonias (Barry et al. 1999; Pavone et al. 2013). Assessment of dystonia in this population can be more difficult because of coexisting brain injury or cognitive impairment. This scale assesses secondary dystonia in eight regions: eyes, mouth, neck, trunk, and the four extremities. Scores range from 0 to 32, and higher scores indicate more severe dystonia. Finally, the Unified Dystonia Rating Scale measures severity and duration of dystonia in 14 body regions: eyes and upper face, lower face, jaw and tongue, larynx, neck, trunk, shoulder/proximal arm, distal arm/hand, proximal leg, and distal leg/foot. This scale has high internal consistency and inter-rater reliability in primary dystonias (Goetz et al. 2008; Monbaliu et al. 2010).

In children, assessment of movement disorders can be more difficult owing to difficulty following examination instructions, and several scales have been developed to address this group. The Hypertonia Assessment Tool (HAT) is one such scale. It was developed to help differentiate the different types of hypertonia: dystonia, spasticity, and rigidity (Jethwa et al. 2010; Pavone et al. 2013). The HAT is a seven-item clinical assessment tool designed for children aged 4–19. It involves a binary rating scale for three presentations of dystonia, two of spasticity, and two of rigidity. This scale has fair inter-rater reliability and moderate test-retest reliability for dystonia. The Movement Disorder-Childhood Rating Scale was likewise designed for clinical evaluation of movement disorders in ages 4–19 (Battini et al. 2015). The movement disorder portion of this scale assesses dystonia at rest and during functional tasks in the eye/orbital region, face, neck, perioral region, trunk, upper extremities, and lower extremities. The childhood rating scale portion includes assessment of motor function (including head control, sitting position, standing position, walking, reaching, grasping, and handwriting), oral/verbal function, self-care, and attention/alertness. This scale has been demonstrated to have high inter-rater reliability and high internal consistency.

Dystonia can be difficult to measure accurately given its variable presentation, dynamic changes, and coexistence with other movement disorders. Newer therapeutic options, including intrathecal baclofen and deep brain stimulation, have increased the widespread utilization of these scales in order to ensure appropriate patient selection and to monitor response to treatment. New scales continue to be developed in response to the need for improved measurement of dystonia in response to treatments. The Dyskinesia Impairment Scale was developed more recently; it includes two subscales, for dystonia and choreoathetosis, and evaluates both duration and amplitude (Elegast Monbaliu et al. 2012). It demonstrated good inter-rater reliability and internal consistency in initial studies (Fig. 2).

HYPERTONIA ASSESSMENT TOOL (HAT)
Each item is scored 0 (negative) or 1 (positive); a positive item indicates the type of hypertonia listed.
1. Increased involuntary movements/postures of the designated limb with tactile stimulus of a distal body part: DYSTONIA
2. Increased involuntary movements/postures with purposeful movements of a distal body part: DYSTONIA
3. Velocity-dependent resistance to stretch (increased resistance during fast stretch compared to slow stretch): SPASTICITY
4. Presence of a spastic catch: SPASTICITY
5. Equal resistance to passive stretch during bi-directional movement of a joint: RIGIDITY
6. Increased tone with purposeful movement of a distal body part: DYSTONIA
7. Maintenance of limb position after passive movement (limb remains in final position of stretch): RIGIDITY
Fig. 2 The hypertonia assessment tool (Fehlings et al. 2010)

Kinetics/Kinematics

While clinical scales can aid the clinician in the diagnosis of dystonia, objective measurement using gait labs, kinetics, and kinematics also has an important role in dystonia assessment and in determining therapeutic effect. Animal models were first used to characterize gait changes associated with dystonia. In a rat model of dystonia, decreased walking speed, increased hind limb spread, and increased step length ratio variability were consistent with dystonia. Of these, step length ratio variability was the most sensitive for detecting dystonia (Chaniary et al. 2009). Further data in rat models of dystonia and ataxia also demonstrate coactivation of muscles and similar changes to gait parameters (Scholle et al. 2010).

Clinical studies in pediatrics are somewhat limited, with the largest amount of data pertaining to upper extremity dystonias. Kinematic data have been collected in children with cerebral palsy during reach and grasp activities. Those with dystonia have slower movements during reaching and decreased coordination of movement. Different kinematics were obtained in the three CP subtypes, spastic, dystonic, and mixed, and may be useful in distinguishing between movement disorders (Butler et al. 2010; Kukke et al. 2015). The kinematic dystonia measure, which collects kinematic data during an upper extremity finger tapping task, has been demonstrated to correlate with the Barry–Albright Dystonia Scale and may improve quantitative assessment of dystonia (Kawamura et al. 2012).

Fig. 3 Dystonia kinematics. Displacement of the shoulder, elbow, and wrist joints over time during the rest-tap paradigm. (a) represents a control subject. (b) represents a cerebral palsy subject with low dystonia. (c) represents a cerebral palsy subject with high dystonia (Figure reproduced with permission from Pediatric Neurology, Can Spasticity and Dystonia Be Independently Measured in Cerebral Palsy (Gordon et al. 2006))

A rest-tap test, involving repeated tapping in one limb during assessment of the contralateral limb for dystonia, demonstrates a different kinematic pattern compared with controls. This pattern is characterized by overflow in the contralateral limb and increased joint excursion (see Fig. 3) (Gordon et al. 2006).

Gait analysis in children that includes surface EMG suggests that co-contraction, increased resistance to external motion, and slow velocities are present with dystonia (Lebiedowska et al. 2004). Dystonia presents in EMG data as an increased number of muscles responding during volitional activity that do not respond at rest or during quick stretch. This is in contrast to spasticity, which demonstrates brief bursts of EMG activity during quick stretch but low activity levels during rest or volitional activity. A specific electromyographic protocol suggests that lower extremity assessment during rest, quick stretch, and five volitional tasks can detect different muscle activation patterns in spasticity and dystonia and might provide objective data for diagnosis (Beattie et al. 2016). Please see the example videos for two demonstrations of dystonic gait (online only). Figure 4 demonstrates EMG tracings during motion analysis that can help to distinguish dystonia from spasticity (personal communication with Dr. Jules Becher). Dystonia can be inferred by careful analysis of EMG patterns as well as by kinematic evidence of posturing of the extremities during motion analysis. With respect to the electromyographic patterns, in dystonia the raw or rectified tracing typically displays a peak-and-valley pattern or an inconsistent pattern of activation. This is in contrast to the muscle patterns of spasticity, which are more consistent, with the amplitude of the electromyographic signal remaining constant (see Fig. 4b).

In adults, head and neck kinematics are useful for accurate description of the severity of cervical dystonia as a baseline for treatment effects. Fastrack allows the extraction of kinematic information (i.e., posture, angular range of motion, movement times, angular velocity) about head deviations (Galardi et al. 2003; Jordan et al. 2000).

Fig. 4 EMG patterns obtained during motion analysis using surface EMG pads. (a) tracing shows the anterior tibialis, gastrocnemius, and rectus femoris in a child with mild dystonia. (b) tracing shows the same muscles in a more typical pattern for spasticity. (c) tracing shows an example of coexisting spasticity and dystonia
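One way to quantify the agonist/antagonist co-contraction visible in tracings like those of Fig. 4 is a co-contraction index computed from rectified, low-pass filtered EMG envelopes. The sketch below uses a common overlap formulation (shared activation relative to total activation); the filter settings and muscle pairing are illustrative assumptions, not the protocol of the studies cited above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(raw_emg, fs, cutoff=6.0):
    """Full-wave rectify and low-pass filter a raw EMG signal."""
    b, a = butter(2, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, np.abs(raw_emg))

def cocontraction_index(env_agonist, env_antagonist):
    """Overlap-based co-contraction index (0 = no overlap, 1 = fully shared).

    Computed as twice the integral of the minimum of the two envelopes
    divided by the integral of their sum; higher values indicate more
    simultaneous activity of the muscle pair, as described for dystonia above.
    """
    shared = np.minimum(env_agonist, env_antagonist).sum()
    total = (env_agonist + env_antagonist).sum()
    return 2.0 * shared / total

# Example usage with hypothetical tibialis anterior / gastrocnemius signals:
# fs = 1000.0
# cci = cocontraction_index(emg_envelope(ta_raw, fs), emg_envelope(gas_raw, fs))
```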

As in dystonic gait, head and neck asymmetry (the symmetry index) is increased for rotation and lateral flexion in those with dystonia compared with those unaffected (Boccagni et al. 2008).

Lower extremity motion analysis in adults with dystonia has been used to evaluate and document the effects of treatment for disorders characterized by dystonia. In a patient with dopamine-responsive dystonia, 3D motion analysis accurately documented changes in gait with reduction of dystonia by medication. When the dystonia was reduced, the gait pattern demonstrated an increase in walking speed, explained by a significant increase in step frequency and length. With improvement of dystonia, the asymmetry decreased, as did the step width. The gait analysis allowed clinicians to quantify the effects of dystonia on gait (Rebour et al. 2015). Motion analysis of an adult patient with DYT1 before and after DBS determined which involuntary movements were related to postural instability and gait disturbance. Neck and trunk markers add value and allow discrimination of the effect of the cervical dystonic posture on balance or gait. Prior to DBS, posture and gait were asymmetrical and unstable. After DBS, functional body balance was controlled as symptoms changed, with partial correction of neck and spinal alignment in static posture. The patient was
better able to maintain the stability of the center of mass and center of pressure. The neck angles remained abnormal for specific motions during gait compared with the spine, even as overall gait remained improved. Functional improvements of gait were captured by gait parameters including increased cadence (step rate) and walking speed, increased step length, reduction of the wide base, and extension of single support time and symmetry (Nakao 2011). These two case reports illustrate the power of motion analysis in capturing the effects of dystonia on gait in ways that clinical speed and distance tests cannot.
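The gait parameters highlighted in these case reports (cadence, step length, base width, single support time, and symmetry) can be condensed into simple summary numbers. Below is a minimal sketch of one common symmetry index for paired left/right step parameters; the formula (absolute difference over the mean, as a percentage) is one convention among several, and the sample values are hypothetical.

```python
def symmetry_index(left, right):
    """Percent asymmetry between left and right values of a gait parameter.

    0% indicates perfect symmetry; larger values indicate greater
    left/right asymmetry (e.g., in step length or single support time).
    """
    return 100.0 * abs(left - right) / ((left + right) / 2.0)

# Hypothetical pre- and post-treatment step lengths (m).
pre  = symmetry_index(left=0.48, right=0.61)   # asymmetric gait
post = symmetry_index(left=0.55, right=0.58)   # closer to symmetric
print("Step-length asymmetry: %.1f%% pre vs %.1f%% post" % (pre, post))
```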

Conclusion

Motion analysis assists with planning surgical decisions and with establishing the energy cost of walking. When dystonia is present in motion analysis data, this information informs the surgeon and could be useful in surgical decision-making. Whereas predictable outcomes are likely with spastic gait patterns, the child with dystonia will have more variation in surgical outcomes. Thus far, the published use of motion analysis in adults most often documents the treatment effects of various interventions.

The physical therapy evaluation at the time of the motion analysis typically includes ROM, strength, and an estimate of spasticity. By including the HAT and a severity rating of dystonia, the clinical picture can then be associated with the biomechanics of gait. The video review portion of gait analysis is critical, as dystonia will be apparent during gait with atypical trunk, arm, and hand postures identified. A severity rating can then be validated against the PT evaluation. Identifying dystonia by motion analysis can theoretically quantify and validate the severity rating of dystonia (Sanger et al. 2010). Including the HAT and a severity rating as a routine part of clinical motion analysis is a good first step in validating the typical findings of asymmetry, variability, and EMG firing patterns reported in children with dystonia.

With dystonia in the motion analysis lab, common findings reported in children and adults include a high variability in step length and base of support. Surface electromyography data likewise suggest a diagnosis of dystonia through co-contraction, overflow muscle activity, and increased muscle activity during volitional tasks. Formal motion analysis has the potential to accurately quantify motion deviations and determine changes in movement trajectories following treatment. Motion analysis for clinical and research practice promises quantifiable insight into the neural mechanisms of hypertonia and hyperkinetic syndromes during functional tasks.

Summarizing
• Dystonia is a common hypertonic and hyperkinetic movement disorder that can have profound impact on gait and upper extremity function.


• Diagnosis of dystonia relies upon visual observations of gait and functional tasks, dystonia severity rating scales, and regimented physical examination.
• Distinguishing dystonia from other movement disorders, especially spasticity, informs treatment and management decisions.
• Characteristic changes in motion analysis associated with dystonia include variable step length and base of support, muscle co-contraction, overflow muscle activity during functional tasks, and increased muscle activity during volitional tasks.
• Motion analysis laboratories may play more of a role in the future in diagnosing dystonia, assessing treatment effects, and in surgical/treatment planning.

References Agostino R, Berardelli A, Formica A, Accornero N, Manfredi, M (1992). Sequential arm movements in patients with Parkinson’s disease, Huntington’s disease and dystonia. Brain 115 (Pt 5):1481–1495 Barry MJ, Van Swearingen JM, Albright AL (1999) Reliability and responsiveness of the BarryAlbright dystonia scale. Dev Med Child Neurol. 41(6), 404–411 Battini R, Olivieri I, Di Pietro R, Casarano M, Sgandurra G, Romeo DM, Cioni G (2015) Movement disorder-childhood rating scale: a sensitive tool to evaluate movement disorders. Pediatr Neurol 53(1):73–77. doi:10.1016/j.pediatrneurol.2015.02.014 Beattie C, Gormley M, Wervey R, Wendorf H (2016) An electromyographic protocol that distinguishes spasticity from dystonia. J Pediatr Rehabil Med 9(2):125–132. doi:10.3233/PRM160373 Boccagni C, Carpaneto J, Micera S, Bagnato S, Galardi G (2008) Motion analysis in cervical dystonia. Neurol Sci 29(6):375–381. doi:10.1007/s10072-008-1033-z Butler EE, Ladd AL, Lamont LE, Rose J (2010) Temporal-spatial parameters of the upper limb during a Reach & Grasp Cycle for children. Gait Posture 32(3):301–306. doi:10.1016/j. gaitpost.2010.05.013 Carecchio M, Zorzi G, Nardocci N (2015) Inherited isolated dystonia in children. J Pediatr Neurol 13(04):174–179. doi:10.1055/s-0035-1558863 Chaniary KD, Baron MS, Rice AC, Wetzel PA, Ramakrishnan V, Shapiro SM (2009) Quantification of gait in dystonic Gunn rats. J Neurosci Methods 180(2):273–277. doi:10.1016/j. jneumeth.2009.03.023 Curra A, Berardelli A, Agostino R, Giovannelli M, Koch G, Manfredi M (2000). Movement cueing and motor execution in patients with dystonia: a kinematic study. Mov Disord 15(1):103–112 de Carvalho Aguiar PM, Ozelius LJ (2002) Classification and genetics of dystonia. Lancet Neurol 1 (5):316–325. doi:10.1016/s1474-4422(02)00137-0 Fehlings, D., Switzer, L, Jethwa, A, Mink, J, Macarthur, C, Knights, S, & Fehlings, T. (2010). Hypertonia assessment tool: user manual p.1–10 Galardi G, Micera S, Carpaneto J, Scolari S, Gambini M, Dario P (2003) Automated assessment of cervical dystonia. Mov Disord 18(11):1358–1367. doi:10.1002/mds.10506 Goetz CG, Nutt JG, Stebbins GT (2008) The unified dyskinesia rating scale: presentation and clinimetric profile. Mov Disord 23(16):2398–2403. doi:10.1002/mds.22341 Gordon LM, Keller JL, Stashinko EE, Hoon AH, Bastian AJ (2006) Can spasticity and dystonia be independently measured in cerebral palsy? Pediatr Neurol 35(6):375–381. doi:10.1016/j. pediatrneurol.2006.06.015 Gregori B, Agostino R, Bologna M, Dinapoli L, Colosimo C, Accornero N, Berardelli A (2008). Fast voluntary neck movements in patients with cervical dystonia: a kinematic study before and


after therapy with botulinum toxin type A. Clin Neurophysiol. 119(2):273–280. doi:10.1016/j. clinph.2007.10.007 Inzelberg R, Flash T, Schechtman E, Korczyn AD (1995). Kinematic properties of upper limb trajectories in idiopathic torsion dystonia. J Neurol Neurosurg Psychiatry. 58(3):312–319 Jethwa A, Mink J, Macarthur C, Knights S, Fehlings T, Fehlings D (2010) Development of the hypertonia assessment tool (HAT): a discriminative tool for hypertonia in children. Dev Med Child Neurol 52(5):e83–e87 Jordan K, Dziedzic K, Jones PW, Ong BN, Dawes PT (2000) The reliability of the threedimensional FASTRAK measurement system in measuring cervical spine and shoulder range of motion in healthy subjects. Rheumatology (Oxford) 39(4):382–388 Kawamura A, Klejman S, Fehlings D (2012) Reliability and validity of the kinematic dystonia measure for children with upper extremity dystonia. J Child Neurol 27(7):907–913. doi:10.1177/0883073812443086 Krauss JK, Jankovic J (2002) Head injury and posttraumatic movement disorders. Neurosurgery 50 (5):927–939. Krystkowiak P, du Montcel ST, Vercueil L, Houeto JL, Lagrange C, Cornu P, ... Group S (2007) Reliability of the Burke-Fahn-Marsden scale in a multicenter trial for dystonia. Mov Disord 22 (5):685–689. doi:10.1002/mds.21392 Kukke S, Curatalo L, de Campos A, Hallett M, Alter K, Damiano D (2015) Coordination of reachto-grasp kinematics in individuals with childhood-onset dystonia due to hemiplegic cerebral palsy. IEEE Trans Neural Syst Rehabil Eng. doi:10.1109/tnsre.2015.2458293 Lebiedowska MK, Gaebler-Spira D, Burns RS, Fisk JR (2004) Biomechanic characteristics of patients with spastic and dystonic hypertonia in cerebral palsy. Arch Phys Med Rehabil 85 (6):875–880 Monbaliu E, Ortibus E, Roelens F, Desloovere K, Deklerck J, Prinzie P, ... Feys H (2010) Rating scales for dystonia in cerebral palsy: reliability and validity. Dev Med Child Neurol 52 (6):570–575. doi:10.1111/j.1469-8749.2009.03581.x Monbaliu E, Ortibus ELS, de Cat JOS, Dan B, Heyrman L, Prinzie P, ... Feys H (2012) The dyskinesia impairment scale: a new instrument to measure dystonia and choreoathetosis in dyskinetic cerebral palsy. Dev Med Child Neurol 54(3):278–283. doi:10.1111/j.14698749.2011.04209.x Monbaliu E, de Cock P, Ortibus E, Heyrman L, Klingels K, Feys H (2016) Clinical patterns of dystonia and choreoathetosis in participants with dyskinetic cerebral palsy. Dev Med Child Neurol. 58(2):138–144. doi:10.1111/dmcn.12846 Nakao S (2011) Gait and posture assessments of a patient treated with deep brain stimulation in dystonia using three-dimensional motion analysis systems. J Med Investig 58:264–272 Pavone L, Burton J, Gaebler-Spira D (2013) Dystonia in childhood: clinical and objective measures and functional implications. J Child Neurol 28(3):340–350. doi:10.1177/0883073812444312 Phukan J, Albanese A, Gasser T, Warner T (2011) Primary dystonia and dystonia-plus syndromes: clinical characteristics, diagnosis, and pathogenesis. Lancet Neurol 10(12):1074–1085. doi:10.1016/s1474-4422(11)70232-0 Rebour R, Delporte L, Revol P, Arsenault L, Mizuno K, Broussolle E, ... Rossetti Y (2015) Doparesponsive dystonia and gait analysis: a case study of levodopa therapeutic effects. Brain Dev 37 (6):643–650. doi:10.1016/j.braindev.2014.09.005 Sanger TD (2003) Pediatric movement disorders. Curr Opin Neurol 16(4):529–535. doi:10.1097/ 01.wco.0000084233.82329.0e Sanger TD (2004) Toward a definition of childhood dystonia. Curr Opin Pediatr 16:623–627 Sanger T (2015) Movement disorders in Cerebral Palsy. 
J Pediatr Neurol 13(04):198–207. doi:10.1055/s-0035-1558866 Sanger TD, Chen D, Fehlings DL, Hallett M, Lang AE, Mink JW, ... Valero-Cuevas F (2010) Definition and classification of hyperkinetic movements in childhood. Mov Disord 25 (11):1538–1549. doi:10.1002/mds.23088


Scholle HC, Jinnah HA, Arnold D, Biedermann FH, Faenger B, Grassme R, ... Schumann NP (2010) Kinematic and electromyographic tools for characterizing movement disorders in mice. Mov Disord 25(3):265–274. doi:10.1002/mds.22933 Skogseid IM (2014) Dystonia – new advances in classification, genetics, pathophysiology and treatment. Acta Neurol Scand Suppl 198:13–19. doi:10.1111/ane.12231 van der Kamp W, Berardelli A, Rothwell JC, Thompson PD, Day BL, Marsden CD (1989). Rapid elbow movements in patients with torsion dystonia. J Neurol Neurosurg Psychiatry. 52 (9):1043–1049 van Egmond ME, Kuiper A, Eggink H, Sinke RJ, Brouwer OF, Verschuuren-Bemelmans CC, ... de Koning TJ (2015) Dystonia in children and adolescents: a systematic review and a new diagnostic algorithm. J Neurol Neurosurg Psychiatry 86(7):774–781. doi:10.1136/jnnp-2014309106

Functional Effects of Ankle Sprain

Ilona M. Punt and Lara Allet

Contents State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Symptoms and Functional Deficits Related to an Ankle Sprain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Risk Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Epidemiology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Health-Care Costs Related to Ankle Sprain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Assessments Needed for Proper Decision Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Clinical Exam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specific Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functional Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Patient Reported Outcome Measures (PROMs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Treatment Modalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 2 3 3 3 4 4 4 6 7 11 11 12 13

I.M. Punt (*)
Department of Epidemiology, Maastricht University, CAPHRI, Maastricht, The Netherlands
Department of Physical Therapy, University of Applied Sciences of Western Switzerland, Carouge, Switzerland
e-mail: [email protected]

L. Allet
Department of Physical Therapy, University of Applied Sciences of Western Switzerland, Carouge, Switzerland
Department of Community Medicine, Geneva University Hospitals and University of Geneva, Geneva, Switzerland
e-mail: [email protected]

# Springer International Publishing AG 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_72-1

Abstract

Ankle sprain is one of the most common sports-related injuries and can lead to recurrences and chronic ankle instability (CAI).


In the acute phase, ankle sprain patients experience mostly pain, limited ankle mobility, and reduced ankle muscle strength. CAI patients have a history of their ankle “giving way” and/or “feeling unstable” after at least one significant ankle sprain. They continue to suffer from pain and impaired performance during functional tasks. Both acute ankle sprains and CAI have a negative influence on daily life activities such as walking, on sports-related activities such as jump landings, and on patients’ perception of health and function. Functional deficits should be carefully assessed for appropriate clinical decision making and to propose the most suitable, individualized (physiotherapeutic) intervention. Acute ankle sprains are first treated according to the rest, ice, compression, and elevation (RICE) protocol. Nonsteroidal anti-inflammatory drugs may also be recommended for pain management. A short period of immobilization by means of a lower leg cast can facilitate rapid decrease in pain and swelling. Afterward, functional exercise therapy is recommended. In the case of CAI, patients should wear external ankle support during sporting activities to reduce the risk of recurring sprains and undergo exercise therapy including balance and muscle strengthening exercises. New technologies could be implemented in future rehabilitation programs in order to offer athletes greater flexibility in terms of training time and more varied, sports-related exercises at home.

Keywords

Ankle sprain • Chronic ankle instability • Clinical exam • Gait • Balance • Jump • Patient reported outcome measures • Treatment

State of the Art

Definition

Ankle sprain is defined as a partial or complete tear of the ligaments of the ankle due to sudden stretching. The most common mechanism causing lateral ankle sprain is excessive and explosive inversion, with some degree of plantar flexion of the rear-foot on the tibia (Balduini and Tetzlaff 1982), during gait, cutting maneuvers during sports, jump landings, or stepping off an uneven surface (Bullock-Saxton et al. 1994; Hertel 2008; Wikstrom et al. 2006). In particular, athletes playing indoor/court sports (i.e., basketball, volleyball, tennis), field-based sports (i.e., soccer), or long-distance running have an increased risk of ankle sprain injuries (Doherty et al. 2014b; Nery et al. 2016). In an ankle sprain injury, the lateral ligaments of the ankle are the most frequently injured, in particular the anterior talofibular ligament (ATFL), followed by injuries to the calcaneofibular ligament (CFL) (Martin et al. 2013). The severity of an ankle sprain can be graded as follows:


• Grade I: mild damage to the fibers of the ligament without ligamentous laxity of the affected joint
• Grade II: partial tear of the ligament with abnormal laxity of the ankle joint
• Grade III: complete rupture of the ankle ligament (Birrer et al. 1999)

Assessing the grade of an ankle sprain is important to make the appropriate decision about future treatment strategy. The more severe the grade, the more time the patient will need to fully recover.

Symptoms and Functional Deficits Related to an Ankle Sprain

In the acute phase, ankle sprain patients experience mostly pain, limited ankle mobility, and reduced ankle muscle strength. These symptoms negatively influence daily life activities such as gait and balance performance, and sports-related activities such as jump landings (Aiken et al. 2008; Hertel 2000; Rose et al. 2000). Persons with chronic ankle instability (CAI) have a history of their ankle “giving way” and/or “feeling unstable” after at least one significant ankle sprain that was associated with an inflammatory response. These individuals experience pain and demonstrate impaired performance during functional tasks (Mcgovern and Martin 2016). CAI patients also find activities like (single-leg) balance performance, gait, and sports-related activities difficult to perform.

Risk Factors

Risk factors and mechanisms which potentially contribute to recurrent ankle sprains include altered intrinsic body functions, such as decreased proprioception in the ankle ligaments, muscle weakness, and limited range of motion of the ankle joint, as well as extrinsic factors such as inappropriate footwear (Van Rijn et al. 2008; Mckeon and Hertel 2008a). A history of ankle sprain is in itself a risk factor for a re-sprain and may lead to mechanical or functional instability resulting in CAI (Van Rijn et al. 2008; Hertel 2002).

Epidemiology

Lateral ankle sprain is the most common sports-related acute injury and occurs predominantly in persons aged 15–19 years (Fong et al. 2007; Hootman et al. 2007; Waterman et al. 2010). The incidence rate is 11.55 per 1000 exposures (Doherty et al. 2014b). Despite various treatment modalities, persons with a history of ankle sprain are known to be at higher risk of re-spraining their ankle and of developing mechanical or functional instability, resulting in chronic ankle instability (CAI) (Van Rijn et al. 2008; Hertel 2002). Up to 34% of patients report recurrent ankle sprains during the first year after the initial injury (Van Rijn et al. 2008). Up to 74% of all ankle sprain patients also experience residual symptoms such as pain, swelling,


peroneal muscle weakness, or neuromuscular dysfunctions (Hertel 2000), all of which make patients susceptible to further injury and negatively influence activities of daily living (ADL) and sport performance.

Health-Care Costs Related to Ankle Sprain

Ankle sprains lead to high direct and indirect health-care costs (Nazarenko et al. 2013; Verhagen et al. 2005). In the United States (US), the estimated costs are between 318 and 914 US dollars per acute ankle sprain (Nazarenko et al. 2013). In the Netherlands, the costs are estimated to be 360 euros per sprain (Verhagen et al. 2005). Another Dutch study calculated the direct health-care costs of patients visiting the emergency department after a ligamentous ankle injury; these costs are estimated to be 684 euros per injury (De Boer et al. 2014). Costs increase with patients’ age, in particular for ambulance care, home care, and rehabilitation. Since the introduction of new guidelines in the Netherlands, patients with minor injuries are able to visit the general practitioner 24 h a day, 7 days a week. Consequently, fewer ankle sprain patients visited the emergency department (De Boer et al. 2014).

Assessments Needed for Proper Decision Making

Whenever a patient visits a medical doctor or physical therapist after suffering an ankle injury, the physician and/or therapist inquires about how the accident occurred and performs a clinical exam. In particular, the physician and/or physical therapist inquires about perceived pain intensity and whether the patient has a history of “giving way” and/or “feelings of instability,” which might indicate CAI. The degree of swelling is measured, and the remaining range of motion (ROM) and muscle strength are evaluated. Physicians/therapists then check for functional deficits such as deficits in balance, gait, and sports-related tasks (e.g., jump landings). Patient reported outcome measures (PROMs), assessing patients’ perception of their health and function, complete the examination.

Clinical Exam

A clinical exam serves to develop an individualized evidence-based rehabilitation plan that supports recovery while decreasing the risk of reinjury (Mcgovern and Martin 2016). During the clinical exam, the physician or physical therapist inquires about pain intensity during rest and activity, observes the patient in the standing and lying positions, and screens for swelling, hematoma, bruising, and deformity. The physician or therapist also checks for postural issues such as overpronation of the foot or difficulty in putting the foot on the floor. The entire ankle joint is then palpated to assess skin temperature (which can increase due to acute inflammation) and swelling and to ascertain whether the ligaments are painful to touch.


The therapist then measures the degree of edema, range of motion (ROM), and muscle strength and checks for functional deficits.

Fig. 1 The figure-of-eight method for measuring ankle edema

Swelling

The gold standard for measuring edema is the water displacement method (Mawdsley et al. 2000; Mckay et al. 2001), but this method may be too time-consuming for efficient clinical use. However, Mawdsley et al. (2000) and Watson et al. (2008) showed that the figure-of-eight method is valid when assessing ankle edema in a clinical setting and that the inter-rater reliability of this method is excellent (ICC > 0.99, SEM of approximately 0.2 cm). The therapist wraps a tape measure around standardized anatomical landmarks near the ankle, and the distance provides a circumferential estimate of volume (Fig. 1).

Range of Motion (ROM)

Ankle mobility can be assessed actively and passively in the classical way using a goniometer. However, measuring dorsiflexion in standing simulates the ROM required for functional tasks. This is particularly relevant because the torques applied to the ankle during weight bearing are clearly greater than in non-weight-bearing conditions, and the resulting measurement may be more indicative of the range available for functional activities (Bennell et al. 1998; Bohannon et al. 1989). To measure the ankle dorsiflexion range during weight bearing, the participant stands on an apparatus consisting of a horizontal footplate attached to a vertical board. Participants align the big toe and heel of the test leg over a line marked along the center of the footplate. Participants are instructed not to lift the test heel, which is checked by the examiner, who gently palpates for lifting while the participant moves the knee forward into a lunge position until the patella touches the midline of the vertical board (Bennell et al. 1998; Bohannon et al. 1989). To prevent forward movement of the big toe as the knee moves forward over the foot, a block is placed in front of the big toe. The distance (in cm) from the vertical board to the big toe is measured.


Muscle Strength

Classical manual muscle testing in a sitting position is indicated to test strength in dorsiflexion, inversion, inversion with dorsiflexion, eversion, and eversion with plantar flexion. However, using a handheld dynamometer can further improve the measurement of muscle strength. The strength of the gastrocnemius and soleus can also be tested in a standing position. To test the gastrocnemius, the patient stands on the test limb with the knee extended. Patients may place one or two fingers on a table to assist with balance. The patient actively raises the heel from the floor 20 consecutive times, without resting, through the full range of plantar flexion. To test the soleus, the same procedure is used, except that the patient stands on the test limb with the knee slightly flexed (Hislop et al. 2013; Spink et al. 2010).

Specific Tests

After these classical tests, specific ankle tests are used to assess the integrity of the ligaments or the damage sustained.

• Anterior drawer test: used to assess the ATFL. The patient sits with the knee flexed in order to relax the gastrocnemius and soleus muscles. The ankle is placed in 10° of plantar flexion. The heel is held and forcefully pulled forward with one hand, while the other hand applies proximal counterpressure (Fig. 2). The test is positive when the injured ankle shows severe anterior subluxation compared to the noninjured ankle (Welck et al. 2015).
• Talar tilt test: used to test the CFL. The patient is positioned with the knee flexed. The heel is grasped and the talus tilted into varus (Fig. 3). A normal degree of tilt is 0–23°. The injured side is compared to the noninjured ankle (Welck et al. 2015).

Fig. 2: Anterior drawer test


Fig. 3: Talar tilt test

• External rotation test: used to assess the syndesmosis. The leg is stabilized proximally to the ankle joint while grasping the plantar aspect of the foot and rotating the foot externally. The test is positive when painful (Alonso et al. 1998).
• Squeeze test: also used to assess the syndesmosis. The fibula and tibia are compressed at midcalf. The test is positive when it elicits pain (Alonso et al. 1998; Welck et al. 2015).

Functional Tests

Physical therapists should use functional tests to assess how the patient moves and how the ankle injury affects balance, walking, and jumping. The severity of the ankle injury dictates which tests are selected.

Balance Performance

Balance deficits have been found to be present after acute ankle sprains (Mckeon and Hertel 2008a) for up to 1 week, both in the injured and in the noninjured ankle (Evans et al. 2004). Balance performance can be assessed using clinical tests or laboratory analyses using force platforms. A frequently used dynamic clinical test is the Star Excursion Balance Test (SEBT), a series of single-limb squats using the non-stance limb to reach maximally to touch a point along one of eight designated lines on the ground (Fig. 4) (Gribble et al. 2012). Studies using the SEBT showed that acute ankle sprain patients have a shorter anterior reach (normalized to leg length) compared to healthy controls (Pourkazemi et al. 2016; Akbari et al. 2006). Force plate data are most often characterized by analysis of the trajectory of the center of pressure (COP). Parameters that are frequently chosen to assess balance performance are COP range, length, and speed. The review of Mckeon and Hertel (2008a) showed that COP range, length, and speed of the injured ankle were increased after an acute lateral ankle sprain compared to healthy controls.

Fig. 4 Reach directions for left ankle stance of the Star Excursion Balance Test (SEBT): anterior, anterolateral, anteromedial, medial, lateral, posteromedial, posterolateral, and posterior. Directions are labeled based on the reach direction from the stance limb

Although patients significantly improve postural control (e.g., COP range, length, as well as speed) during the first 4 weeks after an ankle sprain (Evans et al. 2004; Hertel et al. 2001), they frequently present residual functional deficits (muscle strength, mobility) and impaired postural control after this period (Hertel et al. 2001; Holme et al. 1999). Genthon et al. (2010) showed that ankle sprain patients present asymmetric balance in bipedal stance during the first 10 days after injury. From day 10 to day 30, bipedal balance improved and returned to normal after 30 days (Genthon et al. 2010). However, deficits may become more evident while balancing on one leg (Mckeon and Hertel 2008a; Rozzi et al. 1999; Wikstrom et al. 2009a). The study of Hertel et al. (2001), for example, showed that during a single-leg balance test, COP length and speed were greater in the injured ankle compared to the noninjured ankle for up to 4 weeks after injury. However, both parameters significantly improved 4 weeks after injury compared to the day after the injury (Hertel et al. 2001). Using the noninjured leg as a reference is not recommended to estimate residual deficit because it is assumed that central neural changes after unilateral lateral ankle trauma affect motor control of both extremities (Holme et al. 1999). Findings differ with regard to balance impairments in CAI patients. Dynamic clinical tests (i.e., the SEBT) in CAI patients showed that deficits in ankle dorsiflexion ROM lead to difficulties with the anterior reach direction of the SEBT (Basnett et al. 2013; Munn et al. 2010; Arnold et al. 2009). Meta-analyses studying force plate data concluded that ankle instability leads to impaired balance


performance (Arnold et al. 2009; Munn et al. 2010). These meta-analyses further stated that it remains unclear whether these differences in balance preexisted or were the consequence of the ankle injury (Arnold et al. 2009). Furthermore, no definitive conclusion could be made based on a systematic review from Mckeon and Hertel (2008a). They compared the COP performance achieved with the injured ankle of CAI patients with the COP performance of healthy controls or the one achieved with the uninjured ankle (Mckeon and Hertel 2008a).
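The force plate findings summarized above are expressed as COP range, length, and speed, but the chapter does not spell out how these are computed. The following minimal sketch assumes common definitions — per-axis range as maximum minus minimum, path length as the sum of sample-to-sample displacements, and mean speed as path length divided by trial duration — and uses simulated data; the function name and parameter choices are illustrative, not taken from any of the cited studies.

```python
import numpy as np

def cop_metrics(cop_xy: np.ndarray, fs: float) -> dict:
    """Summarize a center-of-pressure (COP) trajectory.

    cop_xy : (n_samples, 2) array of COP coordinates in metres
             (e.g., anteroposterior and mediolateral); fs : sampling rate in Hz.
    Definitions follow common usage; the chapter itself does not specify them.
    """
    ap, ml = cop_xy[:, 0], cop_xy[:, 1]
    increments = np.diff(cop_xy, axis=0)                  # sample-to-sample steps
    path_length = float(np.sum(np.linalg.norm(increments, axis=1)))
    duration = (len(cop_xy) - 1) / fs
    return {
        "range_ap_m": float(ap.max() - ap.min()),
        "range_ml_m": float(ml.max() - ml.min()),
        "path_length_m": path_length,
        "mean_speed_m_per_s": path_length / duration,
    }

# Hypothetical single-leg stance trial: ~20 s of simulated sway at 100 Hz.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 2000)
cop = np.column_stack([0.010 * np.sin(0.8 * t), 0.008 * np.sin(1.3 * t)])
cop += rng.normal(0.0, 0.001, cop.shape)
print(cop_metrics(cop, fs=100.0))
```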

Gait Performance

Temporal-Spatial

Acute ankle sprain patients demonstrated disturbed gait parameters such as slower gait speed, shorter step length, shorter single support time, as well as disturbed symmetry for single support time (Crosbie et al. 1999; Punt et al. 2015). For example, healthy persons demonstrate on average a walking speed of 1.29 ± 0.17 m/s, while ankle sprain patients walk only 1.12 ± 0.25 m/s 4 weeks after the initial injury (Punt et al. 2015). Punt et al. (2015) showed that decreased walking speed was correlated with increased pain levels and deficits in dorsiflexion muscle strength measured with a handheld dynamometer. In contrast, CAI patients show similar gait speed compared to healthy age-matched controls (Monaghan et al. 2006).

Kinematics

For normal gait, which is one of the most frequent activities, a minimum ankle dorsiflexion of 10° has been shown to be necessary (Riener et al. 2002). In acute ankle sprain patients, Crosbie et al. (1999) showed that the measured degree of maximum passive dorsiflexion of the ankle was correlated with gait speed, step length, and symmetry for single support time. However, Punt et al. (2015) found no difference between ankle sprain patients and healthy controls for maximum dorsiflexion during the stance phase of gait while walking at a self-selected walking speed. In contrast, Punt et al. (2015) found that the maximum plantar flexion was reduced on the injured side of patients (−14.2° ± 7.9°) compared to healthy controls (−18.7° ± 8.0°). Doherty et al. (2015a) described similar findings when comparing the injured ankle with the noninjured side. In addition, the timing of maximum plantar flexion was delayed at the injured ankle compared with that of the healthy group (Punt et al. 2015). Furthermore, acute ankle sprain patients demonstrated increased ankle inversion with a greater inversion moment (Delahunt et al. 2006; Doherty et al. 2015a; Monaghan et al. 2006). While walking at a similar gait speed, CAI patients showed more ankle inversion compared to healthy controls (Monaghan et al. 2006; Delahunt et al. 2006). In addition, CAI patients inverted the ankle at a rate of 0.5 rad/s around heel strike, while healthy controls slowly everted their ankle at a rate of 0.1 rad/s (Monaghan et al. 2006).


Kinetics

Ankle sprain patients demonstrated lower maximum concentric dorsiflexion power compared with healthy controls. They also demonstrated lower maximum eccentric plantar flexion power compared with healthy subjects. Furthermore, the maximum moment was lower in the ankle sprain group compared with the healthy group (Punt et al. 2015). These findings indicate that patients with an ankle sprain use a more conservative and secure gait pattern, characterized by a slow self-selected walking speed. This might have been even more marked if patients had been tested under stricter requirements, such as faster walking speed, running, or irregular walking surfaces. CAI patients showed an evertor moment directly after heel strike, whereas healthy controls showed an invertor moment while walking at a similar gait speed. Joint power also differed between these two groups after heel strike: CAI patients showed concentric power generation, while healthy controls showed eccentric power absorption (Monaghan et al. 2006).
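The kinetic variables reported above (joint moments and joint powers) are related mechanically: instantaneous joint power is the product of the net joint moment and the joint angular velocity, with positive values reflecting concentric generation and negative values eccentric absorption. The short sketch below works through that relation with hypothetical numbers; the magnitudes are assumed for illustration and are not taken from the studies cited.

```python
def joint_power(moment_nm: float, angular_velocity_rad_s: float) -> float:
    """Instantaneous joint power in watts: P = M * omega.

    Positive values indicate concentric power generation,
    negative values indicate eccentric power absorption.
    """
    return moment_nm * angular_velocity_rad_s


# Hypothetical ankle push-off for a 70 kg person: a plantar flexor moment of
# 1.4 Nm/kg (98 Nm) while the ankle plantar flexes at 3.0 rad/s.
print(f"{joint_power(1.4 * 70.0, 3.0):.0f} W")   # ~294 W generated
```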

Sports-Related Tasks

Previous studies showed that ankle sprain patients displayed reduced ankle plantar flexion while performing a bipedal drop vertical jump 2 weeks, but also 6 months, after the initial injury, compared to healthy controls (Doherty et al. 2014a, 2015b). Similar findings were reported when patients performed a single-leg jump 4 weeks after the injury compared to a control group (Allet et al. 2016). Thus, perhaps the altered movement pattern of ankle sprain patients represents a security strategy to avoid recurrences. Increased dorsiflexion of the ankle brings the joint into a more close-packed position that protects the lateral ligament complex more efficiently. This evasive movement avoids the typical ankle sprain injury mechanism, a combination of inversion and plantar flexion of the ankle (Balduini and Tetzlaff 1982). Nevertheless, abnormal foot positioning at initial contact might lead to faulty neuromuscular preprogramming of ankle joint movement, thereby, in the long term, contributing to the development of CAI (Hertel 2008). Alternatively, dorsiflexion could be a precautionary behavior of ankle sprain patients: they may not jump as high as healthy persons, so without increased dorsiflexion their toes might not clear the floor. Studies including CAI patients showed divergent results, ranging from increased ankle dorsiflexion (Caulfield and Garrett 2002) to decreased dorsiflexion when patients performed a single-leg drop landing (Ashton-Miller et al. 1996). Studies investigating neuromuscular control mechanisms in CAI patients also reported reduced activation of the peroneus muscle before initial contact during a single-leg drop landing compared to healthy subjects (Caulfield et al. 2004; Delahunt et al. 2006). CAI patients, evaluated during the landing phase of running and stop-jump maneuvers, showed a more inverted ankle, reduced muscle co-contraction, and decreased dynamic stiffness in the ankle joint during the landing phase compared to healthy controls (Lin et al. 2011).


Patient Reported Outcome Measures (PROMs)

Patient reported outcome measures (PROMs) are instruments assessing patients’ perception of their health and function. Ankle PROMs typically include questions addressing pain, mobility, function, and quality of life. PROMs are important but should supplement, rather than replace, existing measures of quality and performance. Several validated PROMs are available for foot and ankle disorders (Eechaute et al. 2007; Weel et al. 2015; Martin and Irrgang 2007). The most commonly used PROMs related to foot and ankle disorders are the following:

• Ankle Joint Functional Assessment Tool (AJFAT): contains five impairment items (pain, stiffness, stability, strength, rolling over), four activity-related items (walking on uneven ground, cutting when running, jogging, and descending stairs), and one overall quality item. The maximal total score of the AJFAT is 40 points, and the minimum is 0 points (Rozzi et al. 1999).
• Foot and Ankle Ability Measure (FAAM): this self-report questionnaire consists of a 21-item activities of daily living (ADL) subscale and an 8-item sports-related subscale. The final score ranges from 0 to 100 for ADL as well as sports, with higher scores indicating higher levels of function (Martin et al. 2005).
• Foot and Ankle Disability Index (FADI): this 34-item questionnaire is divided into two subscales: the FADI and the FADI Sport. The FADI contains 4 pain-related items and 22 activity-related items. The FADI Sport contains eight activity-related items. The scores of the FADI and FADI Sport are then transformed into percentages (Hale and Hertel 2005).
• Foot and Ankle Outcome Score (FAOS): this 42-item questionnaire is divided into five subscales: pain, other symptoms, ADL, sport and recreation function, and foot and ankle-related quality of life. Final scores are then transformed into ratings from 0 to 100 (worst to best score) (Roos et al. 2001).

According to Eechaute et al. (2007), the FADI and the FAAM are considered the most appropriate self-report tools to quantify functional disabilities in patients with CAI. CAI patients demonstrate worse FADI and FADI Sport scores compared to healthy controls (Wikstrom et al. 2009b).
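The FAAM subscales described above are reported on a 0–100 scale, but the scoring arithmetic is not given here. The sketch below assumes the commonly described convention — each answered item scored 0–4 and the subscale expressed as a percentage of the maximum possible for the items answered — so both the convention and the example responses should be read as assumptions for illustration.

```python
from typing import Iterable, Optional

def faam_subscale_score(item_scores: Iterable[Optional[int]]) -> float:
    """FAAM-style subscale score on a 0-100 scale (higher = better function).

    Assumes each answered item is scored 0 (unable to do) to 4 (no difficulty)
    and that unanswered items (None) are excluded from the maximum possible —
    the commonly described scoring convention, stated here as an assumption.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("No items answered; score undefined.")
    if any(not 0 <= s <= 4 for s in answered):
        raise ValueError("Item scores must be between 0 and 4.")
    return sum(answered) / (4 * len(answered)) * 100.0

# Hypothetical 8-item Sport subscale with one item skipped.
sport_items = [3, 2, 2, 4, 3, None, 1, 2]
print(f"FAAM Sport: {faam_subscale_score(sport_items):.1f} / 100")  # ~60.7
```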

Treatment Modalities

Appropriate management of lateral ankle sprain is vital for a successful recovery. Acute ankle sprains can be managed initially according to the rest, ice, compression, and elevation (RICE) protocol, and the use of nonsteroidal anti-inflammatory drugs may be recommended for pain management (Kerkhoffs and Van Dijk 2013). A short period of immobilization by means of a lower leg cast can facilitate a rapid decrease in pain and swelling, after which functional exercise therapy is recommended.

Effects of Total Hip Arthroplasty on Gait

S. Chopra and K.R. Kaufman

In hip arthritis, difficulty with stair climbing has been consistently reported, due to weakness of the major hip muscles leading to reduced muscle moments (Fig. 7) and altered power generation (Fig. 8). Power generation is seen to reduce significantly in hip arthritis, which would lead to compensation strategies to achieve the range of motion required to initiate stair climbing. The inability of the muscles to produce enough power for the increased range of motion required during stair climbing is compensated by a reduced stair climbing speed in these patients. All of the above results in patients adopting altered angular and loading strategies when both ascending and descending stairs, which may lead to bilateral asymmetries. Following THA, joint muscle moments and power are seen to improve significantly, returning power generation to near the normal requirements for stair climbing. However, the improvement is not comparable to a healthy hip. Note that the transverse plane moments, though significantly lower compared to the other two planes, could have some effect due to the altered pattern during both ascending and descending stairs.


Fig. 7 Hip joint external moment curves during ascent (top) and descent (bottom) of stairs for healthy adults (gray), patients with hip arthritis (black), and patients 1 year after total hip arthroplasty (red). Results are from the Motion Analysis Laboratory at Mayo Clinic, Rochester, MN

Fig. 8 Hip joint power during ascent and descent of stairs for healthy adults (gray), patients with hip arthritis (black), and patients 1 year after total hip arthroplasty (red). Results are from the Motion Analysis Laboratory at Mayo Clinic, Rochester, MN

Future Work

Research suggests that investigating the extent of remaining preoperative gait patterns at each joint would help improve rehabilitation by reducing preoperative gait abnormalities as early as possible. Large-scale studies are therefore needed, assessing all lower extremity joints both pre- and postoperatively for both short- and long-term THA outcomes. There is also a need for a simplified objective gait score,


assessing the severity of gait alterations in patients with hip arthritis, which would assist clinicians in planning treatments accordingly.

Cross-References ▶ 3D Dynamic Probablistic Pose Estimation Using Cameras and Reflective Markers ▶ 3D Kinematics of Human Motion ▶ Arthrokinematics and Joint Morphology ▶ Gait Parameters Estimated Using Magneto and Inertial Measurement Units ▶ Interpreting Spatiotemporal Parameters, Symmetry and Variability in Clinical Gait Analysis ▶ Pressure Platforms ▶ Three-dimensional Human Kinematics Estimation Using Magneto and Inertial Measurement Units

References Akiyama K, Nakata K, Kitada M, Yamamura M, Ohori T, Owaki H, Fuji T (2016) Changes in axial alignment of the ipsilateral hip and knee after total hip arthroplasty. Bone Joint J 98-B:349–358 Andriacchi TP, Andersson GB, Fermier RW, Stern D, Galante JO (1980) A study of lower-limb mechanics during stair-climbing. J Bone Joint Surg Am 62:749–757 Barrett WP, Turner SE, Leopold JP (2013) Prospective randomized study of direct anterior vs postero-lateral approach for total hip arthroplasty. J Arthroplast 28:1634–1638 Beaulieu ML, Lamontagne M, Beaulé PE (2010) Lower limb biomechanics during gait do not return to normal following total hip arthroplasty. Gait Posture 32:269–273 Bejjani FJ, Lockett R, Pavlidis L (1992) Videofluoroscopy system for in vivo motion analysis. Google Patents Ben-Galim P, Ben-Galim T, Rand N, Haim A, Hipp J, Dekel S, Floman Y (2007) Hip-spine syndrome: the effect of total hip replacement surgery on low back pain in severe osteoarthritis of the hip. Spine (Phila Pa 1976) 32:2099–2102 Centers For Disease Control and Prevention (2010) National Hospital Discharge Survey: 2010 table, procedures by selected patient characteristics – number by procedure category and age Della Croce U, Cappuzzo A, Kerrigan DC (1999) Pelvis and lower limb anatomical landmark calibration precision and its propagation to bone geometry and joint angles. Med Biol Eng Comput 37:155–161 Evenson KR, Buchner DM, Morland KB (2012) Objective measurement of physical activity and sedentary behavior among US adults aged 60 years or older. Prev Chronic Dis 9:E26 Fiorentino NM, Kutschke MJ, Atkins PR, Foreman KB, Kapron AL, Anderson AE (2016) Accuracy of functional and predictive methods to calculate the hip joint center in young non-pathologic asymptomatic adults with dual fluoroscopy as a reference standard. Ann Biomed Eng 44:2168–2180 Foucher KC, Wimmer MA (2012) Contralateral hip and knee gait biomechanics are unchanged by total hip replacement for unilateral hip osteoarthritis. Gait Posture 35:61–65 Hagstromer M, Oja P, Sjostrom M (2007) Physical activity and inactivity in an adult population assessed by accelerometry. Med Sci Sports Exerc 39:1502–1508 Harding P, Holland AE, Delany C, Hinman RS (2014) Do activity levels increase after total hip and knee arthroplasty? Clin Orthop Relat Res 472:1502–1511


Horstmann T, Listringhaus R, Haase GB, Grau S, Mundermann A (2013) Changes in gait patterns and muscle activity following total hip arthroplasty: a six-month follow-up. Clin Biomech (Bristol, Avon) 28:762–769 Lin BA, Thomas P, Spiezia F, Loppini M, Maffulli N (2013) Changes in daily physical activity before and after total hip arthroplasty. A pilot study using accelerometry. Surgeon 11:87–91 Louriuro A, Mills PM, Barrett RS (2013) Muscle weakness in hip osteoarthritis: a systematic review. Arthritis Care Res 65:340–352 Lubbeke A, Zimmermann-Sloutskis D, Stern R, Roussos C, Bonvin A, Perneger T, Peter R, Hoffmeyer P (2014) Physical activity before and after primary total hip arthroplasty: a registry-based study. Arthritis Care Res 66:277–284 Madsen MS, Ritter MA, Morris HH, Meding JB, Berend ME, Faris PM, Vardaxis VG (2004) The effect of total hip arthroplasty surgical approach on gait. J Orthop Res 22:44–50 OECD (2015) Health at a glance 2015: OECD indicators (summary). OECD Publishing, Paris Perry J, Burnfield J (2010) Gait analysis: normal and pathological function. J Sports Sci Med 9:353 Queen RM, Butler RJ, Watters TS, Kelley SS, Attarian DE, Bolognesi MP (2011) The effect of total hip arthroplasty surgical approach on postoperative gait mechanics. J Arthroplast 26:66–71 Queen RM, Newman ET, Abbey AN, Vail TP, Bolognesi MP (2013) Stair ascending and descending in hip resurfacing and large head total hip arthroplasty patients. J Arthroplast 28:684–689 Queen RM, Appleton JS, Butler RJ, Newman ET, Kelley SS, Attarian DE, Bolognesi MP (2014) Total hip arthroplasty surgical approach does not alter postoperative gait mechanics one year after surgery. PM R 6:221–226; quiz 226 Queen RM, Attarian DE, Bolognesi MP, Butler RJ (2015) Bilateral symmetry in lower extremity mechanics during stair ascent and descent following a total hip arthroplasty a one-year longitudinal study. Clin Biomech (Bristol, Avon) 30:53–58 Reardon K, Galea M, Dennett X, Choong P, Byrne E (2001) Quadriceps muscle wasting persists 5 months after total hip arthroplasty for osteoarthritis of the hip: a pilot study. Intern Med J 31:7–14 Roberts A (2010) Gait analysis: normal and pathological function (2nd edition). Bone Joint J 92-B(8):1184 Shrader W, Bhowmik-Stoker M, Jacofsky MC, Jacofsky DJ (2009) Gait and stair function in total and resurfacing hip arthroplasty: a pilot study. Clin Orthop Relat Res 467:1476–1484 Tao W, Liu T, Zheng R, Feng H (2012) Gait analysis using wearable sensors. Sensors (Basel, Switzerland) 12:2255–2283 Umeda N, Miki H, Nishii T, Yoshikawa H, Sugano N (2009) Progression of osteoarthritis of the knee after unilateral total hip arthroplasty: minimum 10-year follow-up study. Arch Orthop Trauma Surg 129:149–154 Van Den Bogert AJ, Read L, Nigg BM (1999) An analysis of hip joint loading during walking, running, and skiing. Med Sci Sports Exerc 31:131–142 Watelain E, Dujardin F, Babier F, Dubois D, Allard P (2001) Pelvic and lower limb compensatory actions of subjects in an early stage of hip osteoarthritis. Arch Phys Med Rehabil 82:1705–1711 Weber T, Al-Munajjed AA, Verkerke GJ, Dendorfer S, Renkawitz T (2014) Influence of minimally invasive total hip replacement on hip reaction forces and their orientations. J Orthop Res 32:1680–1687 Wesseling M, Meyer C, Corten K, Simon JP, Desloovere K, Jonkers I (2016) Does surgical approach or prosthesis type affect hip joint loading one year after surgery? 
Gait Posture 44:74–82 Yoshimoto H, Sato S, Masuda T, Kanno T, Shundo M, Hyakumachi T, Yanagibashi Y (2005) Spinopelvic alignment in patients with osteoarthrosis of the hip: a radiographic comparison to patients with low back pain. Spine (Phila Pa 1976) 30:1650–1657

The Effects of Ankle Joint Replacement on Gait

Justin Michael Kane, Scott Coleman, and James White Brodsky

J.M. Kane (*)
Baylor University Medical Center, McKinney, TX, USA
Faculty, Foot and Ankle Fellowship Program, Baylor University Medical Center, Dallas, TX, USA
Orthopedic Associates of Dallas, Dallas, TX, USA
e-mail: [email protected]

S. Coleman
Department of Orthopaedics, Baylor University Medical Center, Dallas, TX, USA
Department of Orthopedics, Baylor Scott & White, Dallas, TX, USA
e-mail: [email protected]

J.W. Brodsky
Faculty, Foot and Ankle Fellowship Program, Baylor University Medical Center, Dallas, TX, USA
University of Texas Southwestern Medical School, Dallas, TX, USA
Texas A&M HSC College of Medicine, Bryan, TX, USA
e-mail: [email protected]

# Springer International Publishing AG 2016
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_83-1

Abstract

Patients with ankle arthritis experience significant disability. In patient-centered surveys, the mental and physical disability is equivalent to that associated with end-stage hip arthritis (Glazebrook et al. Bone Joint Surg Am 90(3):499–505, 2008). End-stage ankle arthritis has a number of features differentiating it from hip and knee arthritis. Unlike the primary osteoarthritis of the hip and knee, 70% of ankle arthritis is post-traumatic in nature (Saltzman et al. Iowa Orthop J 25:44–46, 2005). Given the high degree of success seen with total hip arthroplasty (THA) and total knee arthroplasty (TKA), the first generation of total ankle arthroplasty (TAA) was introduced in the 1970s. Unfortunately, early TAA did not have the same success as other arthroplasty procedures, and the procedure was largely abandoned in favor of return to ankle arthrodesis (Bolton-

Maggs et al. J Bone Joint Surg Br 67:785–790, 1985, Dini and Bassett Clin Orthop 146:228–230, 1980, Kitaoka and Patzer J Bone Joint Surg Am 76:974–979, 1994, Stauffer and Seagal Clin Orthop 160:217–221, 1981, Kofoed and Sorensen J Bone Joint Surg 80-B:328–332, 1998, Demottaz et al. J Bone Joint Surg Am 61 (7):976–988, 1979). While arthrodesis was more reliable with fewer complications than arthroplasty, arthrodesis has the limitation of loss of ankle motion, especially in cases in which there is severe arthritis but moderate residual joint motion. Complications of arthrodesis include nonunion and malunion, and there are reports of arthritis at adjacent joints and residual abnormalities of gait (Beischer et al. Foot Ankle Int 20:545–553, 1999, Buchner and Sabo Clin Orthop Relat Res 406:155–164, 2003, Buck et al. J Bone Joint Surg 69-A:1052–1062, 1987, Coester et al. J Bone Joint Surg 83-A:219–228, 2001, Fuchs et al. J Bone Joint Surg 85-B:994–998, 2003). The complications associated with ankle arthrodesis, coupled with the desire to more normally replicate ankle biomechanics, have led to renewed interest in TAA in recent years, with encouraging reports of early- and mid-term results and improved survivorship compared to first-generation series. Even though TAA does not yet have the longevity of THA and TKA, there are many early and intermediate-term reports of high levels of patient satisfaction, pain relief and patient function, and variable survivorship at 80–95%, depending on the length of follow-up (Haddad et al. J Bone Joint Surg Am 89:1899–1905, 2007a, Gougoulias et al. Clin Orthop 468:199–208, 2010). One of the postulated benefits of TAA is preservation of tibiotalar motion. A number of studies have demonstrated improvements in gait following TAA. While TAA does not restore normal gait, patients have improvements in nearly all parameters of gait with many approaching normal controls (Valderrabano et al. Clin Biomech 22:894–904, 2007, Doets et al. Foot Ankle Int 28(3):313–322, 2007, Singer et al. J Bone Joint Surg Am 95(e191):1–10, 2013, Flavin et al. Foot Ankle Int 34(1):1340–1348, 2013). More importantly, when gait studies have compared TAA with preoperative function, TAA offers a significant improvement (Singer et al. J Bone Joint Surg Am 95(e191):1–10, 2013, Flavin et al. Foot Ankle Int 34(1):1340–1348, 2013, Queen et al. J Bone Joint Surg Am 96:987–993, 2014a, Piriou et al. Foot Ankle Int 29(1):3–9, Brodsky et al. J Bone Joint Surg Am 93:1890–1896, 2011, Queen et al. Foot Ankle Int 33(7):535–542, 2012, Queen et al. Clin Biomech 29:418–422, 2014b).

Keywords

Total ankle arthroplasty • Implant design • Clinical outcomes • Patient-reported outcomes • Gait analysis

Contents State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Biomechanics and Anatomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3


Clinical Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Gait Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Temporal-Spatial Parameters of Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Kinematic Parameters of Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Kinetic Parameters of Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Case Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Conclusion/Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

State of the Art

First-generation total ankle replacement implant designs were broadly categorized as constrained or unconstrained. Both designs produced poor results. While unconstrained designs allowed more mobility and decreased strain at the bone-implant interface, they had a high level of instability and a high rate of wear on the polyethylene component, resulting in an unacceptable rate of early failure. Constrained designs reduced instability and allowed a more even force distribution on the polyethylene, but motion was limited to the sagittal plane, and the restricted motion resulted in increased strain at the bone-implant interface, leading to early loosening (Bolton-Maggs et al. 1985; Dini and Bassett 1980; Kitaoka and Patzer 1994; Stauffer and Seagal 1981; Kofoed and Sorensen 1998; Lewis 1994; Kakkar and Siddique 2011). A better understanding of biomechanics at the ankle has led to the development of the current generation of implants used today (Reggiani et al. 2006; Espinoza et al. 2010; Leardini et al. 2014). Of the many TAA implants on the world market today, all are either two-component or three-component designs. In two-component designs, the polyethylene is fixed to the tibial component, and motion occurs between it and the talar component. In three-component designs, the polyethylene is a mobile bearing, allowing motion at the interfaces between the polyethylene and both the tibial and talar components. The rationale for the three-component design is to achieve a more normal restoration of multiplane motion. A lack of conclusive evidence exists to support the superiority of either implant design.

Biomechanics and Anatomy

The tibiotalar joint is a highly constrained articulation comprised of the tibia, fibula, and dorsal surface of the talus. Poor understanding of the highly congruous joint mechanics, the complexities of ankle motion, and the stresses on the implant-bone interface contributed to early implant failures. A thorough understanding of the anatomy and biomechanics is integral when considering designs of total ankle prostheses (Deland et al. 2000; Gill 2002).


component of internal rotation and adduction. The dorsal aspect of the talus has two domes with differing radii. The medial dome has a smaller radius of curvature than the lateral dome, which explains the secondary motion seen at the tibiotalar joint (Barnett and Napier 1952). The articular surface of the talus is wedge shaped, wider anteriorly and tapering posteriorly in an asymmetric manner (Barnett and Napier 1952; Inman 1991). As the tibiotalar joint plantar flexes, the axis of rotation shifts from anterior to posterior. While range of motion in the sagittal plane is 70° (Tooms 1987), studies have suggested the working range of motion is limited to 25° (15° of plantarflexion and 10° of dorsiflexion) during the stance phase of gait (Stauffer 1979). During gait, a vertical load of greater than five times body weight is transmitted at the tibiotalar joint (Stauffer et al. 1977). The same vertical load is experienced after TAA, and shear forces can exceed two to three times body weight during the gait cycle (Stauffer et al. 1977). Given the significant forces exerted at the tibiotalar joint and the small weight-bearing surface of only 7 cm² (Hintermann 2005), design considerations are paramount to implant survivorship.
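Dividing the vertical load quoted above (more than five times body weight) by the roughly 7 cm² weight-bearing surface gives a back-of-the-envelope sense of the contact stress a tibiotalar implant must tolerate. The body mass in the snippet below is an assumed example value, and the result is a rough mean pressure, not a statement about any particular prosthesis.

```python
# Rough tibiotalar contact-stress estimate from the figures quoted above.
body_mass_kg = 70.0          # assumed example body mass
g = 9.81                     # gravitational acceleration, m/s^2
load_multiple = 5.0          # vertical load of ~5x body weight during gait
contact_area_m2 = 7e-4       # 7 cm^2 expressed in m^2

vertical_load_n = load_multiple * body_mass_kg * g
mean_pressure_mpa = vertical_load_n / contact_area_m2 / 1e6

print(f"Vertical load : {vertical_load_n:.0f} N")        # ~3434 N
print(f"Mean pressure : {mean_pressure_mpa:.1f} MPa")    # ~4.9 MPa
```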

Clinical Outcomes

In recent studies of TAA, investigators have reported good to excellent outcomes in almost 80% of cases at greater than 2-year follow-up (Haddad et al. 2007b). In a meta-analysis conducted by Haddad et al., the American Orthopedic Foot and Ankle Society (AOFAS) hind-foot score was 78.2. They found a revision rate of only 7% and a 10-year survival rate of 77% (Haddad et al. 2007b). In a study by Queen et al. comparing two-component and three-component implants, the mean AOFAS score improved from 32.9 to 65 at 1 year, with additional improvement to 81.4 at 2 years. Improvement in the Foot and Ankle Disability Index (FADI) was from 60.0 to 17.4 at 1 year and to 13.3 at 2 years. Both Short Musculoskeletal Function Assessment (SMFA) function and bother scores improved at 1 year, with the function score improving from 27.3 to 11.8 and the bother score improving from 32 to 12.9. The two-component cohort experienced greater improvement in SF-36 total score compared to the three-component cohort, while the three-component group exhibited a greater reduction in visual analog scale (VAS) pain scores. The differences between the two cohorts were, however, insufficient to conclusively determine which design is more efficacious in the treatment of end-stage ankle arthritis (Queen et al. 2014a).

Gait Analysis

End-stage ankle arthritis is not only subjectively debilitating; numerous studies have demonstrated abnormalities in each parameter of gait. In order to fully appreciate the disability associated with end-stage ankle arthritis and the improvements gained with total ankle arthroplasty, it is essential to understand the basic measurements of gait analysis.


The parameters of gait are divided into three categories. Temporal-spatial parameters (TSPs) of gait are the most clinically appreciable measurements and are the most intuitive and easily appreciated (chapter “▶ Interpreting Spatiotemporal Parameters, Symmetry and Variability in Clinical Gait Analysis”). TSPs are considered the “vital signs of gait” (Kirtley 2006), measuring the speed of walking and its component parts, including support time for each limb. Walking speed is the product of step length and cadence (steps/minute). It is important to keep in mind that when evaluating TSPs, a number of factors (i.e., pain, stiffness) influence function irrespective of ankle pathology (Valderrabano et al. 2007; Kawamura et al. 1991; McIntosh et al. 2006; Ilgin et al. 2011; Inam et al. 2010; Ledoux et al. 2006; Potter et al. 1995; Queen and Nunley 2010). Kinematic parameters of gait measure movement, separate from the forces driving locomotion. Kinematics entails the displacement of an anatomic segment (linear or angular), velocity, or acceleration (chapter “▶ 3D Kinematics of Human Motion”) (Kirtley 2006). Most modern gait analyses rely on motion-tracking devices to measure specific angles and trajectories as they pertain to the anatomic areas of interest. While traditional gait analysis measured motion using a model which treats the foot as a single segment, recent work has focused on multi-segment kinematic models to distinguish motion between and among the different parts of the foot in the coronal, axial, and sagittal planes (chapter “▶ Kinematic Foot Models for Instrumented Gait Analysis”) (Mayich et al. 2014; Novak et al. 2014). Kinetic parameters measure the forces generated with gait and are expressed both as power and moment (force plus direction). Ground reaction forces, joint moments, and joint mechanical power are all measured within the context of the kinetic parameters of gait analysis (chapters “▶ Interpreting Ground Reaction Forces in Gait” and “▶ Interpreting Joint Moments and Powers in Gait”) (Queen and Nunley 2010).
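The definition above — walking speed as the product of step length and cadence — can be worked through directly, remembering that cadence in steps/minute must be converted to steps/second. The values in the example below are hypothetical.

```python
def walking_speed_m_per_s(step_length_m: float, cadence_steps_per_min: float) -> float:
    """Walking speed as the product of step length and cadence.

    Cadence is given in steps/minute, so divide by 60 to obtain steps/second.
    """
    return step_length_m * cadence_steps_per_min / 60.0

# Hypothetical values: 0.65 m steps at 110 steps/min ~= 1.19 m/s,
# i.e., within the range typically reported for healthy adult gait.
print(f"{walking_speed_m_per_s(0.65, 110):.2f} m/s")
```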

Temporal-Spatial Parameters of Gait
In patients with end-stage ankle arthritis, stride length, cadence, walking speed, and support time for the affected limb are all abnormal (Valderrabano et al. 2007; Stauffer et al. 1977; Khazzam et al. 2006). While some of these changes may be attributable to the pathology itself, it is important to consider that alterations in temporal-spatial parameters of gait have been noted as a protective strategy to reduce the load, and therefore the pain, on the affected joint (Mundermann et al. 2004). A number of studies compared gait before and after total ankle arthroplasty and demonstrated improvements in the temporal-spatial parameters of gait after surgery. Brodsky et al. demonstrated statistically and clinically significant improvements in gait after ankle replacement with the STAR prosthesis. Outcomes were recorded for the affected limb, and stride length (+17 cm), cadence (+12.9 steps/min), and walking speed (+25.6 cm/s) were all improved (p < 0.001) (Brodsky et al. 2011). Valderrabano et al. reported similar results with the HINTEGRA prosthesis. For the affected limb, improvements in stride length (+5 cm), cadence (+5.6 steps/min), walking speed (+12 cm/s), and support time (-0.04 s) were detected. For the unaffected limb, improvements in stride length (+7 cm), cadence (+4.8 steps/min), walking speed (+12 cm/s), and support time (-0.07 s) were detected. While these values did not reach those of healthy controls, the temporal-spatial parameters improved both statistically and clinically significantly and more closely replicated the parameters of healthy controls (Valderrabano et al. 2007). Queen et al. reported similar improvements across the temporal-spatial parameters of gait: stride length (+13.9 cm), walking speed (+31 cm/s), and double-limb support time (-6%) all showed statistically and clinically significant improvements (Queen et al. 2012). These results parallel those of earlier authors who also found statistically and clinically significant improvements in stride length, cadence, and walking speed, even though improvements did not reach the level of healthy controls (Doets et al. 2007; Dyrby et al. 2004; Houdijk et al. 2008; Ingrosso et al. 2009). In studies comparing total ankle arthroplasty with tibiotalar arthrodesis, a number of important differences were noted. Flavin et al. studied the preoperative and postoperative gait of patients who had total ankle arthroplasty and ankle arthrodesis, evaluating the improvements in each group and comparing the groups to each other. Both groups had significant improvements in multiple parameters of gait following surgery, even though neither group approached the function of normal controls, and certain parameters of gait were superior in each group compared to the other (Flavin et al. 2013). This parallels the results of other authors who report similar postoperative temporal-spatial parameters of gait that are improved over the gait of patients with end-stage ankle arthritis but are not equivalent to the gait of healthy controls (Singer et al. 2013; Hahn et al. 2012).

Kinematic Parameters of Gait
Numerous studies have demonstrated reduced movement in all three planes in patients with end-stage ankle arthritis (Valderrabano et al. 2007; Stauffer et al. 1977; Khazzam et al. 2006). Predictably, the greatest loss of motion is in the sagittal plane (dorsiflexion/plantarflexion). Enthusiasm surrounding total ankle arthroplasty is centered on its theoretical ability to preserve and/or recreate normal joint kinematics. While studies have failed to demonstrate a restoration of ankle motion to that of healthy controls, patients with end-stage ankle arthritis show demonstrable improvement in motion after arthroplasty. Brodsky et al. reported an improvement in total sagittal range of motion in the ankle (+3.7°), knee (+6.6°), and hip (+4.9°), all of which were statistically significant; in this study the increase in tibiotalar motion was predominantly in plantarflexion (Brodsky et al. 2011). In a study of the two-component Salto Talaris TAA, there was also an increase in ROM (+4.8°), with the increase occurring predominantly in dorsiflexion (Choi et al. 2013). Valderrabano et al. reported improvements in ankle plantarflexion (+4.3°), inversion movement (+1.1°), and adduction movement (+1.6°). While these values failed to reach the range of motion of healthy controls, a statistically significant improvement was noted (Valderrabano et al. 2007). Brodsky et al. also compared the affected limb to the contralateral extremity using multisegment three-dimensional gait analysis; range of motion in the sagittal, coronal, and axial planes was statistically and clinically less than in the unaffected limb (Brodsky et al. 2013). In comparisons between total ankle arthroplasty and tibiotalar arthrodesis, kinematic parameters of gait reveal significant improvements for both cohorts postoperatively, with neither group reaching the values seen in healthy controls for any of the parameters. Flavin et al. reported superior sagittal plane dorsiflexion in patients undergoing TAA compared to arthrodesis, while those undergoing arthrodesis had superior coronal plane eversion; however, neither procedure resulted in normalization of the kinematic parameters of gait (Flavin et al. 2013). Singer et al. compared postoperative results of total ankle arthroplasty and tibiotalar arthrodesis with healthy controls. Total sagittal range of motion was greater in the arthroplasty group than in the arthrodesis group (+4.4°) but was still not normal. Coronal plane motion improved to a greater extent in the arthroplasty group (+2.5°) but was still less than that of healthy controls. Tibial rotation was greater in the arthroplasty group (+1.4°) and was similar to the control group (Singer et al. 2013). Hahn et al. reported a net increase in total sagittal range of motion of +3° in patients undergoing total ankle arthroplasty, with the majority of the improvement gained in plantarflexion, while patients undergoing tibiotalar arthrodesis exhibited a decrease in sagittal range of motion of 3° (Hahn et al. 2012).

Kinetic Parameters of Gait
The kinetic abnormalities associated with end-stage ankle arthritis present distinct patterns compared to arthritis of the hips and knees. Ankle power is undoubtedly reduced as a direct result of the abnormal temporal-spatial and kinematic parameters of gait. Moment is also abnormal: the largest reduction in moment is seen in the axial plane, whereas in knee osteoarthritis the largest reduction is seen in the coronal plane moment (Mundermann et al. 2004; Hurwitz et al. 2002). The differences seen in moment may be a result of the etiology of the arthritis in the ankle compared to the hip and knee. The vast majority of patients with end-stage ankle arthritis have a traumatic etiology, whereas patients with hip and knee arthritis usually have primary osteoarthritis. This post-traumatic origin may be associated with muscle and soft-tissue compromise resulting in reduced function, or the reduction may be a protective mechanism to unload the complex ankle joint in an effort to minimize pain. Kinetic analysis has traditionally required that the foot be treated as a single segment in the gait model. Brodsky et al. reported improvements in both ankle power (+0.31 W/kg) and ankle plantarflexion moment (+0.21 Nm/kg) after total ankle replacement with the Scandinavian total ankle replacement (Brodsky et al. 2011). Valderrabano et al. reported similar results: ankle power improved (+0.75 W/kg), ankle plantarflexion moment improved (+0.04 Nm/kg), and ankle inversion moment improved (+0.03 Nm/kg). While these results were statistically and clinically significant, they did not reach the values of the unaffected limb (Valderrabano et al. 2007). Queen et al. also reported improvements in kinetic parameters of gait after total ankle arthroplasty: peak anterior and posterior ground reaction forces exhibited statistically and clinically significant improvements, as did ankle plantarflexion moment (Queen et al. 2012). In a comparison of total ankle arthroplasty and tibiotalar arthrodesis, Flavin et al. reported that, even though the results did not reach statistical significance, vertical ground reaction forces after arthroplasty produced a more symmetric pattern closer to that of healthy controls compared to arthrodesis (Flavin et al. 2013). Singer et al. reported similar results, with total ankle arthroplasty patients more closely replicating healthy controls than tibiotalar arthrodesis patients in ankle power (W), ankle extension moment (Nm/kg), and ankle moment at heel rise (Nm/kg) (Singer et al. 2013).
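As a small illustration of how the kinetic quantities above relate, joint mechanical power is the product of joint moment and joint angular velocity. The following Python sketch uses entirely synthetic time series, only to show that relation, not real patient data.

import numpy as np

dt = 0.01                                               # 100 Hz sampling
t = np.arange(0.0, 0.7, dt)                             # ~0.7 s of stance phase
angle = np.deg2rad(10.0 * np.sin(2 * np.pi * t / 1.4))  # synthetic ankle angle (rad)
moment = 1.2 * np.sin(np.pi * t / 0.7)                  # synthetic plantarflexion moment (Nm/kg)

omega = np.gradient(angle, dt)  # angular velocity (rad/s) from the kinematic data
power = moment * omega          # joint power (W/kg) = moment x angular velocity
print(round(float(power.max()), 2))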

Case Report
A 77-year-old male presented to the office having sustained a right ankle fracture approximately 20 years prior. He had progressive ankle pain and disability with ambulation and activities. His preoperative imaging studies are shown in Fig. 1a, b. After failing nonoperative treatment modalities including bracing, injections, physical therapy, and activity modification, the patient elected to undergo total ankle arthroplasty with a mobile-bearing implant. Figure 2a, b demonstrates the postoperative appearance of the ankle. Marked improvements were seen in patient-reported outcomes at 2 years after ankle replacement (Table 1), and improvements in numerous parameters of gait were achieved. Temporal-spatial data are listed in Table 2. Sagittal range of motion is plotted in Fig. 1. While the patient did not have normalization of his gait, there was significant improvement in all parameters of gait and marked improvement in his subjective outcome measures.

Conclusion/Summary
Multiple studies have demonstrated objective functional improvement after total ankle arthroplasty, as measured by multiple parameters of gait analysis. Although patients improved compared to their preoperative function, they did not improve to a level comparable with normal controls. Interestingly, multiple studies have shown that the improvements following tibiotalar arthrodesis are far more similar to those following total ankle arthroplasty than they are different, and that the loss of sagittal plane motion after arthrodesis can be little or none, at least in patients with the most severe end-stage arthritis.


Fig. 1 Isolated post-traumatic end-stage ankle arthritis

Fig. 2 Postoperative radiographs 24 months after undergoing total ankle arthroplasty

Table 1 Patient-reported outcome measures comparing preoperative and 24-month postoperative scores for the patient in the case report

Outcome                      Preoperative   Postoperative
SF36 mental                  30.2           63.2
SF36 physical                36.9           52.3
AOFAS hindfoot score         41             83
Visual analogue pain score   8              0.5

A fundamental limitation of biomechanical evaluation of gait is the inability of any study or gait model to distinguish the relative contributions of pain relief versus biomechanical change to the improvements in gait, which may account for the similarities in improvement between arthrodesis and arthroplasty patients. Much remains to be learned about the biomechanical effects of TAA, and especially about separating the effects of surgery from those of arthritis, stiffness, and deformity in the adjacent joints of the hindfoot and midfoot.


Table 2 Temporal-spatial data from three-dimensional gait analysis of the patient in the case report

Parameter                Pre-Op    Post-Op
Cadence (steps/min)      74.76     105.89
Right step length (cm)   29.46     60.80
Left step length (cm)    35.09     61.23
Walking speed (m/s)      0.40      1.07

References
Barnett CH, Napier JR (1952) The axis of rotation at the ankle joint in man. Its influence upon the form of the talus and the mobility of the fibula. J Anat 86(Pt 1):1–9
Beischer AD, Brodsky JW, Pollo FE et al (1999) Functional outcome and gait analysis after triple or double arthrodesis. Foot Ankle Int 20:545–553
Bolton-Maggs BG, Sudlow RA, Freeman MA (1985) Total ankle arthroplasty. A long-term review of the London Hospital experience. J Bone Joint Surg (Br) 67:785–790
Brodsky JW, Pollo FE, Coleman SC et al (2011) Changes in gait following the Scandinavian total ankle replacement. J Bone Joint Surg Am 93:1890–1896
Brodsky JW, Coleman SC, Smith S et al (2013) Hindfoot motion following STAR total ankle arthroplasty: a multisegment foot model gait study. Foot Ankle Int 34(11):1479–1485
Buchner M, Sabo D (2003) Ankle fusion attributable to posttraumatic arthrosis: a long-term follow-up of 48 patients. Clin Orthop Relat Res 406:155–164
Buck P, Morrey BF, Chao EY (1987) The optimum position of arthrodesis of the ankle. A gait study of the knee and ankle. J Bone Joint Surg 69-A:1052–1062
Choi JH, Coleman SC, Tenenbaum S et al (2013) Prospective study of the effect on gait of a two-component total ankle replacement. Foot Ankle Int 34(11):1472–1478
Coester LM, Saltzman CL, Leupold J et al (2001) Long-term results following ankle arthrodesis for post-traumatic arthritis. J Bone Joint Surg 83-A:219–228
Deland JT, Morris GD, Sung IH (2000) Biomechanics of the ankle joint. A perspective on total ankle replacement. Foot Ankle Clin 5:747–759
Demottaz JD, Mazur JM, Thomas WH et al (1979) Clinical study of total ankle replacement with gait analysis. A preliminary report. J Bone Joint Surg Am 61(7):976–988
Dini AA, Bassett FH (1980) Evaluation of early results of Smith total ankle replacement. Clin Orthop 146:228–230
Doets HC, van Middelkoop M, Houdijk H et al (2007) Gait analysis after successful mobile bearing total ankle replacement. Foot Ankle Int 28(3):313–322
Dyrby C, Chou LB, Andriacchi TP et al (2004) Functional evaluation of the Scandinavian total ankle replacement. Foot Ankle Int 25(6):377–381
Espinoza N, Walti M, Favre P et al (2010) Misalignment of total ankle components can induce high joint contact pressures. J Bone Joint Surg Am 92(5):1179–1187
Flavin R, Coleman SC, Tenenbaum S et al (2013) Comparison of gait after total ankle arthroplasty and ankle arthrodesis. Foot Ankle Int 34(10):1340–1348
Fuchs S, Sandmann C, Skwara A, Chylarecki C (2003) Quality of life 20 years after arthrodesis of the ankle. A study of adjacent joints. J Bone Joint Surg 85-B:994–998
Gill LH (2002) Principles of joint arthroplasty as applied to the ankle. In: Instructional course lectures. American Academy of Orthopaedic Surgeons (AAOS), Rosemont, pp 117–128
Glazebrook M, Daniels T, Younger A et al (2008) Comparison of health-related quality of life between patients with end-stage ankle and hip arthrosis. J Bone Joint Surg Am 90(3):499–505
Gougoulias N, Khanna A, Maffulli N (2010) How successful are current ankle replacements? A systematic review of the literature. Clin Orthop 468:199–208
Haddad SL, Coetzee JC, Estok R et al (2007a) Intermediate and long-term outcomes of total ankle arthroplasty and arthrodesis: a systematic review of the literature. J Bone Joint Surg Am 89:1899–1905
Haddad SL, Coetzee JC, Estok R et al (2007b) Intermediate and long-term outcomes of total ankle arthroplasty and ankle arthrodesis. J Bone Joint Surg Am 89:1899–1905
Hahn ME, Wright ES, Segal AD et al (2012) Comparative gait analysis of ankle arthrodesis and arthroplasty: initial findings of a prospective study. Foot Ankle Int 33(4):282–289
Hintermann B (2005) Total ankle arthroplasty: historical overview, current concepts, and future perspectives. SpringerWien, New York, pp 35–49
Houdijk H, Doets HC, van Middelkoop M et al (2008) Joint stiffness of the ankle during walking after successful mobile-bearing total ankle replacement. Gait Posture 27:115–119
Hurwitz DE, Ryals AB, Case JP et al (2002) The knee adduction moment during gait in subjects with knee osteoarthritis is more closely correlated with static alignment than radiographic disease severity, toe out angle and pain. J Orthop Res 20:101–107
Ilgin D, Ozalevli S, Kilinc O et al (2011) Gait speed as a functional capacity indicator in patients with chronic obstructive pulmonary disease. Ann Thorac Med 6:141–146
Inam S, Vucic S, Brodaty NE et al (2010) The 10-metre gait speed as a functional biomarker in amyotrophic lateral sclerosis. Amyotroph Lateral Scler 11:558–561
Ingrosso S, Benedetti MG, Leardini A et al (2009) Gait analysis in patients operated with a novel total ankle prosthesis. Gait Posture 30:132–137
Inman VT (1991) The joints of the ankle. In: Stiehl JB (ed) Biomechanics of the ankle joint, 2nd edn. Williams & Wilkins, Baltimore, pp 31–74
Kakkar R, Siddique MS (2011) Stresses in the ankle joint and total ankle replacement design. Foot Ankle Surg 17(2):58–63
Kawamura K, Tokuhiro A, Takechi H (1991) Gait analysis of slope walking: a study on step length, stride width, time factors and deviation in the center of pressure. Acta Med Okayama 45:179–184
Khazzam M, Long JT, Marks RM et al (2006) Preoperative gait characterization of patients with ankle arthrosis. Gait Posture 24:85–93
Kirtley C (2006) Clinical gait analysis: theory and practice. Elsevier, New York
Kitaoka HB, Patzer GL (1994) Clinical results of the Mayo total ankle arthroplasty. J Bone Joint Surg Am 76:974–979
Kofoed H, Sorensen TS (1998) Ankle arthroplasty for rheumatoid arthritis and osteoarthritis: prospective long-term study of cemented replacements. J Bone Joint Surg 80-B:328–332
Leardini A, O'Connor JJ, Giannini S (2014) Biomechanics of the natural, arthritic, and replaced human ankle joint. J Foot Ankle Res 7:8
Ledoux WR, Rohr ES, Ching RP et al (2006) Effects of foot shape on the three-dimensional position of foot bones. J Orthop Res 24:2176–2186
Lewis G (1994) The ankle joint prosthetic replacement: clinical performance and research challenges. Foot Ankle Int 15(9):471–476
Mayich DJ, Novak A, Vena D et al (2014) Gait analysis in foot and ankle surgery – topical review, part 1: principles and uses of gait analysis. Foot Ankle Int 35(1):80–90
McIntosh AS, Beatty KT, Dwan LN et al (2006) Gait dynamics on an inclined walkway. J Biomech 39:2491–2502
Mundermann A, Dyrby CO, Hurwitz DE et al (2004) Potential strategies to reduce medial compartment loading in patients with knee osteoarthritis of varying severity: reduced walking speed. Arthritis Rheum 50:1172–1178
Novak AC, Mayich DJ, Perry SD et al (2014) Gait analysis for foot and ankle surgeons – topical review, part 2: approaches to multisegment modeling of the foot. Foot Ankle Int 35(2):178–191
Piriou P, Culpan P, Mullins M et al (2008) Ankle replacement versus arthrodesis: a comparative gait analysis study. Foot Ankle Int 29(1):3–9
Potter JM, Evans AL, Duncan G (1995) Gait speed and activities of daily living function in geriatric patients. Arch Phys Med Rehabil 76:997–999
Queen RM, Nunley JA (2010) The effect of footwear on preoperative gait mechanics in a group of total ankle replacement patients. J Surg Orthop Adv 19:170–173
Queen RM, DeBiasio JC, Butler RJ et al (2012) Changes in pain, function, and gait mechanics two years following total ankle arthroplasty performed with two modern fixed-bearing prostheses. Foot Ankle Int 33(7):535–542
Queen RM, Sparling TL, Butler RJ et al (2014a) Patient-reported outcomes, function, and gait mechanics after fixed and mobile-bearing total ankle replacement. J Bone Joint Surg Am 96:987–993
Queen RM, Butler RJ, Adams SB Jr et al (2014b) Bilateral differences in gait mechanics following total ankle replacement: a two year longitudinal study. Clin Biomech 29:418–422
Reggiani B, Leardini A, Corazza F et al (2006) Finite element analysis of a total ankle replacement during the stance phase of gait. J Biomech 39(8):1435–1443
Saltzman CL, Salamon ML, Blanchard GM et al (2005) Epidemiology of ankle arthritis. Iowa Orthop J 25:44–46
Singer S, Klejman S, Pinsker E et al (2013) Ankle arthroplasty and ankle arthrodesis: gait analysis compared with normal controls. J Bone Joint Surg Am 95(e191):1–10
Stauffer RN (1979) Total joint arthroplasty. The ankle. Mayo Clin Proc 54:570–575
Stauffer RN, Segal NM (1981) Total ankle arthroplasty. Four years experience. Clin Orthop 160:217–221
Stauffer RN, Chao EYS, Brewster RC (1977) Force and motion analysis of the normal, diseased, and prosthetic ankle joint. Clin Orthop 127:189–196
Tooms RE (1987) Arthroplasty of ankle and knee. In: Crenshaw AH (ed) Campbell's operative orthopedics. C.V. Mosby Company, St. Louis, pp 1145–1150
Valderrabano V, Nigg BM, von Tscharner V et al (2007) Gait analysis in ankle osteoarthritis and total ankle replacement. Clin Biomech 22:894–904

Shoulder Joint Replacement and Upper Extremity Activities of Daily Living

Hendrik Bruttel, David M. Spranz, Jan M. Eckerle, and Michael W. Maier

Abstract

3D motion analysis is mainly used for the lower extremity, especially gait analysis. The upper extremity is a comparatively new application and less standardized. The large range of motion and complex anatomy of the shoulder are challenging. However, 3D motion analysis can provide deeper insight into the coordinated motion of the upper extremity, as multiple joints can be monitored over the whole movement. Anatomical and reverse total shoulder arthroplasty are effective surgical treatment options for patients with osteoarthritis, rheumatoid arthritis, cuff tear arthropathy, and traumatic shoulder injuries. 3D motion analysis helps to understand how prostheses influence movement patterns and improve shoulder function. Thus it may help to improve the design of future prostheses and could potentially become an objective tool for the examination of patients pre- and postoperatively. Although there are no standardized protocols yet, range of motion tasks and activities of daily living have been used in many protocols and have proved to be effective tasks for the analysis of shoulder function.

Keywords
Shoulder kinematics • Shoulder arthroplasty • Shoulder joint replacement • Motion analysis • Upper extremity • Activities of daily living • Range of motion • Proprioception • Upper extremity model

Contents
Introduction . . . 2
State of the Art . . . 4
Applications of 3D Motion Analysis in Patients with Shoulder Joint Replacements . . . 9
Range of Motion . . . 9
Activities of Daily Living . . . 11
Angle Reproduction Tests . . . 13
Future Directions . . . 14
References . . . 14

H. Bruttel • D.M. Spranz • J.M. Eckerle • M.W. Maier (*)
Clinic for Orthopedics and Trauma Surgery, Heidelberg University Hospital, Heidelberg, Germany
e-mail: [email protected]; [email protected]; [email protected]; [email protected]

© Springer International Publishing AG 2017
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_84-2

Introduction
The shoulder consists of the humerus and the shoulder girdle, which is made up of the scapula and clavicle. Movement of the shoulder is the combined result of four different joints. The clavicle connects the shoulder to the thorax via the sternoclavicular joint and stabilizes scapular movements through the acromioclavicular joint. The scapula is connected to the thorax by muscles only; this allows rotational and translational movement and is called the scapulothoracic joint. The glenohumeral joint is a ball-and-socket joint between the humeral head and the glenoid of the scapula. The tendons of the rotator cuff, formed by the subscapularis, supraspinatus, infraspinatus, and teres minor, stabilize the joint. The glenoid surface is small compared to the humeral head and thus allows a large range of motion (ROM), which is further extended by the shoulder girdle. Fig. 1 gives an overview of the shoulder anatomy.

Fig. 1 Anatomy of the shoulder

The history of shoulder joint replacement goes back to 1893, when the French surgeon E. J. Péan implanted the first shoulder prosthesis (Lugli 1978). In 1955 C. S. Neer presented the first modern design for a stemmed humeral head prosthesis used in trauma patients (Neer 1955). Today different designs are available for the treatment of degenerative and traumatic injuries of the glenohumeral joint. The design of anatomical hemi- and total shoulder arthroplasty (TSA) aims to imitate the original anatomical situation. A ball-shaped prosthesis replaces the humeral head and was traditionally fixated in the diaphysis. Newer stemless models are fixated in the metaphysis and lead to smaller surgical trauma, shorter procedure time, and reduced loss of bone material, which facilitates revision surgery (Huguet et al. 2010; Berth and Pap 2013; Razmjou et al. 2013; Bell and Coghlan 2014). In TSA the glenoid is replaced by a socket-shaped prosthesis, in contrast to hemiarthroplasty, where the glenoid is not replaced. Patients with primary or secondary osteoarthritis and rheumatoid arthritis can profit from this type of shoulder replacement; good results for pain reduction and improvement of range of motion are reported (Deshmukh et al. 2005; Radnay et al. 2007; Kasten et al. 2010; Raiss et al. 2012; Sandow et al. 2013). Special implant designs for traumatic shoulder injuries are also available (Aaron et al. 2013).

Reverse shoulder arthroplasty (RSA) uses a different biomechanical design, in which the humeral head is replaced by a socket-shaped component and a ball-shaped component is fixated to the glenoid (Grammont and Baulot 1993). This type of prosthesis is traditionally used in patients with rotator cuff tear arthropathy, characterized by rotator cuff insufficiency based on a massive irreparable rotator cuff tear, diminished acromiohumeral distance, and secondarily developed arthritic changes of the glenohumeral joint (Ecklund et al. 2007). Because of the good functional outcome, today the RSA is also used in revision cases and in patients with proximal humerus fractures. RSA caudalizes and medializes the glenohumeral center of rotation (Jobin et al. 2012; Rettig et al. 2013), which improves the lever arm of the deltoid muscle and increases muscle fiber recruitment of the anterior and posterior deltoid to compensate for the deficient rotator cuff. Fig. 2 shows different types of shoulder joint replacements.

Fig. 2 Different types of shoulder joint replacements: conventional stemmed TSA (a), stemless TSA (b), and RSA (c)


3D motion analysis is mainly used for the lower extremity, especially gait analysis. The upper extremity is a comparatively new application and less standardized. The large ROM and complex anatomy of the shoulder are challenging. However, 3D motion analysis can provide deeper insight into the coordinated motion of the upper extremity, as multiple joints can be monitored over the whole movement. In patients with shoulder joint replacement, it can help to understand how the prosthesis implantation influences the movement patterns of the upper extremity and thus may help to improve the design of future prostheses. It could potentially become an objective tool for the examination of pre- and postoperative patients.

State of the Art
Different systems for motion analysis are available and can be used in patients with shoulder joint replacements. Optoelectronic systems use a set of cameras to track skin-mounted markers. These markers can either actively emit light via LEDs or passively reflect infrared light emitted by strobes on the cameras. Three-dimensional positions of markers in sight of more than one camera can be calculated via triangulation. These systems are often used in gait analysis, as they can be scaled for tracking in large volumes; additionally, passive systems allow the patient to be unwired. In upper extremity motion analysis, however, the subject is usually standing or seated, and large tracking volumes are not necessary. Another concept is the use of electromagnetic tracking devices that generate a magnetic field in which sensors can track their position. These systems can be used for upper extremity motion analysis (Meskers et al. 1998b, 1999; Hannah and Scibek 2015), and unlike optoelectronic systems, they do not require visual sight of the sensors.

Biomechanical models divide the body into segments that are connected by joints. Three non-collinear points per segment are necessary to track all six degrees of freedom. Anatomical landmarks are used to define segments in a biomechanically meaningful and comparable way (see chapter "▶ Upper Extremity Models for Clinical Movement Analysis" for more details on upper extremity modeling). Their position can be tracked either directly, via markers mounted on the skin over the respective landmark, or indirectly, via clusters of markers or electromagnetic sensors. Direct tracking reduces the number of calibration recordings and therefore unburdens the motion analysis protocol, and greater distances between tracked points reduce the influence of artifacts. For indirect tracking, a cluster or sensor is used to track segment kinematics. A technical coordinate system can be created for each cluster or sensor, in which the positions of the anatomical landmarks are digitized by recording reference frames using a pointing device equipped with markers or sensors (van Andel et al. 2008); see Fig. 3, with a short sketch of this procedure after the figure caption below. Advantages of indirect tracking are a reduced number of sensors and markers and therefore faster patient preparation. In case an optoelectronic system is used, indirect tracking can also provide a workaround for problems with visual sight. In most studies skin-mounted markers or sensors are used. They provide a non-invasive way to record kinematics but are prone to soft tissue artifacts (STA). STA arise from skin and muscular movement during motion, which may temporarily move the marker or sensor away from its designated position.


Fig. 3 Pointing device (a) and a cluster of four reflective markers (b)
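As a minimal sketch of the indirect tracking step just described (cf. Fig. 3), assuming three cluster markers and a pointer-digitized landmark, the landmark can be stored once in the cluster's technical coordinate system during calibration and then reconstructed from the cluster pose alone during movement. All names and positions below are illustrative and not taken from the cited protocols.

import numpy as np

def cluster_frame(m1, m2, m3):
    """Build an orthonormal technical coordinate system from three cluster markers.
    Returns the frame origin and a 3x3 rotation matrix whose columns are the axes."""
    x = (m2 - m1) / np.linalg.norm(m2 - m1)
    z = np.cross(x, m3 - m1)
    z /= np.linalg.norm(z)
    y = np.cross(z, x)
    return m1, np.column_stack([x, y, z])

# Calibration: the pointer tip reports the landmark in laboratory coordinates
m1, m2, m3 = np.array([0.0, 0, 0]), np.array([0.05, 0, 0]), np.array([0, 0.05, 0])
landmark_lab = np.array([0.02, 0.08, -0.03])
origin, rot = cluster_frame(m1, m2, m3)
landmark_local = rot.T @ (landmark_lab - origin)  # constant in the cluster frame

# During movement: reconstruct the landmark from the current cluster pose alone
m1n, m2n, m3n = m1 + 0.1, m2 + 0.1, m3 + 0.1      # cluster translated by 10 cm
origin_n, rot_n = cluster_frame(m1n, m2n, m3n)
print(origin_n + rot_n @ landmark_local)          # -> landmark_lab shifted by 0.1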

Choosing anatomical landmarks with thin soft tissue coverage reduces STA. Some anatomical landmarks, e.g., on the scapula, cannot be tracked directly to a sufficient degree and therefore require indirect techniques (Meskers et al. 2007; van Andel et al. 2009; Brochard et al. 2011; Lempereur et al. 2014; Warner et al. 2015). Clusters or electromagnetic sensors can also be attached to bone pins in order to eliminate STA, but due to their invasive nature, the application of this method is limited to small study groups.

Calculating Joint Kinematics
Joint kinematics can be calculated from the kinematic data of two adjacent segments. Different methods can be used to calculate the angle time series that describe rotation; translational movements are usually not accounted for. Euler/Cardan angles are widely used in motion analysis, as they are mathematically well defined and can be calculated to yield clinically meaningful angles (Grood and Suntay 1983). Three angles describe three subsequent rotations around either two (Euler angles) or three axes (Cardan angles) of a coordinate system in order to align them. The resulting angles depend on the order of rotations, as each subsequent rotation is influenced by the rotations before it. Euler and Cardan angles therefore each provide six different solutions to achieve the same end position. Since three-dimensional motion does not occur sequentially but at once, it is not always easy to decide which sequence is most useful. It is therefore necessary to agree on a rotation sequence in order to gain comparable data (Phadke et al. 2011). Another problem is that two points remain undefined (comparable to the poles of the earth). Near these singularities angles change faster as the spatial distance between degrees gets smaller (like longitudes near the poles of the earth), and hence measurement errors increase. Most body joints have a restricted ROM, e.g., the hip joint, or are essentially constrained to one plane, e.g., the knee joint. In these cases the singularities can be located outside the physiological ROM and therefore be omitted. However, for the shoulder joint the singularities lie within the physiological ROM. Commonly Euler angles are used, and the singularities are thus located at 0° and 180° elevation (Doorenbosch et al. 2003; Wu et al. 2005). For many movements this can be sufficient, but it must be kept in mind for trials involving the neutral pose and maximum elevation.

Another way to express joint kinematics is in terms of projection angles. Their advantage is that they are independent of a rotation sequence. However, if a vector is orthogonal to one of the projection planes, the corresponding projection angle is undefined, comparable to the singularities of Euler/Cardan angles. Another problem is that projection angles always yield two angles: clockwise and counterclockwise rotated. It is practicable to use the angles between -180° and 180° to reflect the motion according to the neutral zero method. However, especially when motion occurs outside the projection planes, values above 180° can be reached, e.g., 200° flexion along with some initial abduction. This would lead to a 360° jump from +180° to -180° in the angle time series and thus contradict intuitive expectation, although mathematically correct. Furthermore, the independence of the angles means that movement is reflected in more than one angle time series, e.g., a flexion movement that is slightly out of the sagittal plane will be reflected in both the flexion and the abduction angle. Both Euler/Cardan and projection angles have in common that a three-dimensional movement is described by a set of two-dimensional angles. While easier to understand, they come with the abovementioned problems. Euler/Cardan angles are useful to describe complex movements in an easily understandable way. A projection angle can be especially useful when movement within a plane is measured, e.g., maximum extension to maximum flexion, as it is independent of the other rotations and is very similar to the SFTR system used in clinical exams (American Academy of Orthopaedic Surgeons 1965).

Another approach is the use of helical angles. Helical angles use an axis to describe rotation around and translation along this axis. The unit vector of the axis can be multiplied by the rotation angle, resulting in three helical angles or an attitude vector (Woltring 1994). Helical angles do not have undefined points, which is useful for upper extremity motion analysis in order to avoid the problems arising from the large ROM mentioned above. However, their disadvantage is that they are not intuitively interpretable. Their use for clinical motion analysis is therefore limited, as most clinicians cannot relate to the data.
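To make the sequence dependence concrete, the following sketch (using SciPy's rotation utilities; the orientation values are arbitrary) decomposes one and the same orientation with three different intrinsic sequences. Each yields a different, equally valid angle triplet, which is why conventions such as the ISB's Y-X'-Y'' shoulder sequence must be agreed upon.

import numpy as np
from scipy.spatial.transform import Rotation as R

# One fixed spatial orientation (arbitrary example values)
orientation = R.from_rotvec(np.deg2rad([20.0, 45.0, 10.0]))

# Decompose the identical orientation with three different intrinsic sequences
for seq in ("YXY", "ZXY", "XZY"):
    print(seq, np.round(orientation.as_euler(seq, degrees=True), 1))
# Three different triplets describe the same orientation, so reported joint
# angles are only comparable when a common rotation sequence has been agreed on.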

International Society of Biomechanics' Recommendations
In 2005 the International Society of Biomechanics (ISB) published recommendations for motion analysis of the upper extremity (Wu et al. 2005). Anatomical landmarks to define coordinate systems for six segments (thorax, clavicle, scapula, humerus, forearm, and hand) are proposed, along with rotation sequences to calculate comparable Euler/Cardan angles for the respective joints. The ISB intended to standardize motion analysis of the upper extremity, and its recommendations are applied in the majority of upper extremity studies.

Table 1 Comparison of anatomical landmarks used in the ISB recommendations and the HUX model. Short names are taken from the original publications. Hand segment definitions are not included (Wu et al. 2005; Rettig et al. 2009)

Segment    ISB                                          HUX
Thorax     Incisura jugularis (IJ)                      Incisura jugularis (CLAV)
           Proc. xiphoideus (PX)                        Proc. xiphoideus (STRN)
           Proc. spinosus 7th cervical vertebra (C7)    Proc. spinosus 7th cervical vertebra (C7)
           Proc. spinosus 8th thoracic vertebra (T8)    Proc. spinosus 10th thoracic vertebra (T10)
Clavicle   Sternoclavicular joint (SC)                  Acromion (SHO)
           Acromioclavicular joint (AC)
Scapula    Angulus acromialis (AA)
           Trigonum spinae scapulae (TS)
           Angulus inferior (AI)
           Processus coracoideus (PC)a
Humerus    Glenohumeral center of rotation (GH)         Tuberositas deltoidea (HUM)
           Lateral epicondyle (EL)                      Ulna distal of olecranon (ELB-ELBW)
           Medial epicondyle (EM)
Forearm    Radial styloid (RS)                          Radial styloid (RAD)
           Ulnar styloid (US)                           Ulnar styloid (ULN)

a PC is not needed for segment definition but for estimating GH according to Meskers et al. (1998a)

The proposed anatomical landmarks have relatively little soft tissue coverage and are thus easy to palpate (see Table 1). The location of the glenohumeral center of rotation (GH), which is needed to define the humerus segment, is the only landmark not directly palpable. Its position has to be calculated using a calibration trial and is then usually linked to the technical coordinate system of a marker cluster or electromagnetic sensor on the humerus. The ISB proposes two different methods to locate GH: it can either be derived from adjacent anatomical landmarks using linear regression formulas (Meskers et al. 1998a) or via a functional method, in which a dynamic calibration trial is used to calculate the pivot point of instantaneous helical axes (Woltring 1990; Stokdijk et al. 2000). The latter method should be preferred (Stokdijk et al. 2000; Wu et al. 2005). The anatomical landmarks used for definition of the scapula cannot be tracked directly, as the scapula moves beneath the skin, which leads to unacceptable errors (see Fig. 4). A cluster or electromagnetic sensor on the acromion is therefore often used for scapula tracking (Meskers et al. 2007; van Andel et al. 2009; Brochard et al. 2011; Lempereur et al. 2014; Warner et al. 2015).

Heidelberg Upper Extremity Model
The Heidelberg Upper Extremity Model (HUX) proposed by Rettig et al. (2009) is another biomechanical model that has been used for shoulder joint replacement studies. It uses a smaller number of anatomical landmarks that are all (besides GH) directly traceable.


Fig. 4 Difference between marker position and palpated location (marked with X) of the angulus inferior of the scapula at 90° abduction

This eliminates the need to record reference frames for clusters. The model uses a least-squares method to determine GH and the elbow joint center (Gamage and Lasenby 2002). Angles are calculated as projection angles and are thus comparable to clinical measurements using a goniometer. The model proved to be reliable for large ROM and for internal/external rotation in the glenohumeral joint; the latter is described as problematic in humerus cluster-based approaches (Cutti et al. 2005). Scapula motion is not regarded in the model; only humerothoracic angles are calculated for the shoulder. Marker positions for the HUX model are summarized in Table 1.

Methods for Joint Center Localization
Both models use GH for defining the humerus segment; calculation of clinically meaningful humerothoracic angles is not possible without this landmark. Since it is not palpable, different methods have been proposed. Regression methods rely on empirical data gathered in cadaver studies, scaled to the subject using regression formulas (Meskers et al. 1998a). Shoulder joint replacements alter shoulder anatomy and move GH, so the use of these methods in this field is questionable, since no empirical data are available for shoulder replacement patients. Another way to calculate the position of GH is with functional methods, which have been found to be superior to empirical methods (Stokdijk et al. 2000). Several methods to estimate the center of rotation have been proposed (Woltring 1990; Halvorsen et al. 1999; Leardini et al. 1999; Halvorsen 2003; Schwartz and Rozumalski 2005; Ehrig et al. 2006) and can be used. While the ISB recommends the instantaneous helical axes method (Woltring 1990; Stokdijk et al. 2000; Wu et al. 2005), the least-squares method by Gamage and Lasenby (2002) used in the HUX model might be preferable (Lempereur et al. 2010).
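As a rough sketch of the functional idea, assuming a humerus marker trajectory has already been expressed in the proximal segment's frame, a simple algebraic least-squares sphere fit recovers a center of rotation. This is a simplified stand-in for, not a reimplementation of, the Gamage and Lasenby (2002) estimator, and all data below are synthetic.

import numpy as np

def fit_center_of_rotation(points):
    """Algebraic least-squares sphere fit: ||p - c||^2 = r^2 for all samples.
    points: (N, 3) marker positions expressed in the proximal segment frame."""
    a = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points**2, axis=1)
    sol, *_ = np.linalg.lstsq(a, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Synthetic calibration trial: a marker sweeps a spherical cap around (0.1, 0, 0)
rng = np.random.default_rng(0)
theta, phi = np.meshgrid(np.linspace(0.3, 1.2, 20), np.linspace(-0.6, 0.6, 10))
true_center, r = np.array([0.10, 0.0, 0.0]), 0.25
pts = true_center + r * np.column_stack([
    (np.sin(theta) * np.cos(phi)).ravel(),
    (np.sin(theta) * np.sin(phi)).ravel(),
    np.cos(theta).ravel(),
])
pts += rng.normal(scale=0.002, size=pts.shape)  # ~2 mm marker noise

print(fit_center_of_rotation(pts))  # center ~ (0.10, 0.0, 0.0), radius ~ 0.25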


Applications of 3D Motion Analysis in Patients with Shoulder Joint Replacements
To date, 3D motion analysis is not routinely used pre- and postoperatively in patients with shoulder joint replacement. Although the development of standardized protocols is under discussion (Kontaxis et al. 2009) and several proposals have been made (van Andel et al. 2008; Vanezis et al. 2015), there is no commonly used protocol yet. Many study protocols use either ROM tasks or a set of activities of daily living (ADL), or both, to analyze shoulder function. Furthermore, proprioception has been studied using active angle reproduction tests (AAR). The following section gives an overview of typical applications of 3D motion analysis in patients with shoulder joint replacements.

Range of Motion
Maximum shoulder ROM in the frontal, sagittal, and transversal planes is used as a quantitative measurement of shoulder functionality. In clinical examination ROM is measured using a goniometer. Two types of ROM measurement can be differentiated: active and passive ROM. To test active ROM, the patient is asked to move their arm from a neutral position (usually standing with the arm hanging loosely beside the body) to the maximum achievable position. This is done for flexion/extension, abduction/adduction, and internal/external rotation. For internal/external rotation, the elbow is usually flexed by 90° in order to have a better visual representation of the movement; the other movements are assessed with the elbow at 0° flexion. Passive ROM is tested likewise, but the examiner moves the patient's arm. Difference or equality of active and passive ROM is typical for certain shoulder diseases and can therefore be important for diagnosis.

Many study protocols include ROM tasks to assess shoulder function. Usually maximum elevation in the frontal (abduction) and sagittal (flexion) plane is tested. Further tasks differ and include maximum internal/external rotation, elevation in the scapular plane (40–45° anterior of the frontal plane), extension, and adduction. The patient is asked to move the arm to the maximum achievable position starting from a neutral position. These tasks allow assessment of maximum active ROM in the tested planes and therefore give a reference value for the evaluation of further tasks. For both TSA and RSA, compromised ROM has to be expected (Veeger et al. 2006; Bergmann et al. 2008; Alta et al. 2014). A larger maximum ROM can be expected in patients treated with TSA compared to patients with RSA (Alta et al. 2014). But Alta et al. (2011) could show that ROM in RSA patients also depends on whether the prosthesis is used as a revision or primary prosthesis: primary RSA yields greater postoperative ROM than revision RSA.

However, maximum ROM can be evaluated sufficiently with standard clinical means, i.e., with a goniometer, and 3D motion analysis does not necessarily provide greater accuracy (Rettig et al. 2009). The advantage of motion analysis lies in its ability to measure the participating joints' kinematics throughout the movement.

This can help to gain an understanding of the coordination of the participating joints. The scapulohumeral rhythm (SHR) is the relation of scapular and glenohumeral motion (Inman et al. 1944). It is often used as a quantitative description of shoulder coordination and can be used to assess compensation for loss of glenohumeral ROM. Accurate recording of scapula kinematics is difficult and subject to research. Generally, the contribution of the glenohumeral joint, i.e., the prosthesis, to elevation seems to be reduced in both TSA (de Toledo et al. 2012; Alta et al. 2014) and RSA patients (de Toledo et al. 2012; Kwon et al. 2012; Alta et al. 2014; Walker et al. 2015) compared to healthy subjects. In contrast, some authors report close to normal values (Bergmann et al. 2008; Braman et al. 2010).

Fig. 5 Scapula protraction, rotation, and tilting plotted against shoulder elevation during an abduction task


Quantitative measurements gained by ROM tasks are useful for clinical evaluation. Maximum ROM values build a bridge between 3D motion analysis results and regular clinical examination. The SHR provides a numeric quantification of shoulder coordination that can be used to monitor the degree of success in restoring normal shoulder coordination. Pre- and postoperative data can be helpful to evaluate different surgical treatments and to improve future prosthetic designs. An advantage of ROM tasks is that they can be standardized rather easily. Shoulder girdle kinematics can be further described with angle/angle plots relating the scapula and clavicle joint kinematics to shoulder elevation (see Fig. 5).
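A minimal sketch of such a quantification follows, assuming humerothoracic elevation can be decomposed into glenohumeral plus scapulothoracic contributions; the function name and data are illustrative only.

import numpy as np

def scapulohumeral_rhythm(humerothoracic_elev_deg, scapular_upward_rot_deg):
    """Overall rhythm as the least-squares slope of glenohumeral motion
    regressed on scapulothoracic motion over an elevation task."""
    glenohumeral = np.asarray(humerothoracic_elev_deg) - np.asarray(scapular_upward_rot_deg)
    return np.polyfit(scapular_upward_rot_deg, glenohumeral, 1)[0]

# Synthetic elevation task following the classic ~2:1 relation (Inman et al. 1944)
elevation = np.linspace(0.0, 150.0, 100)
scapula = elevation / 3.0                         # scapula contributes one third
print(scapulohumeral_rhythm(elevation, scapula))  # ~ 2.0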

Activities of Daily Living
3D motion analysis is mainly used for the lower extremity, especially gait analysis, where walking and standing are the main activities. Defining meaningful standardized tasks for the upper extremity is more complicated, because there is a great number of tasks that can be performed and analyzed. The ability to handle everyday life has priority for patients with shoulder joint replacements. ADL are basic tasks of everyday life, e.g., drinking, eating, body hygiene, etc. Clinical scores for the evaluation of shoulder function, such as the Constant score (Constant and Murley 1987), DASH (Hudak et al. 1996), and ASES (Michener et al. 2002), use ADL for assessment. However, patients' self-reported ability to perform certain ADL can differ greatly from examination results (Sager et al. 1992); therefore an objective evaluation method is necessary. While ROM tasks can give a first impression of overall shoulder function and insight into the coordination of the shoulder, their meaning for everyday life is limited. Many ADL do not necessarily need full shoulder ROM, and different strategies to accomplish the same task can be used. For this reason, ADL are often used in motion analysis protocols to assess the functionality of the upper extremity in a more function-oriented way.

ADL differ greatly in terms of complexity and the ROM used. Selected ADL should reflect different types of movement to cover most daily needs. They often concentrate on the patient's body, since the position of external objects may be altered to meet the patient's abilities and is not standardized. Internal rotation is needed to reach the lower body parts, especially the lower back in order to perform perianal care. External rotation is involved in many tasks involving the upper body parts, such as hair combing. Adduction is needed for reaching the contralateral body side, e.g., for washing the armpit. These ADL examples are used in many study protocols. Additionally, the ability to reach high objects, such as collecting a book from a shelf, is often tested, which mainly requires a large degree of forward flexion. Other ADL used include eating, drinking, and many more. A useful protocol should select few ADL in order to keep examination and post-processing time low; many ADL consist of comparable shoulder motion and may not give further insight, e.g., reaching the contralateral shoulder and eating use comparable ROM (Magermans et al. 2005). Simplified tasks that just involve touching the respective area are sometimes used (van Andel et al. 2008). While this can facilitate instructions and analysis, their meaningfulness for the ability to perform ADL is questionable, as they are often easier than the actual task. The patient can be given objects, e.g., a washcloth, to make the movements more realistic (Maier et al. 2014a).

Another controversial point is how patients should be instructed. Execution of ADL differs from person to person, e.g., hair combing depends on hairstyle and length, but many other tasks also depend on preference and routine. A high degree of standardization is preferable to better differentiate pathological and physiological movement abilities, and standardized instructions are useful to gather comparable data (van Andel et al. 2008). Visual demonstrations by the examiner or by video help patients to perform ADL in the desired way. However, depending on the intention of the study, freedom to choose one's own way of performance may be given (Veeger et al. 2006). While the latter method will most certainly result in more natural movement patterns, analysis, especially of kinematic time series, might be difficult. Another factor to be considered is handedness. Some tasks, e.g., washing the armpit, are performed with both sides in everyday life, but the dominant hand usually does other tasks exclusively, e.g., perianal care, although this can be different in patients with shoulder diseases. Differences between the dominant and non-dominant side in scapula motion could even be demonstrated for ROM tasks in healthy subjects (Matsuki et al. 2011). Standardized instructions might reduce the influence of handedness to some degree, but it should still be considered.

Different quantitative measurements can be gathered from ADL tasks. The ROM used for the ADL (AROM), i.e., the minimum and maximum of the respective angle over the whole movement, can be calculated. AROM can be compared to healthy individuals (Veeger et al. 2006; Kasten et al. 2010), pre- and postoperatively (Kasten et al. 2010; Maier et al. 2014a, b), and to maximum ROM values gathered from ROM tasks (Kasten et al. 2010; Maier et al. 2014b). AROM compared to healthy individuals may be a better estimate of a patient's restrictions in everyday life than maximum ROM. The amount of reduction of AROM compared to healthy individuals depends on the ADL and the angle analyzed (Maier et al. 2014b). The effect of the implanted prosthesis on the capacity to handle everyday life can be quantified by comparing pre- and postoperative AROM. Both RSA and TSA have been shown to improve AROM (Kasten et al. 2010; Maier et al. 2014a). Rehabilitation is time-consuming, and even 6 months after surgery there is still improvement to be expected (Maier et al. 2014b). The comparison of AROM to ROM tasks can be helpful to evaluate the ability to make use of the postoperative gain in ROM. For elevation, patients seem to be unable to make use of their full ROM (Kasten et al. 2010; Maier et al. 2014a, b), which might indicate room for further improvement by training. A deeper understanding of movement patterns can be gained by describing kinematics with angle time series, as done in gait analysis; while AROM may be normal, abnormal movement patterns can still exist. Data are expressed time-normalized as 0–100% of movement in order to compare data of different individuals and to calculate mean curves for repeated trials (see Fig. 6 and the sketch below). However, to our knowledge, only one study, by Veeger et al. (2006), provided kinematic time series for ADL in patients with shoulder joint replacements.
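A minimal sketch of this time normalization, assuming each trial is a one-dimensional angle time series (the data here are random placeholders):

import numpy as np

def time_normalize(angle_series, n_points=101):
    """Resample one trial to 0-100% of movement with n_points samples."""
    src = np.linspace(0.0, 100.0, num=len(angle_series))
    dst = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(dst, src, angle_series)

# Three repetitions of a task recorded with different frame counts
rng = np.random.default_rng(1)
trials = [np.cumsum(rng.normal(size=n)) for n in (230, 312, 275)]
stacked = np.vstack([time_normalize(trial) for trial in trials])
mean_curve = stacked.mean(axis=0)  # e.g., the solid line in a plot like Fig. 6
sd_curve = stacked.std(axis=0)     # e.g., the dashed standard deviation band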

Fig. 6 Averaged shoulder elevation during a repeated hair-combing task (solid black line) with standard deviation (dashed line) and reference data ± one standard deviation (gray band). Shoulder elevation in degrees is plotted against 0–100% of movement

ADL provide a useful tool to assess a patient's ability to handle everyday life, as they yield measurements that are closer to the patient's experience. 3D motion analysis is an objective method for the qualitative and quantitative evaluation of ADL and is therefore superior to the subjective assessment in clinical scores. However, the development of standardized ADL protocols is necessary in order to gain comparable results.

Angle Reproduction Tests
Proprioception plays a key role in body motion. The motor cortex needs the joint positions to plan and coordinate motion. Proprioceptive information is provided by the skin, muscle spindles, Golgi tendon organs, and joint capsules. Joint position sense can be evaluated using either passive or active angle reproduction tests. The subject's arm is guided by the examiner to a certain position, e.g., 30° abduction, and then returned to the neutral position. The blindfolded subject is then asked to reproduce the position. This can either be done actively by the subject him-/herself or passively, when the examiner guides the arm and the subject is asked to tell the examiner when the reference position is reached. The difference between the reproduced and the reference position is an indicator of joint position sense. In patients with shoulder joint replacements, the natural anatomy is altered significantly; the impact of shoulder replacement surgery on the body's proprioception is therefore a field of interest. When a goniometer is used, angle measurements are usually projected to either the sagittal or the frontal plane (Cuomo et al. 2005). 3D motion analysis can be used to measure the accuracy of reproduction in all dimensions and therefore provides a useful instrument for quantifying proprioception. Depending on the pose, patients with shoulder joint replacements showed equal or progressively deteriorating proprioception after surgery, with TSA performing better than hemiarthroplasty (Kasten et al. 2009; Maier et al. 2012). No differences could be found between stemless and conventional stemmed TSA (Maier et al. 2015).
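A sketch of such an all-dimension error measure follows, expressing the mismatch between the reference and reproduced arm orientations as a single 3D rotation angle (SciPy-based; the poses are invented for illustration).

import numpy as np
from scipy.spatial.transform import Rotation as R

def reproduction_error_deg(reference, reproduced):
    """Angle of the relative rotation between two arm orientations, in degrees."""
    return float(np.rad2deg((reference.inv() * reproduced).magnitude()))

reference = R.from_euler("YXY", [10, 30, 0], degrees=True)    # examiner-guided pose
reproduced = R.from_euler("YXY", [12, 33, -2], degrees=True)  # blindfolded attempt
print(reproduction_error_deg(reference, reproduced))          # a few degrees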


Changes in proprioception and their clinical relevance are not yet completely understood, and further research is necessary.

Future Directions
Upper extremity motion analysis is to this day mainly used as a tool for research. 3D motion analysis is cost- and time-intensive, and routine clinical use is therefore difficult to implement. Optoelectronic and electromagnetic motion analysis is restricted to specialized laboratories, as it requires extensive equipment. However, magnetic resonance imaging, computed tomography, and many other technologies show that complex and expensive technology does not necessarily prohibit routine clinical use. Motion data objectively gathered by means of 3D motion analysis can provide a deeper understanding of the effects of treatments for orthopedic and trauma patients. Clinical gait analysis is already used routinely in specialized centers for certain orthopedic conditions such as cerebral palsy. As mentioned above, the upper extremity poses challenges with respect to anatomy and standardized evaluation. For routine use, comparable results and standardized reports, which can be understood and interpreted by clinicians, are necessary. The need for standardization regarding modeling (Wu et al. 2005) and protocols (Kontaxis et al. 2009) is commonly recognized, but many differences in application still exist. Partly this may be due to the ongoing development of new methods, e.g., for joint center calculation (Lempereur et al. 2010), but clinical gait analysis shows that common ground for routine use can be found despite newly arising technologies. Generally acknowledged proposals like the ISB recommendations by Wu et al. (2005) are therefore desirable. New technologies like inertial sensors could lower the barrier for the application of motion analysis, as they reduce equipment costs and can be used outside specialized laboratories. Parel et al. (2012) proposed a protocol fit for routine application based on this technique. Another topic to be addressed, especially with respect to shoulder joint replacement research, could be the enhancement of upper extremity motion analysis with electromyography (EMG). Surface EMG combined with motion analysis could further deepen our understanding of shoulder joint coordination and of possibly altered muscle function after joint replacement. For example, RSA intentionally alters the muscular coordination of the shoulder, enabling the deltoid to replace a defective rotator cuff, but to our knowledge only Walker et al. (2014) have to date used combined EMG and motion analysis in RSA patients.

References

Aaron D, Parsons BO, Sirveaux F, Flatow EL (2013) Proximal humeral fractures: prosthetic replacement. Instr Course Lect 62:155–162


Alta TD, Bergmann JH, Veeger DJ, Janssen TW, Burger BJ, Scholtes VA, Willems WJ (2011) Kinematic and clinical evaluation of shoulder function after primary and revision reverse shoulder prostheses. J Shoulder Elbow Surg 20(4):564–570. doi:10.1016/j.jse.2010.08.022 Alta TD, de Toledo JM, Veeger HE, Janssen TW, Willems WJ (2014) The active and passive kinematic difference between primary reverse and total shoulder prostheses. J Shoulder Elbow Surg 23(9):1395–1402. doi:10.1016/j.jse.2014.01.040 American Academy of Orthopaedic Surgeons (1965) Joint motion: method of measuring and recording. American Academy of Orthopedic Surgeons, Chicago Bell SN, Coghlan JA (2014) Short stem shoulder replacement. Int J Shoulder Surg 8(3):72–75. doi:10.4103/0973-6042.140113 Bergmann JH, de Leeuw M, Janssen TW, Veeger DH, Willems WJ (2008) Contribution of the reverse endoprosthesis to glenohumeral kinematics. Clin Orthop Relat Res 466(3):594–598. doi:10.1007/s11999-007-0091-5 Berth A, Pap G (2013) Stemless shoulder prosthesis versus conventional anatomic shoulder prosthesis in patients with osteoarthritis: a comparison of the functional outcome after a minimum of two years follow-up. J Orthop Traumatol 14(1):31–37. doi:10.1007/s10195-0120216-9 Braman JP, Thomas BM, Laprade RF, Phadke V, Ludewig PM (2010) Three-dimensional in vivo kinematics of an osteoarthritic shoulder before and after total shoulder arthroplasty. Knee Surg Sports Traumatol Arthros 18(12):1774–1778. doi:10.1007/s00167-010-1167-4 Brochard S, Lempereur M, Remy-Neris O (2011) Accuracy and reliability of three methods of recording scapular motion using reflective skin markers. Proc Inst Mech Eng H 225(1):100–105 Constant CR, Murley AH (1987) A clinical method of functional assessment of the shoulder. Clin Orthop Relat Res 214:160–164 Cuomo F, Birdzell MG, Zuckerman JD (2005) The effect of degenerative arthritis and prosthetic arthroplasty on shoulder proprioception. J Shoulder Elbow Surg 14(4):345–348. doi:10.1016/j. jse.2004.07.009 Cutti AG, Paolini G, Troncossi M, Cappello A, Davalli A (2005) Soft tissue artefact assessment in humeral axial rotation. Gait Posture 21(3):341–349. doi:10.1016/j.gaitpost.2004.04.001 de Toledo JM, Loss JF, Janssen TW, van der Scheer JW, Alta TD, Willems WJ, Veeger DH (2012) Kinematic evaluation of patients with total and reverse shoulder arthroplasty during rehabilitation exercises with different loads. Clin Biomech 27(8):793–800. doi:10.1016/j. clinbiomech.2012.04.009 Deshmukh AV, Koris M, Zurakowski D, Thornhill TS (2005) Total shoulder arthroplasty: long-term survivorship, functional outcome, and quality of life. J Shoulder Elbow Surg 14(5):471–479. doi:10.1016/j.jse.2005.02.009 Doorenbosch CA, Harlaar J, Veeger DH (2003) The globe system: an unambiguous description of shoulder positions in daily life movements. J Rehabil Res Dev 40(2):147–155 Ecklund KJ, Lee TQ, Tibone J, Gupta R (2007) Rotator cuff tear arthropathy. J Am Acad Orthop Surg 15(6):340–349 Ehrig RM, Taylor WR, Duda GN, Heller MO (2006) A survey of formal methods for determining the centre of rotation of ball joints. J Biomech 39(15):2798–2809. doi:10.1016/j. jbiomech.2005.10.002 Gamage SS, Lasenby J (2002) New least squares solutions for estimating the average centre of rotation and the axis of rotation. J Biomech 35(1):87–93 Grammont PM, Baulot E (1993) Delta shoulder prosthesis for rotator cuff rupture. 
Orthopedics 16 (1):65–68 Grood ES, Suntay WJ (1983) A joint coordinate system for the clinical description of threedimensional motions: application to the knee. J Biomech Eng 105(2):136–144 Halvorsen K (2003) Bias compensated least squares estimate of the center of rotation. J Biomech 36 (7):999–1008 Halvorsen K, Lesser M, Lundberg A (1999) A new method for estimating the axis of rotation and the center of rotation. J Biomech 32(11):1221–1227


Hannah DC, Scibek JS (2015) Collecting shoulder kinematics with electromagnetic tracking systems and digital inclinometers: a review. World J Orthop 6(10):783–794. doi:10.5312/wjo. v6.i10.783 Hudak PL, Amadio PC, Bombardier C (1996) Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med 29(6):602–608. doi:10.1002/(SICI) 1097-0274(199606)29:63.0.CO;2-L Huguet D, DeClercq G, Rio B, Teissier J, Zipoli B, Group T (2010) Results of a new stemless shoulder prosthesis: radiologic proof of maintained fixation and stability after a minimum of three years’ follow-up. J Shoulder Elbow Surg 19(6):847–852. doi:10.1016/j.jse.2009.12.009 Inman VT, JBdM S, Abbott LC (1944) Observations on the function of the shoulder joint. J Bone Joint Surg 26(1):1–30 Jobin CM, Brown GD, Bahu MJ, Gardner TR, Bigliani LU, Levine WN, Ahmad CS (2012) Reverse total shoulder arthroplasty for cuff tear arthropathy: the clinical effect of deltoid lengthening and center of rotation medialization. J Shoulder Elbow Surg 21(10):1269–1277. doi:10.1016/j. jse.2011.08.049 Kasten P, Maier M, Rettig O, Raiss P, Wolf S, Loew M (2009) Proprioception in total, hemi- and reverse shoulder arthroplasty in 3D motion analyses: a prospective study. Int Orthop 33 (6):1641–1647. doi:10.1007/s00264-008-0666-0 Kasten P, Maier M, Wendy P, Rettig O, Raiss P, Wolf S, Loew M (2010) Can shoulder arthroplasty restore the range of motion in activities of daily living? A prospective 3D video motion analysis study. J Shoulder Elbow Surg 19(2 Suppl):59–65. doi:10.1016/j.jse.2009.10.012 Kontaxis A, Cutti AG, Johnson GR, Veeger HE (2009) A framework for the definition of standardized protocols for measuring upper-extremity kinematics. Clin Biomech 24 (3):246–253. doi:10.1016/j.clinbiomech.2008.12.009 Kwon YW, Pinto VJ, Yoon J, Frankle MA, Dunning PE, Sheikhzadeh A (2012) Kinematic analysis of dynamic shoulder motion in patients with reverse total shoulder arthroplasty. J Shoulder Elbow Surg 21(9):1184–1190. doi:10.1016/j.jse.2011.07.031 Leardini A, Cappozzo A, Catani F, Toksvig-Larsen S, Petitto A, Sforza V, Cassanelli G, Giannini S (1999) Validation of a functional method for the estimation of hip joint Centre location. J Biomech 32(1):99–103 Lempereur M, Leboeuf F, Brochard S, Rousset J, Burdin V, Remy-Neris O (2010) In vivo estimation of the glenohumeral joint Centre by functional methods: accuracy and repeatability assessment. J Biomech 43(2):370–374. doi:10.1016/j.jbiomech.2009.09.029 Lempereur M, Brochard S, Leboeuf F, Remy-Neris O (2014) Validity and reliability of 3D marker based scapular motion analysis: a systematic review. J Biomech 47(10):2219–2230. doi:10.1016/j.jbiomech.2014.04.028 Lugli T (1978) Artificial shoulder joint by Pean (1893): the facts of an exceptional intervention and the prosthetic method. Clin Orthop Relat Res 133:215–218 Magermans DJ, Chadwick EK, Veeger HE, van der Helm FC (2005) Requirements for upper extremity motions during activities of daily living. Clin Biomech 20(6):591–599. doi:10.1016/j. clinbiomech.2005.02.006 Maier MW, Niklasch M, Dreher T, Wolf SI, Zeifang F, Loew M, Kasten P (2012) Proprioception 3 years after shoulder arthroplasty in 3D motion analysis: a prospective study. Arch Orthop Trauma Surg 132(7):1003–1010. 
doi:10.1007/s00402-012-1495-6 Maier MW, Caspers M, Zeifang F, Dreher T, Klotz MC, Wolf SI, Kasten P (2014a) How does reverse shoulder replacement change the range of motion in activities of daily living in patients with cuff tear arthropathy? A prospective optical 3D motion analysis study. Arch Orthop Trauma Surg 134(8):1065–1071. doi:10.1007/s00402-014-2015-7 Maier MW, Niklasch M, Dreher T, Zeifang F, Rettig O, Klotz MC, Wolf SI, Kasten P (2014b) Motion patterns in activities of daily living: 3- year longitudinal follow-up after total shoulder arthroplasty using an optical 3D motion analysis system. BMC Musculoskelet Disord 15:244. doi:10.1186/1471-2474-15-244


Maier MW, Lauer S, Klotz MC, Bulhoff M, Spranz D, Zeifang F (2015) Are there differences between stemless and conventional stemmed shoulder prostheses in the treatment of glenohumeral osteoarthritis? BMC Musculoskelet Disord 16:275. doi:10.1186/s12891-0150723-y Matsuki K, Matsuki KO, Mu S, Yamaguchi S, Ochiai N, Sasho T, Sugaya H, Toyone T, Wada Y, Takahashi K, Banks SA (2011) In vivo 3-dimensional analysis of scapular kinematics: comparison of dominant and nondominant shoulders. J Shoulder Elbow Surg 20(4):659–665. doi:10.1016/j.jse.2010.09.012 Meskers CG, van der Helm FC, Rozendaal LA, Rozing PM (1998a) In vivo estimation of the glenohumeral joint rotation center from scapular bony landmarks by linear regression. J Biomech 31(1):93–96 Meskers CG, Vermeulen HM, de Groot JH, van Der Helm FC, Rozing PM (1998b) 3D shoulder position measurements using a six-degree-of-freedom electromagnetic tracking device. Clin Biomech 13(4–5):280–292 Meskers CG, Fraterman H, van der Helm FC, Vermeulen HM, Rozing PM (1999) Calibration of the “flock of birds” electromagnetic tracking device and its application in shoulder motion studies. J Biomech 32(6):629–633 Meskers CG, van de Sande MA, de Groot JH (2007) Comparison between tripod and skin-fixed recording of scapular motion. J Biomech 40(4):941–946. doi:10.1016/j.jbiomech.2006.02.011 Michener LA, McClure PW, Sennett BJ (2002) American shoulder and elbow surgeons standardized shoulder assessment form, patient self-report section: reliability, validity, and responsiveness. J Shoulder Elbow Surg 11(6):587–594. doi:10.1067/mse.2002.127096 Neer CS 2nd (1955) Articular replacement for the humeral head. J Bone Joint Surg Am 37-A (2):215–228 Parel I, Cutti AG, Fiumana G, Porcellini G, Verni G, Accardo AP (2012) Ambulatory measurement of the scapulohumeral rhythm: intra- and inter-operator agreement of a protocol based on inertial and magnetic sensors. Gait Posture 35(4):636–640. doi:10.1016/j.gaitpost.2011.12.015 Phadke V, Braman JP, LaPrade RF, Ludewig PM (2011) Comparison of glenohumeral motion using different rotation sequences. J Biomech 44(4):700–705. doi:10.1016/j.jbiomech.2010.10.042 Radnay CS, Setter KJ, Chambers L, Levine WN, Bigliani LU, Ahmad CS (2007) Total shoulder replacement compared with humeral head replacement for the treatment of primary glenohumeral osteoarthritis: a systematic review. J Shoulder Elbow Surg 16(4):396–402. doi:10.1016/j.jse.2006.10.017 Raiss P, Schmitt M, Bruckner T, Kasten P, Pape G, Loew M, Zeifang F (2012) Results of cemented total shoulder replacement with a minimum follow-up of ten years. J Bone Joint Surg Am 94 (23):e1711–e1710. doi:10.2106/JBJS.K.00580 Razmjou H, Holtby R, Christakis M, Axelrod T, Richards R (2013) Impact of prosthetic design on clinical and radiologic outcomes of total shoulder arthroplasty: a prospective study. J Shoulder Elbow Surg 22(2):206–214. doi:10.1016/j.jse.2012.04.016 Rettig O, Fradet L, Kasten P, Raiss P, Wolf SI (2009) A new kinematic model of the upper extremity based on functional joint parameter determination for shoulder and elbow. Gait Posture 30 (4):469–476. doi:10.1016/j.gaitpost.2009.07.111 Rettig O, Maier MW, Gantz S, Raiss P, Zeifang F, Wolf SI (2013) Does the reverse shoulder prosthesis medialize the center of rotation in the glenohumeral joint? Gait Posture 37(1):29–31. 
doi:10.1016/j.gaitpost.2012.04.019 Sager MA, Dunham NC, Schwantes A, Mecum L, Halverson K, Harlowe D (1992) Measurement of activities of daily living in hospitalized elderly: a comparison of self-report and performancebased methods. J Am Geriatr Soc 40(5):457–462 Sandow MJ, David H, Bentall SJ (2013) Hemiarthroplasty vs total shoulder replacement for rotator cuff intact osteoarthritis: how do they fare after a decade? J Shoulder Elbow Surg 22 (7):877–885. doi:10.1016/j.jse.2012.10.023 Schwartz MH, Rozumalski A (2005) A new method for estimating joint parameters from motion data. J Biomech 38(1):107–116. doi:10.1016/j.jbiomech.2004.03.009


Stokdijk M, Nagels J, Rozing PM (2000) The glenohumeral joint rotation Centre in vivo. J Biomech 33(12):1629–1636 van Andel CJ, Wolterbeek N, Doorenbosch CA, Veeger DH, Harlaar J (2008) Complete 3D kinematics of upper extremity functional tasks. Gait Posture 27(1):120–127. doi:10.1016/j. gaitpost.2007.03.002 van Andel C, van Hutten K, Eversdijk M, Veeger D, Harlaar J (2009) Recording scapular motion using an acromion marker cluster. Gait Posture 29(1):123–128. doi:10.1016/j. gaitpost.2008.07.012 Vanezis A, Robinson MA, Darras N (2015) The reliability of the ELEPAP clinical protocol for the 3D kinematic evaluation of upper limb function. Gait Posture 41(2):431–439. doi:10.1016/j. gaitpost.2014.11.007 Veeger HE, Magermans DJ, Nagels J, Chadwick EK, van der Helm FC (2006) A kinematical analysis of the shoulder after arthroplasty during a hair combing task. Clin Biomech 21(Suppl 1):S39–S44. doi:10.1016/j.clinbiomech.2005.09.012 Walker D, Wright TW, Banks SA, Struk AM (2014) Electromyographic analysis of reverse total shoulder arthroplasties. J Shoulder Elbow Surg 23(2):166–172. doi:10.1016/j.jse.2013.05.005 Walker D, Matsuki K, Struk AM, Wright TW, Banks SA (2015) Scapulohumeral rhythm in shoulders with reverse shoulder arthroplasty. J Shoulder Elbow Surg 24(7):1129–1134. doi:10.1016/j.jse.2014.11.043 Warner MB, Chappell PH, Stokes MJ (2015) Measurement of dynamic scapular kinematics using an acromion marker cluster to minimize skin movement artifact. J Vis Exp 96:e51717. doi:10.3791/51717 Woltring HJ (1990) Estimation of the trajectory of the instantaneous centre of rotation in planar biokinematics. J Biomech 23(12):1273–1274 Woltring HJ (1994) 3-D attitude representation of human joints: a standardization proposal. J Biomech 27(12):1399–1414 Wu G, van der Helm FC, Veeger HE, Makhsous M, Van Roy P, Anglin C, Nagels J, Karduna AR, McQuade K, Wang X, Werner FW, Buchholz B, International Society of B (2005) ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion – part II: shoulder, elbow, wrist and hand. J Biomech 38(5):981–992

Expert Opinion and Legal Considerations

Henry M. Silvester

Abstract

This chapter provides practical assistance to experts who may be required to give evidence in court. It explores the development of law and rules that apply in several jurisdictions. In particular, the legal principles are examined in the context of an Australian jurisdiction which is at the forefront of the relatively new practice in this area – concurrent expert evidence. Important elements are the reliability of expert opinion, the source of knowledge, training and experience, as well as procedural aspects. It is prepared as at October 2016.

Keywords

Opinion evidence • Court rules • Litigation • Concurrent evidence

Contents
Introduction . . . . . . . . . . 2
State of the Art . . . . . . . . . . 4
Evidentiary Requirements . . . . . . . . . . 4
Reliability . . . . . . . . . . 5
The Assumptions of Fact . . . . . . . . . . 9
Court and Tribunal Rules . . . . . . . . . . 11
Concurrent Evidence . . . . . . . . . . 13
Some Practical Guidance . . . . . . . . . . 16
Conclusion . . . . . . . . . . 18
Glossary . . . . . . . . . . 19
References . . . . . . . . . . 19

H.M. Silvester (*)
Barry Nilsson Lawyers, Sydney, NSW, Australia
e-mail: [email protected]
© Springer International Publishing AG 2017
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_85-1


Introduction

Expert evidence in courts and tribunals has undergone a significant transformation over the last 20 years. Gone are the days when a report was commissioned by one party from an expert and served on the other party with little or no requirements as to professional conduct, or procedural requirements governing the admissibility of the opinion evidence. The experience in one of the legal jurisdictions in Australia, namely New South Wales ("NSW"), is used in this chapter as an example of the use of expert evidence in litigation. The reason for this example is that NSW was the origin of one of the most significant recent changes in how experts give evidence in courts and tribunals (Jackson 2016). That change was the process of experts giving evidence concurrently in courts or tribunals, having first reached agreement and identified areas of disagreement. The purpose of this chapter is to provide an overview. More detailed consideration of the applicable rules of evidence and procedures can be found, for the Australian approach, in Freckelton and Selby (2009) and Odgers (2014). For the United States, see Conley and Moriarty (2011). Justice Peter Garling of the Supreme Court of NSW helpfully identified some relevant judicial comment on the role of experts in litigation (Garling 2015):

In The Queen v Turner [1975] QB 834 at 841, Lawton LJ expressed the basis upon which expert evidence is received in these words:

An expert's opinion is admissible to furnish the court with scientific information which is likely to be outside the experience and knowledge of a judge or jury.

A similar description was given by Gaudron and Gummow JJ in Osland v The Queen [1998] 197 CLR 316 when Their Honours said:

Expert evidence is admissible with respect to a relevant matter about which ordinary persons are [not] able to form a sound judgment . . . without the assistance of [those] possessing special knowledge or experience in the area.

In the Canadian decision of the Supreme Court in R v Mohan [1994] 2 SCR 9; (1994) 89 CCC (3d) 402, the admission of expert evidence depended on the application of the following (Freckelton and Selby 2009 p.73):

(1) Relevance;
(2) Necessity in assisting the trier of fact;
(3) The absence of any exclusionary rules; and
(4) A properly qualified expert.

The first two questions are often not difficult to address. The rules change between jurisdictions; however, there are similarities. But what is a "properly qualified expert" in the context of litigation? This depends upon the particular knowledge, training, or experience of the expert against the backdrop of their field of study and compliance with court procedures. In 1995 the Evidence Act was introduced in NSW, and also in relation to matters attracting the jurisdiction of the Commonwealth of Australia, in an attempt to provide uniform evidence law, including as applied to expert evidence. Combined with the evidentiary requirements are the procedural obligations on experts; the NSW example is Part 31, Division 2 of the Uniform Civil Procedure Rules 2005. In the Australian experience, in order for an expert's opinion to be considered a reliable source of evidence for a judge or tribunal member to make a decision about an issue or issues in a dispute, a number of criteria must be met (see Freckelton and Selby 2009):

• The evidentiary requirement is that an expert must have specialized knowledge based on training, study, or experience and that the opinion given is founded upon these elements.
• Regardless of who has commissioned the expert, the expert's overriding duty is to the court or tribunal.
• They must exercise their independent professional judgment and, if possible, endeavor to reach an agreement with any other expert witness qualified on the matter in issue.
• There must be transparency of the information relied upon by the expert, not only as to the facts and assumptions of fact upon which the opinion is based, but also as to other material, examinations, tests, or other investigations upon which the expert has relied or drawn to reach their opinion.
• The duty to the court or tribunal extends to identifying the extent to which the expert may consider that their report is incomplete or inaccurate unless some further information is provided, or in the event that insufficient research or data is identified. Also, if, after providing an opinion, the expert changes their views, then this new opinion must be given immediately.

The basis for these requirements is to ensure that, in order for an expert opinion to be relied upon by a tribunal or court, it is in no way misleading and is reliably based upon knowledge, training, research, experience, or study and, where differing opinions are reached between experts, that their best endeavors are made to reach some agreement and to identify areas of disagreement and the reasons for such disagreement. If agreement is not possible, then the court or tribunal clearly has available the differing assumptions or facts, or indeed the different interpretations of scientific principle, which led to the disagreement, such that findings may be made about those matters and the expert opinion in turn accepted or rejected.


State of the Art

Originally developed in NSW in Australia, the innovative procedure of expert witnesses giving "concurrent evidence" in courts and tribunals has gained support in other jurisdictions such as the United Kingdom. This procedure applies before and during the court hearing. Before the hearing, it involves experts meeting in a "conclave" and identifying in a joint written report areas of agreement and disagreement. Then, at the hearing, the experts give evidence concurrently rather than separately, as is the more common procedure. Later in this chapter is a discussion of this recent development of "concurrent evidence."

Evidentiary Requirements

For opinion evidence to be admissible in an Australian court, it is necessary for that evidence to be relevant (s.55 of the Evidence Act 1995 (NSW and Commonwealth of Australia)), have sufficient probative value (s.135 of the Evidence Act 1995), and comply with ss.76 and 79 of the Evidence Act 1995, which relevantly are:

Section 76 The opinion rule
(1) Evidence of an opinion is not admissible to prove the existence of a fact about the existence of which the opinion was expressed.
(2) . . .

The exception to this is s.79 of the Evidence Act 1995.

Section 79 Exception: opinions based on specialised knowledge
(1) If a person has specialised knowledge based on the person's training, study or experience, the opinion rule does not apply to evidence of an opinion of that person that is wholly or substantially based on that knowledge.
(2) . . .

Section 79 of the Evidence Act 1995 creates a requirement that opinion evidence must meet the following:

• the person giving the opinion evidence must have "specialised knowledge";
• the specialised knowledge must be "based on training, study or experience";
• the opinion must be "wholly or substantially based on that [specialised] knowledge".

These elements must be established on the balance of probabilities before a court can accept that the opinion expressed in a report is admissible. The secondary consideration, once admitted into evidence, is the persuasive weight that the court may attach to the expert opinion. For example, the opinion may be admissible (and, for example, fulfil the requirements of s.79 of the Evidence Act 1995); however, an assumption upon which the opinion is based may be established by other evidence which is of itself found to be doubtful. This in turn would cause a court to have doubt as to the value or weight of the expert opinion to the extent that it is based upon that questionable assumption. Significantly, in the event that an assumption or factual basis is not made out in the evidence, it is doubtful that any weight may be attached to an expert opinion that is based upon an erroneous assumption of fact, and indeed the expert opinion may well be rendered inadmissible before the court. Sometimes a court may seek the assistance of an expert to understand something technical without strict compliance with this section (Branson 2006). In considering what amounted to "specialized knowledge," Justice Gaudron of the High Court of Australia in Velevski v R [2002] 187 ALR 233 at para. 82 observed that "specialized knowledge" was knowledge of matters which are outside the knowledge or experience of ordinary persons and which is sufficiently organized or recognized to be accepted as a reliable body of knowledge. The phrase "specialized knowledge" is not defined in Australian legislation. In Adler v Australian Securities and Investments Commission [2003] NSWCA 131 at [629], however, Justice Giles of the NSW Court of Appeal noted that the phrase "is not restrictive; its scope is informed by the available bases of training, study and experience". It has considerable scope, and s.80 of the Evidence Act 1995 also permits that opinion evidence can be admissible even if it is a matter of common knowledge.

Example: Assume that you are asked to provide an opinion on whether a passenger in a motor vehicle was wearing a seat-belt when the vehicle was involved in a frontal collision with another vehicle. This example will be used again throughout this chapter to illustrate the practical application of the principles discussed. The first question is: do you have the specialized knowledge based on training, study, or experience? A biomechanical engineer would, as they are trained in both engineering and relevant aspects of medicine. An engineer without medical knowledge may be sufficiently expert if, for example, they had significant experience in motor accident reconstruction over a lengthy period of time. A chemical engineer would not pass the specialized knowledge test, as their expertise is not relevant to the issue the subject of the opinion, namely whether a seat-belt was worn.

Reliability

In determining what constitutes specialized knowledge, courts have often considered whether the knowledge is from a "field of expertise." An important question to be considered is the extent of the reliability of the evidence. An example of this would be the reliability of survey results upon which opinion or specialized knowledge is based: see, for example, Interlego AG v Croner Trading Pty Limited (1991) 102 ALR 379.


The United States helped establish the concept of the general acceptance of a field of study as a source of consideration of the expertise of an expert. An early decision in the United States, Frye v United States 293 F 1013 (1923), relevantly held that when considering the admissibility of expert evidence, "the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs" (also called the Frye test). With some modification and subject to criticism, this remained largely the prevailing test until the majority of the United States Supreme Court considered Rule 702 of the Federal Rules of Evidence (similar to the Australian s.79 of the Evidence Act 1995) in Daubert v Merrell Dow Pharmaceuticals 509 US 579 (1993)¹. The court held:

[T]he word "knowledge" connotes more than subjective belief or unsupported speculation. The term "applies to any body of known facts or to any body of ideas inferred from such facts or accepted as truths on good grounds" . . . proposed testimony must be supported by appropriate validation – i.e., "good grounds," based on what is known.

¹ The 1975 Federal Rules of Evidence (US) did not refer to the Frye test.

The United States Supreme Court in Daubert went to considerable lengths to identify whether or not expert evidence was reliable based upon valid scientific reasoning or methodology including testing (Freckelton and Selby 2009 pp.63–64):

(1) Whether it can be or has been tested . . .;
(2) whether the theory or technique has been subject to peer review and publication . . .;
(3) the known or potential rate of error and the existence and maintenance of standards controlling the technique's operation;
(4) whether a technique has gained general acceptance within the scientific community.

The concept of reliability became a cornerstone and in 2000, Rule 702 was amended in the United States to require reliability. With effect from 1 December 2000, Rule 702 provided: If scientific, technical or other specialised knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.

It has been the approach in the United States that witnesses will not be permitted to give evidence as experts if their qualifications are not relevant and sufficient (Freckelton and Selby 2009; see also Conley and Moriarty 2011). The Canadian experience may also assist experts in understanding how best to provide opinion evidence. In the DNA profiling case of R v Johnston (1992) 69 CCC 395 at 413, Justice Langdon followed an earlier dissenting judgment of Justice Wilson in R v Beland [1987] 36 CCC (3d) 481 and considered that the judgment of Her Honour Justice Wilson was:


persuasive authority for the proposition that the [Frye 293 F 1013 (1923)] test should not be adopted in Canada. It recommends adoption of a more expansive admissibility standard, relevancy and helpfulness. It suggests by reference to cross-examination and opposing experts, that if the evidence meets initial standards of relevancy and helpfulness, further objections are relevant to weight and not to admissibility.

In relation to novel scientific evidence, Justice Langdon was of the view the following should be considered (at 415):

(1) The potential rate of error;
(2) The existence and maintenance of standards;
(3) The care with which the scientific technique has been employed and whether it is susceptible to abuse;
(4) Whether there are analogous relationships with other types of scientific techniques that are routinely admitted into evidence;
(5) The presence of failsafe characteristics;
(6) The expert's qualifications and stature;
(7) The existence of specialised literature;
(8) The novelty of the technique and its relationship to more established areas of scientific analysis;
(9) Whether the technique has been generally accepted by experts in the field . . .;
(10) The nature and breadth of the inference adduced;
(11) The clarity with which the technique may be explained;
(12) The extent to which the basic data may be verified by the court and the jury;
(13) The availability of other experts to evaluate the technique; and
(14) The probative significance of the evidence.

In the United Kingdom, the question of admissibility of evidence was importantly considered in the Court of Appeal decision in R v Luttrell [2004] 2 Cr App R 31; [2004] EWCA Crim 1344. The court held that for expert evidence to be admissible, the study or experience of the expert witness provides authority to the opinion and the witness must be qualified to express the opinion. The court adopted the interpretation of Chief Justice King of the Supreme Court of South Australia in R v Bonython (1984) 38 SASR 45 at 46, in considering the authority of an expert:

(a) whether the subject matter of the opinion is such that a person without instruction or experience in the area of knowledge or human experience would be able to form a sound judgment on the matter without the assistance of witnesses possessing special knowledge or experience in the area, and
(b) whether the subject matter of an opinion forms part of a body of knowledge or experience which is sufficiently organised to be accepted as a reliable body of knowledge or experience, a special acquaintance with which by the witness would render his opinion of assistance to the court.

If the conditions are met, the evidence is admissible and may be considered by the court whereupon the question becomes one of the persuasive “weight” to be attached to the evidence as opposed to the admissibility of the evidence (Freckelton and Selby 2009). At p.34 in R v Luttrell (supra) the court considered the issue of reliability regarding the question of admissibility:


In established fields of science, the court may take the view that expert evidence would fall beyond the recognised limits of the field or that methods are too unconventional to be regarded as subject to the scientific discipline. But a skill or expertise can be recognised and respected, and thus satisfy the conditions for admissible expert evidence, although the discipline is not susceptible to this sort of scientific discipline.

In some cases, reliability of evidence might be relevant to admissibility. For example, if the evidence is “based on a developing new brand of science or medicine: until it is accepted by the scientific community as being able to provide accurate and reliable opinion”: R v Gilfoyle [2001] 2 Cr App R 57 at para. 25. Even if the field is recognized, an issue may be whether the particular techniques were sufficiently recognized within the expert’s profession. See, for example, R v Robb (1991) 93 Cr App R 161 regarding the admissibility of voice identification evidence. In Australia the courts have noted that the focus must be on “specialized knowledge,” not to be distracted by questions of reliability that are not specifically referred to in the Evidence Act 1995. A leading case of the Australian High Court, in considering opinion evidence, was HG v The Queen (1999) 197 CLR 414, in which her Honour Justice Gaudron (in dissent) did raise the question of reliability: So far as this case is concerned, the first question that arises with respect to the exception in s.79 of the Evidence Act is whether psychology or some relevant field of psychological study amounts to “specialised knowledge.” The position at common law is that, if relevant, expert or opinion evidence is admissible with respect to matters about which ordinary persons are unable “to form a sound judgment. . .without the assistance of [those] possessing special knowledge or experience. . .which is sufficiently organised or recognised to be accepted as a reliable body of knowledge or experience.” There is no reason to think that the expression “specialised knowledge” gives rise to a test which is in any respect narrower or more restrictive than the position at common law.

As to admissibility of expert evidence, it may well be that there is a trend away from the test of "general acceptance within the professional community" towards reliability, namely evidence given as to the "validity and reliability of the technique or theory" (Freckelton and Selby 2009 p.75), although this will vary from jurisdiction to jurisdiction.

Example: Continuing the example of an opinion as to whether a passenger in a motor vehicle accident was wearing a seat-belt, there may be an issue as to causation. If the passenger was not wearing the seat-belt, is this irrelevant because the injuries would have been caused regardless? In a frontal impact, there is well-known scientific evidence to say that wearing a seat-belt reduces injury. In this example, however, let us say that there may be a type of injury discovered relating to low-impact frontal collisions, similar to the zygapophyseal joint injury, which is difficult to detect. Going back to our example, assume a biomechanical engineer gave an opinion that no seat-belt was worn and that, had it been worn, the injuries alleged to have been caused would not have been caused (Genn 2012). Depending upon their medical training, the expert may not be reliable or have the necessary specialized knowledge to comment on the type of injury and its causation if they did not have the requisite training or experience with that type of injury. In this instance, the engineer may have to work with a sufficiently experienced medical professional.

The Assumptions of Fact

The principles that apply to the admissibility of expert evidence in Australia have been succinctly set out in one of the leading decisions in the area by Justice Heydon (as he then was, before becoming a High Court judge). His Honour paid particular attention to the assumptions of fact that underpinned an expert's opinion. In the New South Wales Court of Appeal in Makita (Australia) Pty Limited v Sprowles [2001] 52 NSWLR 705 at para. 85, Justice Heydon observed:

In short, if evidence tendered as expert opinion evidence is to be admissible,
– it must be agreed or demonstrated that there is a field of "specialised knowledge";
– there must be an identified aspect of that field in which the witness demonstrates that by reason of specialised training, study or experience, the witness has become an expert;
– the opinion proffered must be "wholly or substantially based on the witness' expert knowledge";
– so far as the opinion is based on facts "observed" by the expert, they must be identified and admissibly proved by the expert;
– and so far as the opinion is based on "assumed" or "accepted" facts, they must be identified and proved in some other way;
– it must be established that the facts on which the opinion is based form a proper foundation for it;
– and the opinion of an expert requires demonstration or examination of the scientific or other intellectual basis of the conclusions reached: that is, the expert's evidence must explain how the field of "specialised knowledge" in which the witness is expert by reason of "training, study or experience," and on which the opinion is "wholly or substantially based," applies to the facts assumed or observed so as to produce the opinion propounded.

If all these matters are not made explicit, it is not possible to be sure whether the opinion is based wholly or substantially on the expert's specialized knowledge. If the court cannot be sure of that, the evidence is strictly speaking not admissible, and, so far as it is admissible, of diminished weight.

Subsequent to this decision, some of the above elements have been qualified. In Sydney-Wide Distributors Pty Limited v Red Bull Australia Pty Limited [2002] FCAFC 157 at paras. 16 and 87, the Full Court of the Federal Court of Australia held that many of the elements referred to by Justice Heydon in Makita "involve questions of degree, requiring the exercise of judgment" and that, where a judge is conducting a trial alone without a jury, the matters may go to weight rather than admissibility. In the Red Bull case, Her Honour Justice Branson noted:

The approach of Heydon JA as set out [in para. [85] of the judgment] is, as it seems to me, to be understood as counsel of perfection. As a reading of His Honour's reasons for judgment as a whole reveals, His Honour recognised that in the context of an actual trial, the issue of admissibility of evidence tendered as expert opinion evidence may not always be able to be addressed in the way outlined in the above paragraph.

There were three reasons given for limiting the "counsel of perfection." The first was where opinion evidence was admitted without objection by the parties; in this context, a court would rarely interfere. Secondly, a ruling on admissibility of evidence is usually required during the course of the trial rather than at the end, by which time other evidence may impact on the weight of the opinion evidence. As Her Honour Justice Branson noted further in the Red Bull case:

. . . It may prove to be the case that evidence ruled admissible as expert opinion will later be found by the trial judge to be without weight for reasons that, strictly speaking, might be thought to go to the issue of admissibility (e.g. that the witness's opinion is expressed with respect to a matter outside his or her area of expertise or is not wholly or substantially based on that expertise).

The third reason was identified in the earlier case of Quick v Stoland Pty Ltd [1998] 87 FCR 371 at 373-74: The common law rule that the admissibility of expert opinion evidence depends on proper disclosure of the factual basis of the opinion is not reflected as such in the Evidence Act 1995 (Cth.). The Australian Law Reform Commission recommended against such precondition to the admissibility of expert opinion, expressing the view that the general discretion to refuse to admit evidence would be sufficient to deal with problems that might arise in respect of an expert opinion, the basis of which is not disclosed: ALRC Report No. 26, vol. 1 para. 750.

In a further decision in Australia, Cadbury Schweppes Pty Ltd v Darrell Lea Chocolate Shops Pty Ltd [2006] 228 ALR 719, (Federal Court of Australia) Justice Heerey had to determine the admissibility of opinion evidence which had been based on “market research reports” and the like which had not been proved in evidence and were not likely to be “proved”: [2006] 228 ALR 719 at 722 para. 6. At para. 7, Heerey J said: However, I accept the submission of Senior Counsel for Cadbury that this aspect of Makita has not been followed in the Federal Court. The lack of proof of a substantial part of the factual basis of Dr Gibbs’ opinions does not of itself render his evidence inadmissible under s.79. Such lack of proof merely goes to the weight which may be given to the opinion [and the court referred to the decision in Red Bull and other decisions].

There are other qualifications regarding assumptions of fact. Opinion evidence is not admissible if it is found that the opinion was based upon different facts from those disclosed, for example, if the expert was influenced by undisclosed facts. The fact of how the expert came to hold an opinion is relevant to weight: Australian Securities & Investments Commission v Rich [2005] NSWCA 152 per Chief Justice Spigelman. As long as the opinion identifies the reasoning process, it is still able to be admitted into evidence even if there were other tests that could have been done to underpin the opinion. An opinion may not be excluded because the reasoning process is not fully disclosed if the area of expertise involves some subjectivity. An opinion may be admitted into evidence where the expert has made an observation even when all the nonopinion elements of that observation are not set out. The trend is that it is not necessary to always prove matters which are customarily relied upon by experts in a particular field. The need to prove a fact upon which a particular opinion is based was clarified by Justice Heydon in Rhoden v Wingate [2002] NSWCA 165 at para. 86, namely, that it was not whether the fact is proved to a particular standard but whether "there is evidence which, if accepted, is capable of establishing" the existence of the fact. If a written opinion sets out facts upon which the opinion has proceeded, and the report is admitted into evidence, then those facts are also admitted into evidence, unless the court upholds an objection: s.60 Evidence Act 1995. Advocates may make submissions to the court to exclude the admission of the facts upon which the opinion is based until those facts are independently proven (e.g., in NSW and Australian Federal Courts, s.136 Evidence Act 1995). Rule 703 of the United States Federal Rules of Evidence provides the following:

The facts or data in the particular case upon which an expert bases an opinion or inference may be those perceived by or made known to him [or her] at or before the hearing. If of a type reasonably relied upon by experts in a particular field in forming opinions or inferences upon the subject, the facts or data need not be admissible in evidence.

Example: The expert is asked to assume that the passenger who did not wear the seat-belt was involved in a frontal collision. If in fact the collision was not purely frontal but involved a lateral element, then the opinion may be found to be irrelevant, as it did not address the effect on causation of, for example, a failure to wear a seat-belt in a lateral collision. When asked to give an opinion based on assumptions, it is important to consider objectively, against the known facts, whether the assumptions make sense, and to question the assumptions if they do not.

Court and Tribunal Rules

Various jurisdictions have procedural requirements as to the admissibility of opinion evidence. In New South Wales, the procedural requirements on expert opinion are set out in part in Rule 31.23 of the Uniform Civil Procedure Rules 2005 (NSW). It requires that an expert witness comply with the "Code of Conduct" set out in Schedule 7 to the rules, and any report is inadmissible unless it contains an acknowledgement that the expert has read the Code of Conduct and has agreed to be bound by it. The same applies to any oral evidence to be received from an expert. The court has a discretion to exempt compliance, but the application of the Code of Conduct is treated very seriously and experts are rarely exempted.


In essence, the Code of Conduct emphasizes the following:

(1) The duty of the expert is to the court rather than any party who has qualified or paid the expert to appear.
(2) The expert is not an advocate for a party but rather an independent, impartial source of evidence and opinion to assist the court.
(3) The expert must follow any direction by the court.
(4) There is a duty to work co-operatively with other expert witnesses by exercising their judgment and endeavoring to reach agreement.
(5) Requirements as to experts' reports (see discussion above as to evidentiary matters).
(6) Experts' conference – based upon a direction by the court for experts from the parties to proceedings to confer with each other to identify areas of agreement and/or disagreement and, if appropriate, prepare a joint report.

The last aspect is discussed below under "concurrent evidence." The requirements under the rules for experts' reports work together with the evidentiary principles and arguably are more specific than the evidentiary requirements for admissibility of the opinion. They extend to an expert identifying if a particular issue falls outside their area of expertise, and if they believe that a report is incomplete or inaccurate without some qualification, then the qualification must be stated in the report. Further, if the expert considers that their opinion is not a concluded opinion due to insufficient research or data or any other reason, then this must be stated when the opinion is expressed. If the expert changes their opinion, then the expert must provide a supplementary report detailing the changed opinion. In Australia, under the Legal Profession Uniform Law Australian Solicitors' Conduct Rules 2015, Rule 24.2.3, a lawyer can draw to the expert's attention inconsistencies or other difficulties with the evidence but must not encourage the expert to give evidence different from the opinion they had formed – put simply, the opinion in the report must remain that of the expert. Communications between the expert and the instructing lawyer should not only actually achieve this result, they must also not appear to be otherwise. In most jurisdictions, ethical obligations imposed upon legal practitioners will regulate the extent to which they may engage with the expert in the formulation of an opinion to be relied upon by a court and, while it is usually permissible for legal practitioners to test or challenge opinions in conference prior to the completion of a report, it is not permissible for them to suggest or require a particular opinion to be expressed. The orders of the court to be complied with can sometimes be onerous. Personal explanations may be required by the court from experts as to why orders of the court have not been complied with. Experts often have professional responsibilities outside of their duty to the court, including the pressure of business, but this often is not a satisfactory explanation for failure to comply with the court's directions. Another potential area of tension that may arise is competing interests between obligations to the court and obligations to the expert's own governing bodies or associations, including ethical and disclosure obligations. A clear and typical example can be found in medical practitioners and the competing obligation of confidentiality to the patient in contrast to the obligation of full candor to the court. Extreme care must be exercised when balancing the competing obligations.

Concurrent Evidence

Concurrent evidence is a relatively new procedural innovation different from the more traditional adversarial approach to testing expert evidence. One of the difficulties with the traditional approach was that there was a separation in the giving of evidence by experts, so that evidence was not heard in the context of either other expert evidence or witness evidence received later in the trial. Sometimes experts would have to be recalled to give evidence if new factual evidence had emerged during the course of the trial which changed the assumptions upon which the experts had previously given their opinions. Also, during the course of the hearing, one expert would often have to sit in the body of the tribunal and suggest questions to an advocate who was cross-examining another expert. This could be cumbersome. So, not only for reasons of efficiency but also to produce more useful and relevant evidence to a tribunal, in the jurisdiction of New South Wales, Australia, the concept of concurrent evidence began. In Australia, the practice of concurrent evidence was developed and has since been copied in other jurisdictions, such as the United Kingdom². The stages involved in concurrent evidence are as follows (Garling 2011):

• Preparation
• The expert conclave – the meeting of experts before the hearing
• The joint written report
• Oral evidence

² A pilot program was established in Manchester Specialist courts between 2010 and 2013 with His Honour Judge David Waksman QC, the Manchester Mercantile Judge overseeing the pilot, as he then was. Professor Dame Hazel Genn monitored the pilot and published findings in the Civil Justice Quarterly, "Getting to the Truth: Experts and Judges in the 'Hot Tub'" (2013) 32 CJQ 275–299 (Genn 2012). From 1 April 2013, in the United Kingdom, Practice Direction 35 was adopted into the Civil Procedure Rules.

In the usual course, the experts would be instructed by the parties to the litigation to prepare their separate written opinions and this was no different to the previous approach. Then the new procedure is applied to focus attention on the issues in dispute. Once the written opinions are exchanged, and before any trial, the experts have a meeting (the expert conclave) to identify areas of agreement or disagreement. A joint report is then prepared by the experts setting out those areas of agreement and those areas where differences of opinion remain. Where there is a difference of
opinion, the report should also identify the reasons and the basis for the difference of opinion. Often, prior to the conclave occurring, it is incumbent upon the legal representatives for the parties to prepare a joint list of assumptions and questions to be put to the experts for consideration in conclave. The preparation of such a document is often of some significance, as it tends to set the agenda for the conclave and focus the attention of the experts upon matters considered by the parties to be relevant considerations. It is also important to remember that the usual course is that the experts meet and prepare their report in the absence of the legal representatives or parties. This reflects the basic premise that the experts' overriding obligation is to the court and not to the parties that have retained them. Once the hearing commences, the experts of the same discipline, for example, biomechanical engineers, give their evidence concurrently, and the judge or presiding officer of the tribunal chairs a discussion between the experts. In essence, the joint report that the experts have created from their prehearing meeting forms the agenda for the discussion. Advocates appearing for the parties also participate in the "discussion" and can put questions or challenge the experts (either individually or jointly). The process is all controlled by the judge or presiding tribunal member. Advantages of the procedure of concurrent evidence from experts are that it not only saves time and costs, but allows the experts to properly assist the tribunal in reaching a decision about the particular dispute (Jackson 2016). The procedure is also consistent with the well-known process of peer review within academic and professional communities. The peers give their review of each other's opinions in the context of the hearing before the judge or tribunal. While the process does suffer from some criticism, this is to be expected of any innovation. The key to the process working effectively is that the experts understand the rules underpinning the process in each jurisdiction. For example, the Supreme Court of NSW, Common Law Division – General Case Management List Practice Note No. SC CL5, Parts 36–40 (see also SC CL7 (Professional Negligence List)), provides:

Concurrent expert evidence
36. This part of the Practice Note applies to all proceedings in which a claim is made for damages for personal injury or disability.
37. All expert evidence will be given concurrently unless there is a single expert appointed or the Court grants leave for expert evidence to be given in an alternate manner.
38. At the first Directions Hearing the parties are to produce a schedule of the issues in respect of which expert evidence may be adduced and identify whether those issues potentially should be dealt with by a single expert witness appointed by the parties or by expert witnesses retained by each party who will give evidence concurrently.
39. In the case of concurrent experts, within 14 days of all expert witness statements/reports being filed and served, the parties are to agree on questions to be asked of the expert witnesses. If the parties cannot reach agreement within 14 days, they are to arrange for the proceedings to be re-listed before the Court for directions as to the questions to be answered by the expert witnesses.


40. In the case of concurrent experts the experts in each area of expertise are to confer and produce a report on matters agreed and matters not agreed within 35 days of the first Directions Hearing or such other time as the Court may order.

Regarding the joint report, rule 31.26 of the Uniform Civil Procedure Rules (NSW) relevantly provides the following:

(1) . . .
(2) The joint report must specify matters agreed and matters not agreed and the reasons for any disagreement.
(3) The joint report may be tendered at the trial as evidence of any matters agreed.
(4) In relation to any matters not agreed, the joint report may be used or tendered at the trial only in accordance with the Rules of Evidence and the Practices of the court.
(5) Except by leave of the court, a party affected may not adduce evidence from any other expert witness on the issues dealt with in the joint report.

In addition to the above, individual judges may place emphasis on certain aspects of the procedure, for example, the material with which the experts ought to be provided (Garling 2011, p. 5):

(a) an index of the documents, together with a paginated folder of the documents which is to be put before each expert participating at a joint conference and the giving of concurrent evidence;

(b) a complete list of the factual assumptions which are agreed, or else for which each party contends, as the appropriate basis for the joint expert opinion; and

(c) the questions which each party contends are appropriate for the experts to be asked to answer.

In the prehearing stage of identifying what is agreed or disagreed, the experts may be assisted at their meeting by an independent chairperson. The chairperson can ensure that the prehearing conference takes place in a manner allowing each expert's opinion to be heard and can also take responsibility for the joint report arising from the prehearing meeting. Justice Garling offered the following guidance to experts giving concurrent oral evidence (Garling 2015, pp. 16–17):

It is my practice to briefly outline what is intended to happen, describe the roles and functions of those in the court room, and to describe the ground rules for the session. In particular, I find it necessary to remind the participants that only one person can speak at any one time because often when a discussion takes place, the witnesses can forget that they are in a court room. In giving this outline, I commence with identifying the role of the trial judge and remind them that it is the judge's task to control the proceedings by, in effect, being the chairman of their professional meeting, so as to ensure that the agenda items are all covered in an orderly fashion, to ensure that each of the witnesses has an opportunity to state their opinions and the basis for them and, ultimately, to ensure that the process is conducted fairly to the parties, and each expert and with civility.


His Honour went on to note, consistent with the Code of Conduct referred to above, that the role of the experts is to give their evidence truthfully, impartially, and not as an advocate for either party. His Honour has also helpfully identified some of the practicalities that experts would face in a court or tribunal giving concurrent evidence (Garling 2011, p. 18):

Accordingly, at the conclusion of the first issue (or item on the agenda) and after the judge has finished raising any matters, counsel for each of the parties then, in turn, can question the witnesses ensuring as they do that each expert has the opportunity to answer the question asked. In other words, the examination of the experts by counsel bears little similarity to the typical cross-examination. The purpose of counsel's questions is to ensure that an expert's opinion is fully articulated and tested against a contrary opinion, even an opinion elicited by the judge.

As Giles JA said at [107] in Turjman v Stonewall Hotel Pty Ltd [2011] NSWCA 392: "When concurrent evidence is being taken with the degree of direction by the judge, counsel are not passengers. They can and should seek to raise material issues and put material questions to the witnesses . . ., if necessary submitting that the judge's view of how the evidence should be brought out should be modified." This process is then repeated for each item until all of the issues have been dealt with.

Some Practical Guidance

In the context of these obligations, experts need to exercise care when preparing written reports or giving oral evidence in proceedings. An expert must ensure that their opinion is their own, not something which has evolved through draft opinions edited by others into something tending towards advocacy. Rather, the opinion must be an independent expert opinion.

An expert in a particular field of study may sometimes express opinions outside of that field. The expert must, however, be cautious when straying into this territory and is likely to be questioned about it in court. Such evidence may not satisfy the test of whether the expert has "specialized knowledge." This may not only undermine that part of the expert's opinion but may also call into question the persuasive weight of the balance of the opinion. For example, a mechanical engineer would be able to comment about the forces involved in the collision of vehicles and the transfer of energy to the occupants of the vehicle. That expert, however, arguably would not have expertise sufficient for an opinion to be admissible regarding the assessment of medical evidence, such as extrapolating from identified injuries to the physical forces that may have caused them. A mechanical engineer with biomechanical knowledge, training, or experience, however, more likely would be able to express that as an admissible opinion. Chief Justice Gleeson characterized the evidence in HG v The Queen (supra) as "a combination of speculation, inference, personal and second hand views as to the credibility of the complainant, and the process of reasoning which went well beyond the field of expertise". It is important for the expert not to stray.


The expert giving the opinion must be careful not to simply engage in factual analysis where their specialized knowledge is unnecessary. In this instance, a judge receives no benefit from an opinion based upon the specialized knowledge of the expert, as the judge can perform the factual analysis. Indeed, such an opinion is unlikely to be admissible given that it is not based upon specialized training, experience, or education. An example of this might be an opinion as to whether or not a victim in a vehicle collision was wearing a seatbelt. A judge may have the benefit of the observations of witnesses, such as any other passengers in the vehicle, and other sources of information, such as attending ambulance personnel. The expert should instead combine their understanding of these observations with their knowledge, training, or experience, for example, as a biomechanical engineer. The ambulance officers' notes and hospital records may show abdominal and shoulder bruising suggestive that a seatbelt was worn. Here the expert combines the available facts derived from the medical records and, using their expertise, interprets those records to give an opinion as to the likely forces from a seatbelt to the body that would be sufficient to cause the bruising.

Chief Justice Gleeson commented further in HG v The Queen (supra):

An expert whose opinion is sought to be tendered should differentiate between the assumed facts upon which the opinion is based, and the opinion in question . . . By directing attention to whether an opinion is wholly or substantially based on specialised knowledge based on training, study or experience, the section [s.79] requires that the opinion is presented in a form which makes it possible to answer that question . . . Experts who venture "opinions" (sometimes merely their own inference of fact), outside their field of specialised knowledge may invest those opinions with a spurious appearance of authority, and legitimate processes of fact finding may be subverted.

The word "substantially" in the context of "substantially based on specialized knowledge" allows the court discretion. It permits an opinion to be admissible if based upon specialized knowledge even if it takes into account matters of "common knowledge": Velevski v The Queen [2002] 76 ALJR 402 (High Court of Australia) per Justices Gaudron, Gummow, and Callinan.

Reliability of an opinion has been an important part of considering whether or not expert evidence is admissible, for example, in the cases of body mapping, facial mapping, and fingerprinting. Fingerprint evidence has a scientific basis and a database for purposes of comparison, allowing experts to formulate an opinion as to whether one fingerprint is similar to another. Facial or body mapping may lack a similar database or mathematical formula such as exists in the case of fingerprint comparison. The issue perhaps is not that the opinion lacks reliability but that, without the available specialized knowledge, the opinion becomes subjective: Daubert v Merrell Dow Pharmaceuticals 509 US 579 [1993].

An expert can, and often does, use materials provided by others in formulating their opinion. That a radiologist produces a Magnetic Resonance Imaging scan which is then used by a surgeon in forming an opinion about diagnosis does not mean that the ultimate diagnosis is not formed "substantially based" on the specialized knowledge of the surgeon. Care must be exercised when the expert makes their own investigations. For example, the expert should never directly contact a party to the litigation or a witness. A good guide is to first discuss any such investigations with whoever has requested the written opinion.

A difficulty emerges where there are joint opinions, and as can be seen from the procedural approach to expert evidence, these are becoming increasingly important. If there is only oral evidence from one of the two experts who prepared a report, and the jointly authored report does not identify which opinion was formed by which of the two experts, then the requirement that the opinion be wholly or substantially based on the specialized knowledge of the witness may not be fulfilled.

If an expert's opinion has been based upon assumed facts and those facts ultimately have not been separately proven, then the expert's opinion is of little, if any, weight. Also important is the reasoning process which leads to the formation of the opinion. It is important for experts to have their assumptions clearly set out rather than, for example, relying on written statements from witnesses with which they may have been provided. The reason is that, in the process of concurrent evidence, one expert may be interpreting a statement and taking assumptions from it in one way, while other experts may be interpreting the statement differently. By having agreed assumptions, or at least clearly formulated contended assumptions, the basis for each opinion is clear.

Conclusion

One of the judges responsible for encouraging the development of concurrent evidence in NSW, Justice McClellan, made the following comment (McLellan 2007, p. 17):

As far as the decision-maker is concerned, my experience is that because of the opportunity to observe the experts in conversation with each other about the matter, together with the ability to ask and answer each other's questions, the capacity of the judge to decide which expert to accept is greatly enhanced. Rather than have a person's expertise translated or coloured by the skill of the advocate, and as we know the impact of the advocate is sometimes significant, you have the expert's views expressed in his or her own words.

An expert’s own words are all the more persuasive if they are given impartially, based upon identified assumptions, with the authority of knowledge, training, or experience and if they are able to respond to peer review.


Glossary

Concurrent evidence: A procedure where experts in the same field, each separately qualified by the parties to litigation, give evidence jointly as a panel in court. They identify areas of agreement and disagreement in a joint report before trial. At trial, each expert comments on the opinions of the other experts, and advocates for the parties also question them. The judge acts as a chairperson and ultimately decides the issues.

Court rules: Rules of procedure applied to court proceedings.

Litigation: The process of dispute resolution by the members of courts or tribunals based upon admissible evidence presented by advocates for the parties to the dispute.

Opinion evidence: Evidence based upon a person's training, study, or experience. It is given initially in writing by an expert and is subject to oral or written review by peers, court advocates, and judges based upon applicable court rules.

References

Branson JC (2006) Expert evidence: a judge's perspective. Bar News, NSW Bar Association, Sydney, summer 2006/2007, pp 32–38. Available online http://archive.nswbar.asn.au/docs/resources/publications/bn/bn_summer0607.pdf

Conley I, Moriarty I (2011) Scientific and expert evidence, 2nd edn. Wolters Kluwer, New York

Freckelton I, Selby H (2009) Expert evidence – law, practice, procedure and advocacy, 4th edn. Thomson Reuters, Sydney

Garling JP (2011) Concurrent expert evidence – reflections and development. Paper presented at the Australian Insurance Law Association, Twilight Seminar Series, Sydney, 17 August 2011. Available online http://www.supremecourt.justice.nsw.gov.au/Documents/garling170811.pdf

Garling JP (2015) Concurrent expert evidence – the New South Wales experience. Paper presented at the University of Oxford Faculty of Law, Oxford, 1 December 2015. Available online http://www.supremecourt.justice.nsw.gov.au/Documents/Speeches/2015%20Speeches/Garling_20151201.pdf

Genn DH (2012) Manchester concurrent evidence pilot – interim report. UCL Judicial Institute, London, January 2012. Available online https://www.judiciary.gov.uk/wp-content/uploads/JCO/Documents/Reports/concurrent-evidence-interim-report.pdf

Jackson LJ (2016) Concurrent evidence – a gift from Australia. Paper presented at the conference of the Commercial Bar Association of Victoria, London, 29 June 2016. Available online https://www.judiciary.gov.uk/wp-content/uploads/2016/06/lj-jackson-concurrent-expert-evidence.pdf

McLellan JP (2007) Concurrent evidence. Paper presented at the medicine and law conference of the Law Institute of Victoria, Melbourne, 29 November 2007. Available online http://www.supremecourt.justice.nsw.gov.au/Documents/mcclellan_2007.11.29.pdf

Odgers S (2014) Uniform evidence law, 11th edn. Thomson Reuters, Sydney

Injury Mechanisms in Traffic Accidents

Brian D. Goodwin, Sajal Chirvi, and Frank A. Pintar

Abstract

This chapter describes and explains current understandings of injury mechanisms in motor vehicle crashes. The following sections are organized according to anatomical regions and associated injuries, and discussion is limited to mechanisms of severe and/or fatal injuries. Real-world injury biomechanics and injury forensics studies play an essential role in impelling new developments in vehicle design and safety enhancements. The science of impact mechanics aims to explain injury mechanisms, characterize biomechanical systems, estimate injury risk, and analyze approaches to injury prevention.

Keywords

Vehicle crash • Vehicle collision • Injury scale • Injury probability • Injury criteria • Motor vehicle

Contents

Introduction
  Epidemiology
  Current Approaches and Data Analysis
State of the Art
Head and Neck
  Head and Brain
  Cervical Injury
Thorax
  Introduction
  Traumatic Rupture of the Aorta
  Thoracic Spine Injuries
  Intra-abdominal Injuries
Pelvis
  Anatomy
  Fracture Types
  Mechanisms
Lower Extremities
  Anatomy
  Mechanisms
Future Directions
Cross-References
References

B.D. Goodwin (*) • S. Chirvi • F.A. Pintar
Neuroscience Research Labs – Research 151, Medical College of Wisconsin, Zablocki VA Medical Center, Milwaukee, WI, USA
e-mail: [email protected]; [email protected]; [email protected]

© Springer International Publishing AG 2017
B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_93-1

Introduction

An adequate understanding of injury mechanisms is necessary to implement new engineering approaches for preventing injury. Severe injuries can cause various degrees of life change or death. The innate desire to avoid injury, along with the value placed on a single human life, has fueled safety regulations in vehicle and road design. Generally, the efficacy of solutions in safety engineering has relied on the explanatory power of injury mechanisms for a given injury event.

In general, injury results from the transmission of a critical amount of energy into the body. An injury mechanism can be defined as a description of either the cause of energy transmission resulting in injury or the circumstantial effects that lead to unsustainable stresses or strains (mechanical failure). Beyond a mechanistic understanding of injury, injury criteria describe the relationship between a measurable physical response and the risk of injury. Injury criteria are often formulated from an understanding of the injury mechanism, or the injury mechanism is discovered by finding the "best fit" between the injury risk and a collection of physical responses. Injury tolerance can be defined as the point at which an injury criterion meets a threshold that is specified based on a statistical probability representing risk (cf. Section ▶ Head and Neck) (Hardy et al. 2015).

Validating injury mechanisms in the lab can often be difficult due to biomechanical dissimilarities between a healthy human, which has an active musculoskeletal response, and a postmortem human surrogate (PMHS), which responds passively. Contributing factors such as muscle contractions tend to confound estimation of injury mechanisms. The musculoskeletal response needs particular attention, especially when the injured party can anticipate a given traffic impact. Consequently, injury mechanism hypotheses are often deemed confirmed (or disconfirmed) through inferential approaches using forensic evidence from real-world crashes in conjunction with PMHS experimentation.


Epidemiology

In 2002, traffic crashes were the 11th leading cause of death worldwide (2.1% of all deaths and 25% of injury-related deaths), contributing 20–50 million injuries and 1.2 million deaths per year (Kauvar and Wade 2005). By 2013, the estimated nonfatal traffic injuries had risen to 102 million (a more than 100% increase) and fatal injuries to 1.5 million (an approximately 25% increase) (Haagsma et al. 2016). The Global Burden of Disease (GBD) project has developed a metric called the DALY (disability-adjusted life year), which quantifies the years of life lost and disability years due to injury, disease, violence, etc. (an increase in DALYs corresponds to an increase in death and injury rates). From 1990 to 2013, DALY values decreased in rich nations (the most substantial decreases being −67% in Asia Pacific and −61% in Western Europe), while overall changes in DALY values were not significant in developing nations, indicating essentially no change. Four-wheeled vehicle injuries significantly increased in South Asia (+22%) and sub-Saharan Africa (+20%) (Haagsma et al. 2016).

These epidemiological data reflect enhanced safety design across major automobile manufacturers as well as improved road conditions and traffic management in developed nations. However, governments in developing nations have virtually ignored road and traffic conditions. Injuries incurred from traffic crashes remain a significant cause of morbidity and mortality. Addressing the continued burden of traffic crash injuries requires rigorous experimental and forensic biomechanics to better understand injury mechanisms. Traffic crash biomechanics has the end goal of adapting road and vehicle design to ever-changing traffic conditions.
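The DALY arithmetic above can be made concrete with a toy calculation. The sketch below uses the simplified DALY = YLL + YLD form; all counts, weights, and durations are invented for illustration, and GBD refinements such as age weighting and discounting are omitted.

```python
# Simplified DALY illustration: DALY = YLL + YLD (all numbers invented).
deaths = 1200                      # traffic deaths in a hypothetical region
life_years_lost_per_death = 35.0   # average years of life lost per death

injured = 80000                    # nonfatal traffic injuries, same region
disability_weight = 0.2            # 0 (full health) .. 1 (death-equivalent)
avg_duration_years = 1.5           # average years lived with the disability

yll = deaths * life_years_lost_per_death                 # years of life lost
yld = injured * disability_weight * avg_duration_years   # years lived with disability
daly = yll + yld

print(f"YLL = {yll:,.0f}, YLD = {yld:,.0f}, DALY = {daly:,.0f}")
```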

Current Approaches and Data Analysis

Injury Severity Quantification

A variety of methods have been developed for quantifying injury severity over the years, but standardizing an objective method for injury severity remains difficult. Currently, injury severity metrics are based on consensus as well as historical data, and there remains a need for precision in injury severity quantification. In research, defining injury risk relies heavily on the Abbreviated Injury Scale (AIS), both for quantifying historical injury severities and for model development (e.g., the probability of death score, PODS). Injury severity scores are an important measure for biomechanical research, used in retrospective analyses that enable engineering advancements aimed at minimizing injury risk from traffic-related incidents.

AIS

The Abbreviated Injury Scale (AIS) of the Association for the Advancement of Automotive Medicine (AAAM) is a numerical rating system for quantifying injury severity in motor vehicle crashes (MVCs). AIS is a proprietary classification system, and its use requires trained personnel to properly code a victim. As a result, not all traffic-related incident victims are coded according to the AIS. Currently, AIS is in its sixth revision, and it is the most widely used metric for coding injury severity. The AIS code consists of a 6-digit value followed by a single-digit value (e.g., 123456.1). The first, "pre-dot" number identifies the body region, type of anatomical structure, its specific structure, and level (Gabler et al. 2015). The second component of the score has a range between 0 and 6, corresponding to no injury up to the most severe injury, respectively. High AIS ratings indicate injuries with high mortality rates, as it is essentially a threat-to-life scale. The AIS is implemented for all body regions, each with different methods for quantifying injury severity. For example, the AIS level for a vertebral body compression fracture depends on the height of the vertebra after injury, and the AIS level for a skeletal chest injury is a function of the number of sustained rib fractures (as well as the presence of a flail chest for AIS ≥ 3). AIS levels also apply to soft tissue damage, where they depend heavily on the organ that was damaged.
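As a concrete illustration of the pre-dot/post-dot layout described above, the short sketch below splits an AIS-style code into its components. The code value and digit groupings are illustrative assumptions based on the description in the text, not an official AAAM coding.

```python
# Minimal sketch: splitting a hypothetical AIS-style code into components.
# The value is invented; digit meanings follow the pre-dot/post-dot
# description above (region, structure type, specific structure, level).
code = "450203.3"

pre_dot, post_dot = code.split(".")
body_region = pre_dot[0]           # 1st digit: body region
structure_type = pre_dot[1]        # 2nd digit: type of anatomical structure
specific_structure = pre_dot[2:4]  # 3rd-4th digits: specific structure
level = pre_dot[4:6]               # 5th-6th digits: level of the injury
severity = int(post_dot)           # post-dot digit: 0 (none) to 6 (maximal)

print(f"region={body_region}, severity={severity} (threat to life)")
```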

ISS, NISS, and PODS

The Injury Severity Score (ISS) is a measure of the probability of survival. The ISS value is calculated by finding the maximum AIS value from each body region (eight body regions total: head, neck, face, chest, abdomen, spine, extremity, and external) and then summing the squares of the three highest of these values. This calculation gives the ISS metric a range of 1 to 75. The goal of the ISS is to quantify overall bodily injury severity. The New Injury Severity Score (NISS) incorporates two modifications of the ISS: (1) the sum of the squares of the three highest AIS scores regardless of body region and (2) if any AIS score is 6, then NISS is automatically set to 75. ISS is more widely used, but NISS has been shown to better predict hospitalization durations and organ failure (Balogh et al. 2000, 2003; Gabler et al. 2015).

The Probability of Death Score (PODS) is an estimate of the probability of death. PODSa is the same estimate while accounting for victim age. The two probabilities are quantified as follows:

PODS = PODSa = e^x / (1 + e^x)    (1)

where, for PODS,

x = 2.2(AIS1) + 0.9(AIS2) − 11.25 + C    (2)

and, for PODSa,

x = 2.7(AIS1) + 1.0(AIS2) + 0.06(AGE) − 15.4 + C    (3)

where C = 0.764 for cars, and AIS1 and AIS2 are the first and second highest AIS values. The equations above have been derived from an empirical data fit (Somers 1983a, b). The advantage of PODS over ISS is that it describes real historical data and its value has an explainable meaning by virtue of its basis in probability.
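To make the scoring arithmetic concrete, the following minimal sketch computes ISS, NISS, and PODS from a set of AIS severity values per Eqs. (1)–(3). The injury values, region grouping, and function names are illustrative assumptions, not a validated coding tool.

```python
import math

# Hypothetical AIS severities (post-dot digits, 0-6) grouped by body region.
ais_by_region = {
    "head": [4], "neck": [0], "face": [1], "chest": [3],
    "abdomen": [2], "spine": [0], "extremity": [2], "external": [1],
}

def iss(ais_by_region):
    """ISS: sum of squares of the three highest region-maximum AIS values."""
    region_max = sorted((max(v) for v in ais_by_region.values()), reverse=True)
    if region_max[0] == 6:
        return 75  # an AIS-6 injury sets the score to its maximum
    return sum(m * m for m in region_max[:3])

def niss(ais_by_region):
    """NISS: three highest AIS values regardless of body region."""
    all_ais = sorted((a for v in ais_by_region.values() for a in v), reverse=True)
    if all_ais[0] == 6:
        return 75
    return sum(a * a for a in all_ais[:3])

def pods(ais1, ais2, age=None, c=0.764):
    """Probability of death per Eqs. (1)-(3); c = 0.764 for cars."""
    if age is None:
        x = 2.2 * ais1 + 0.9 * ais2 - 11.25 + c              # Eq. (2): PODS
    else:
        x = 2.7 * ais1 + 1.0 * ais2 + 0.06 * age - 15.4 + c  # Eq. (3): PODSa
    return math.exp(x) / (1.0 + math.exp(x))                 # Eq. (1)

print(iss(ais_by_region), niss(ais_by_region))   # 29 29 for this example
print(round(pods(4, 3), 3), round(pods(4, 3, age=45), 3))
```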


Fig. 1 Example of a human injury probability curve where impact force is the independent variable for estimating probability/risk of injury

Human Injury Probability Curve

A human injury probability curve (HIPC) is an estimate of the injury risk with respect to a defined variable called the injury criterion. Real-world forensic data provide knowledge of possible injuries, and safety engineering aims to minimize the risk of these known injuries. Quantifying injury risk is left to biomechanics research: identifying injury mechanisms, establishing injury criteria, designing PMHS experiments, and quantifying human tolerance through statistical analyses. Injury criteria are the independent variables against which injury risk is estimated, and these depend on both the method for processing experimental PMHS data (Yoganandan et al. 2014b) and the statistical methodology for estimating risk (Parr et al. 2013; Yoganandan et al. 2014a, 2015a). Local criteria such as stress and strain or global criteria such as force and moment are often selected based on principles of failure mechanics (Fig. 1). In this sense, an injury risk of 50% means that one-half of the considered population has a tolerance limit lower than the corresponding value of the independent variable (or injury criterion). Injury risk estimates or HIPCs do not translate well across differing loading conditions or body regions. As a result, many injury risk curves are necessary to assess whole-body risk under well-defined conditions. For example, risk of femur fracture under 3-point bending does not translate to fracture risk during axial loading of the femur.

Human tolerance or risk of injury (cf. Fig. 1) is computed using statistical techniques such as survival analysis. A survival analysis approach lends itself to injury risk estimation because of the nature of the datasets to which the HIPC is fit (McKay and Bir 2009; Parr et al. 2013; Yoganandan et al. 2013b). Datasets that contain human injury and noninjury experiments contain various types of censored (uncertain) data, which represent injurious and noninjurious experiments. Survival analyses have commonly been carried out for clinical studies where the time of survival needs to be estimated following diagnosis of a fatal disease. The time of survival is considered censored when the date of death is unknown, and the survival analysis incorporates statistical techniques to account for datasets with censored data (Pintar et al. 1998; Funk et al. 2002; Yoganandan et al. 2014a; Petitjean et al. 2015; Yoganandan et al. 2015a). For HIPC generation, an injury criterion variable is used for estimating risk instead of time of survival.
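The censored-data idea can be illustrated with a small maximum-likelihood fit. The sketch below assumes a Weibull-shaped HIPC and invented peak-force data, where an injurious test left-censors the specimen's tolerance (it lies at or below the applied force) and a noninjurious test right-censors it; none of the numbers reproduce any published risk curve.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Invented PMHS dataset: peak applied force (kN) and injury outcome.
forces = np.array([3.1, 4.5, 2.8, 5.2, 3.9, 4.8, 2.5, 5.6])
injured = np.array([True, True, False, True, False, True, False, True])

def neg_log_likelihood(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    # P(tolerance <= force) under the assumed Weibull HIPC; clipped for stability.
    cdf = np.clip(weibull_min.cdf(forces, shape, scale=scale), 1e-12, 1 - 1e-12)
    ll = np.where(injured, np.log(cdf), np.log(1.0 - cdf))
    return -np.sum(ll)

res = minimize(neg_log_likelihood, x0=[2.0, 4.0], method="Nelder-Mead")
shape, scale = res.x
f50 = weibull_min.ppf(0.5, shape, scale=scale)  # force at 50% injury risk
print(f"shape={shape:.2f}, scale={scale:.2f} kN, 50% risk at {f50:.2f} kN")
```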

Explaining Mechanisms Through Forensic Data

Retrospective studies rely heavily on traffic crash data, diagnoses, and patient outcomes. Databases of field data are managed by government departments and fall into one of three categories (Gabler et al. 2015): (1) fatal crash data, (2) details of crash investigations, and (3) higher-level data of all fatal and nonfatal crashes. This chapter gives a brief overview of popular databases that are managed in the United States (Gabler et al. 2015), and the reader is encouraged to seek other sources for more in-depth descriptions of internationally available tools.

CIREN

Although international databases have been developed, the Crash Injury Research Engineering Network (CIREN; managed by the National Highway Traffic Safety Administration) and the National Automotive Sampling System (NASS) contain vast amounts of forensic data. Multiple CIREN centers in the United States have collected crash and medical data from as early as 1996. CIREN has enabled detailed retrospective analyses from medical and biomechanical perspectives (Augenstein et al. 2000; Augenstein and Diggs 2003; Kirk and Morris 2003). For example, Yoganandan et al. (2009) surveyed a CIREN database of over 1,800 traffic crash victims and found that diffuse axonal injury occurred in crashes where victims experienced contact-induced blunt force head trauma.

NASS

The National Automotive Sampling System (NASS) is a database containing detailed characteristics of more than 5,000 crashes each year. This database contains data from crashes with a range of severities from no injury to minor to fatal, involving cars, light trucks, vans, and/or sport utility vehicles. The NASS database is populated by NHTSA crash investigators who document data and evidence from crash sites through forensics, photography, interviews, and hospital injury severity codes.

FARS

The Fatality Analysis Reporting System (FARS) is a comprehensive database of all traffic-related fatalities in the United States. This database includes traffic crashes of all vehicle types, including bicyclist and pedestrian fatalities. FARS contains less specific data than the NASS, which includes over 400 data elements per crash compared to about 175 elements in FARS. For a crash to be included in the FARS database, at least one fatality must occur within 30 days of the incident. Injuries are coded using the coarse KABCO scale (killed, incapacitated, moderate injury, complaint of pain, or property damage only). FARS has been maintained by NHTSA since 1975 and contains data from 30,000 to 40,000 fatal accidents each year.

State of the Art

Elucidating injury mechanisms from MVCs requires robust experimentation and data analysis in applied impact mechanics. Generally, exploratory research within the field of biomechanics is rare, since the field has evolved to focus heavily on applied biomechanics for injury prevention and minimization. As a result, analytical techniques for estimating injury risk have improved significantly (Pintar et al. 1998; Petitjean and Trosseille 2011; Takhounts et al. 2013). New PMHS experimental methodologies are consistently published to estimate biomechanical responses under simulated impact conditions. Multiple PMHS experiments are performed with the aim of generating a mean response or response corridor with high biofidelity. Injury and noninjury responses are then used for deriving injury criteria, which enables injury risk estimation under defined loads. Since PMHS specimens are not identical, data censoring approaches are necessary to estimate risk for the average human. Survival analysis can handle censored data types; it is a fairly new approach for estimating injury risk along with a confidence level of that risk, yielding what has been called the human injury probability curve (HIPC) (cf. Sec. Human Injury Probability Curve).

The field of applied biomechanics has advanced significantly: experimental setups can closely simulate MVCs including impacts from rollover (Lessley et al. 2014), oblique (Yoganandan et al. 2015b), and cyclist/pedestrian (Matsui and Oikawa 2015) collisions. High-speed videography, when integrated with digital image correlation (DIC) (Anuta 1970; Chu et al. 1985), provides the means to capture widespread strain (and strain rate) in large-scale applications (e.g., vehicle deformation) or small-scale experiments (e.g., bone or ligament strain) under highly dynamic or quasistatic loading conditions (McCormick and Lord 2010; Begonia et al. 2015).

Head and Neck

Head and Brain

Brain Injury

It is generally accepted that traumatic brain injury (TBI) results from sudden movement of the head (usually caused by blunt impact), resulting in both linear and angular acceleration. Studies show that these two modes of acceleration contain fundamental differences in terms of their injury mechanisms. Diffuse axonal injury (DAI) is the most common TBI, where sharp, dynamic head movements directly cause neuron damage leading to neurologic sequelae. A study from 2009 surveying 1,823 cases found brain traumas coded as DAI in 67 vehicle occupants (41 adults; 3.6%; no crash resulted in more than one DAI) (Yoganandan et al. 2009). Within the adult sample, 33 were lateral crashes (80%), and all DAI occurrences involved head contact loads. Hardy et al. (2001) were able to show that pure angular acceleration (20–25 ms duration) exposes white matter to tensile strains resulting in damaged axon fibers, while Anderson et al. (2003) observed that DAI severity was correlated with peak linear and angular accelerations in the sheep model. Neural damage occurred ipsilateral and contralateral to the impact, which suggests that brain matter undergoes shear and/or tensile stresses due to its abrupt displacement relative to the skull. Axonal stretching under angular acceleration still requires further validation as an injury mechanism, but it is the most reasonable hypothesis for DAI.

Linear acceleration subjects the brain to a pressure wave beginning with compression, where the cells nearest to the impact experience the highest compressive forces. An effective fluid percussion device has been used in dogs to show that a pressure wave causes concussion (Gurdjian et al. 1955). These waves varied in duration and pressure from 1 to 46 ms and 34.5 to 345 kPa, respectively. VandeVord et al. (2012) were able to show that externally applied hyperbaric blasts cause significant neurotrauma in rats, which led to cognitive deficits from diffuse glial cell damage. Short duration (5–10 ms) angular accelerations from blunt impact can also rupture veins or arteries, causing acute subdural hematoma (ASDH) resulting in cell damage (Davceva et al. 2012).

Acute Subdural Hematoma

ASDH is one of the most deadly injuries, having a mortality rate of greater than 50% across studies. The ischemia resulting from vascular damage causes neuronal damage, and head trauma patients die or become seriously disabled (Wilberger et al. 1991). ASDH is found in victims of blunt force trauma and is among the gravest of injuries due to its high incidence (30%), high mortality (60%), and injury severity (common Glasgow Coma scores range 3–5) (King 2015). Data seem to point to a 50% risk of ASDH subsequent to experiencing a severe TBI (Glasgow Coma score below 9) based on a study from 17 Austrian centers, which documented the injuries and outcomes of 360 patients (Leitgeb et al. 2012). Among those patients with an ASDH, 47% died in the hospital, 19% survived with an "unfavorable" outcome, and 32% survived with a "favorable" outcome.

Bridging vein rupture has been viewed as the injury mechanism for ASDH. However, as King points out, this mechanistic explanation seems unsound since fluid mechanics principles are violated by virtue of the adhesive resistance at the interface of the dura and arachnoid (King 2015). Blood would have to enlarge the space between dura and arachnoid before it is visible through medical imaging. Blood would flow into the superior sagittal sinus if it were to flow in the path of least resistance (and it will), which implies that there is insufficient pressure for an ASDH to form at the dura-arachnoid interface. While the evidence is strong for ASDH formation following bridging vein rupture (Gennarelli and Thibault 1982; Depreitere et al. 2006), this mechanism is difficult to defend in terms of fluid mechanics. Four hypotheses have been proposed by King (2015): (1) almost all ASDHs from impacts result from cortical arterial rupture at the dura-arachnoid interface, where arachnoid border cells need to separate to tear cortical arteries; (2) radial separation of the border cell layers occurs under dynamic skull deformation from direct impacts; (3) angular acceleration subjects the border cell layer to shearing deformation, causing cortical artery rupture; and (4) bridging vein rupture is only correlated to the events surrounding the formation of ASDH but does not cause it. The first hypothesis would provide a sensible explanation for ASDH formation; the second provides a reason why ASDH forms at sites remote from the impact, since skull deformation does not occur only locally; the third is more unlikely due to the "slow" motion of the brain surface when exposed to angular accelerations under 10,000 rad/s2 at the head center of gravity; and the fourth is the logical consequence if the bridging vein rupture hypothesis is false.

Cervical Injury

Epidemiology

Traumatic spinal cord injuries (TSCIs) from MVCs represent the majority of spinal cord trauma cases per capita on an international level (Haagsma et al. 2016). With the exception of poor regions and regions with highly dense populations (Tropical Latin America, South Asia, Oceania, and Eastern Europe; regions per the World Health Organization), traffic-related incidents involving four-wheeled vehicles, motorcycles, bicycles, or pedestrians account for approximately 50% or more of documented TSCI cases on a per-region basis (Sekhon and Fehlings 2001; Middleton et al. 2012; Lee et al. 2014). The number of TSCI cases from traffic-related incidents in developed regions remains either stable or declining, while underdeveloped regions fail to constrain the rise in TSCI cases due to poor traffic conditions and diminished vehicle safety standards (Haagsma et al. 2016).

Neck Pain and Whiplash

Neck pain is the most commonly reported symptom following a rear-end vehicular collision. Evidence suggests that 50% of victims of whiplash report pain 1 year after the injury, where greater initial pain, number of symptoms, and degree of debilitation predicted recovery rates (Carroll et al. 2009). Nociceptors are nerve endings that act as pain receptors, and they have a high stimulus threshold to action potential initiation compared to other nerve endings. The signals from nociceptors are sent to the spinal cord and brain, by which pain is perceived. Intervertebral discs often become inflamed from whiplash injury, which has the effect of lowering the stimulus threshold of surrounding nociceptors, causing an increased sensitivity to pain. The consequent pain sensations will then be amplified under reduced loading conditions, giving the perception of chronic pain. Additionally, nociceptors become more concentrated and numerous around the degenerated region in the soft tissue. Within the intervertebral space, nociceptors reside in facet capsules, spinal ligaments, tendons, and muscles (Deng et al. 2000). Barnsley et al. (1995) and Bogduk and Marsland (1988) provide clinical evidence of cervical facet pain in patients with neck pain following whiplash.

A number of hypotheses have been proposed for the precise mechanism of whiplash that leads to chronic pain after initial injury. Anatomical complexities of the neck make it difficult to converge on a hypothesis without conflicting premises (Siegmund et al. 2001, 2008). Yang and Begeman (1996) proposed a shear force hypothesis where forces are transferred up the cervical spine to the occipital condyles. The torso is thrust forward and its momentum pulls the head forward, subjecting the intervertebral space to shear, especially at lower levels where the facet angle is farther from the vertical. In this case, the pain is attributed to straining facet capsules. Deng et al. (2000) as well as Lu et al. (2005) confirm the hypothesis that painful signals are received from nociceptors surrounding the facet capsule and ligament as a result of shear forces in the neck. Their PMHS and animal experiments show that the neck is subjected to compression, tension, shear forces, flexion, and extension throughout the duration of the impact. The shear force hypothesis has been further corroborated through computational models (Stemper et al. 2004; Panzer et al. 2011; Fice and Cronin 2012). The literature indicates that this injury mechanism is comparatively well established, so attention can be given to engineering methods to best prevent whiplash injury unless compelling evidence is put forth that disconfirms the shear force hypothesis.

Cervical Spine Injury

Since the most severe injuries of the cervical spine involve vertebral body fractures or dislocations, this section is limited to an overview of vertebral body anatomy of the cervical spine. Additional information regarding anatomical detail is left to other sources (Nightingale et al. 2015). The cervical spine is comprised of seven vertebrae, where the occipital condyle (OC) sits on top of C1 (OC-C1 joint) and C7 rests on top of T1 (C7-T1 joint) (Fig. 2). The cervical spine has a slight curve, called lordosis, during nominal posture. The upper cervical spine (UCS; OC to C2) is anatomically distinct from the rest of the cervical spine. Technically, C1 does not have a vertebral body but is a ring with distinguishably larger facets (Fig. 2). The UCS is capable of a greater range of motion than the rest of the spine. Damage to the UCS is almost unsurvivable due to the vulnerability of the spinal cord there. As a result, data on UCS fractures are sparse, since victims are not rushed to the hospital (Yoganandan et al. 1989). The C1 vertebra (also known as the "atlas") is susceptible to a multipart fracture, and fatalities are common with a four-part fracture (Jefferson 1919). Vertebral bodies in the lower cervical spine (LCS; C3 to T1) are uniformly shaped and have a lesser range of motion than those of the UCS. Burst fractures, dislocations, and fracture dislocations are common to the lower cervical spine vertebrae, and these include fractures of the cortical bone, endplate fractures, and loss of disk height. Injuries to the spinal cord are less common with fractures in the LCS compared to the UCS (Fig. 3).


Fig. 2 Cervical vertebral bodies

Much consideration has been devoted to clinical classifications of neck injuries (Allen et al. 1982; Torg 1985; Yoganandan et al. 1989; Myers and Winkelstein 1995), but the complexity of the cervical spine leads physicians and researchers to necessarily base classifications on inference and anecdotal evidence (Nightingale et al. 2015).

Relatively speaking, the neck is strong during compression, but headfirst impacts from small drop heights (0.5–2 m) have produced a variety of neck injuries in PMHS experiments (Nusholtz et al. 1981; Yoganandan et al. 1986). Neck compression injuries arise from conditions where the victim lands headfirst, such as vehicle occupant ejection, motorcyclist ejection, or vehicle rollover. The fracture level and type are a function of the buckling mode during injury, which is subject to the initial orientation of the spine (Nusholtz et al. 1981, 1983) or the degree of neck lordosis (Culver et al. 1978). Compression injury mechanisms are further complicated by the decoupling response of the head and neck (Yoganandan et al. 1991). Yoganandan et al. (1991) were among the first to use whole cervical spines under compressive loads and suggested that the neck behavior during buckling influences injury. Vertical impact tests on well-constrained head and neck specimens without lordosis (straightened necks where the occipital condyle was approximately concentric with T1) were performed by Pintar et al. (1995); these produced a wide range of injuries, and the complex neck behavior revealed no single metric as an injury predictor. Additionally, Nightingale et al. (1996a, b, 1997) performed a variety of experiments with differing postures and impact angles and quantified compressive failure loads. A wide range of injuries were produced, including midlevel burst fractures, odontoid fractures, and Hangman's fractures. Midlevel fractures are best explained in terms of the complex kinematics of first- and second-order buckling of the neck.

Neck tension-extension injuries in the cervical spine are much less common than compression injuries but are more likely to have higher injury severity and a fatal outcome. Etiologically, extension injuries have been found in victims who were not wearing a seat belt or were too close to the airbag when it deployed. Tension-extension injuries are very difficult to reproduce since neck musculature has a substantial influence on the presence and/or nature of the injury itself (Chancey et al. 2003). The maximum tensile tolerance of the neck is increased by essentially 19% when the occupant anticipates the injurious event by activating neck muscles.

Fig. 3 Vertebral body with ligaments


Regardless, the ligamentous cervical spine will fail at lower loads, but the neck tensile tolerance is increased during muscle activation. Some PMHS experiments fail to reproduce injuries observed in real-life traffic accidents due to the relaxed musculature of the cadaver. It remains to be seen how each parameter that defines the initial conditions prior to injury affects the characteristics of the resulting injury. However, the body of work on this topic suggests that injury traits are influenced by the orientation of the neck and head and by subject-specific anatomical characteristics (e.g., degree of lordosis) immediately before injury.

Thorax

Introduction

Chest injury ranks just behind head injury in the overall number of fatalities in traffic accidents (U.S. Department of Health and Human Services 2007). During a vehicle collision, the thorax is exposed to vehicle interior components including restraint systems, each of which poses varying risks. Thorax injuries are common in frontal and side collisions as well as their oblique counterparts. A retrospective analysis by Nirula and Pintar (2008) shows that the incidence of severe chest injury (AIS ≥ 3) was 5.5% and 33% in NASS and CIREN, respectively. The steering wheel, door panel, armrest, and seat were all identified as contact points with substantial risk of severe injury to the thorax. The reader is encouraged to look to other sources for an overview of thorax anatomy in the context of injury mechanisms (Cavanaugh and Yoganandan 2015). The following survey of injury mechanisms will refer to various regions of the thorax with very little review of anatomy basics.

Traumatic Rupture of the Aorta

Background

Though traumatic rupture of the aorta (TRA) does not happen frequently, it has the highest fatality rate in traffic crashes of all injuries to the thorax. Crash data from 1998 to 2006 show that TRAs occurred in approximately 1% of all vehicle occupants in traffic-related incidents but were the cause of 21% (8%) of all fatalities (Shkrum et al. 1999). In the occurrence of a TRA, there is only a brief time window during which the injury can be treated before it becomes fatal. Bertrand et al. (2008) found that TRAs were twice as common in occupants involved in side impacts (2.4%) compared to frontal impacts (1.1%) in the United Kingdom. The difference in TRA incidence between side and frontal impacts emerged after the advent of air bag and seat belt restraints designed to prevent frontal crash injuries.


Fig. 4 Heart anatomy of interest

Anatomy

The aorta extends from the base of the left ventricle of the heart at the aortic root. The ascending arc of the aorta is relatively flexible, while the descending aorta is secured to the thoracic spine via the pleural reflection. The peri-isthmic region lies between the anastomosis of the left subclavian artery and the descending aorta. The peri-isthmus is the most common place for TRAs to originate (Fig. 4).

Mechanism

Tears in the aorta have occurred in the peri-isthmic region in an estimated 94% of all TRAs (Katyal et al. 1997). Past PMHS studies struggled to reproduce TRAs until Hardy et al. (2006) demonstrated that TRAs can be induced in PMHS specimens through longitudinal quasistatic loading. This study was also the first to orient the specimen so as to place the diaphragm, heart, and aorta in a way anatomically consistent with a healthy human. Interestingly, the aorta was subjected to tension without having to induce whole-body acceleration. Circumferential tears were found to be almost ubiquitous among the observed TRAs. Both the intima (innermost layer) and media (middle, thicker layer) of the aorta were found to be sensitive to tearing (Cammack et al. 1959).

The accepted mechanism of injury for TRA is tension in the aorta, which apparently develops immediately after (not during) direct chest impact. Hardy et al. (2008) used radio-opaque markers to track the response of the peri-isthmus region while acquiring high-frequency x-ray, and they point out specific catalysts of TRA: (1) deformation of the thorax, (2) elongation (longitudinal stretch) of the aorta, and (3) the tethering of the aorta to the spine, which promotes stretching. Here, PMHS specimens were impacted in different modes: shoveling, side impact, submarining, and combined. All impact modes caused TRAs, producing transverse tears and two oblique tears.

Thoracic Spine Injuries

Injuries to the thoracic spine have predominantly originated from traffic-related incidents (Robertson et al. 2002b; Leucht et al. 2009), which may be a surprising statistic given that it is difficult to imagine how the thoracic spine of a seated vehicle occupant could be exposed to such forces. In the United States, 8% of spine fractures from MVCs are AIS ≥ 3, and 9% of all thoracic spine injuries are AIS ≥ 3 (Wang et al. 2009). In a retrospective data analysis from the United Kingdom, almost 23,000 motorcycle and MVC victims were surveyed (Robertson et al. 2002a). Spinal injuries were present in 126 (11.2%) motorcyclists and 383 (14.1%) car occupants. The thoracic region was the most common spine injury location in motorcyclists (54.8%; n = 126), while thoracic spine injuries in car occupants were present in 26.6% (n = 383) of spinal injury cases. Similarly, Pintar et al. (2012) analyzed CIREN and NASS data from MVCs and found the dorsal spine to be particularly vulnerable despite public awareness and seatbelt use in the United States (Fig. 5).

Compression Fracture

Perhaps nonintuitively, injuries related to spine compression are often observed in frontal and rear-end crashes or in the presence of abrupt decelerations/accelerations. Begeman et al. (1973) simulated frontal crash responses in PMHS using a sled. Three-point seat belts (with shoulder strap) were found to promote spine compression compared to a single lap restraint. Begeman et al. found that a deceleration of about 15 G results in over 600 lbs of shoulder belt tension and a spine load of about 900 lbs. Additionally, it was found that the axial loads are augmented when the body is held erect by the shoulder restraint. King and Yang (1995) hypothesized that the shoulder strap restrains the upper body before the forward momentum of the body can be sufficiently diminished. The combined force from the forward momentum and the upper restraint acts on the thoracic spine, forcing it into a straightened or lordotic posture, which produces dangerous loads on more caudal segments. Furthermore, the asymmetric 3-point belt restraint also loads the spine asymmetrically, exposing vertebral bodies to concentrated loading conditions.

Forensic clinical studies provide a real-world estimate of spine compression injury risk from frontal crashes (Ball et al. 2000). Burst fractures between T12 and L1 levels were found in 80% of patients where 3-point seat belts were fastened but in only 25% of patients wearing a lap belt. Nonetheless, a recent retrospective study (Pintar et al. 2012) points out that despite the frequent occurrence of thoracolumbar fractures in frontal impacts, the precise injury mechanism remains elusive.


Fig. 5 Spine load in PMHS during horizontal deceleration from a frontal MVC

Flexion Distraction Injury

Flexion-distraction injuries (or Chance fractures (Chance 1948)) have been found in vehicle occupants involved in MVCs, and they are not limited to the thoracic spine. For example, Chance fractures have been reproduced through airbag deployment in PMHS experiments (Cheng et al. 1982). The thoracolumbar region seems especially exposed to the possibility of Chance fractures during an MVC. The mechanism of injury involves fracture initiation in the posterior aspect of the neural arch that continues anteriorly (Fig. 6). The spine experiences hyperflexion followed by a distraction (Stemper et al. 2015).

This injury has been especially common in occupants wearing only a lap belt or an improperly secured 3-point restraint. A Canadian study analyzed medical data of eight children involved in an MVC wearing lap or 3-point restraints (Santschi et al. 2005). Of the five children wearing a lap belt, four experienced flexion-distraction fractures to the lumbar spine and three were permanently paralyzed. Flexion-distraction fractures were accompanied by intra-abdominal injuries, and this coincidence seems to be a pattern (LeGay et al. 1990). The high incidence of spinal and intra-abdominal injuries resulting from lap belt restraints was apparently sufficient motivation to prohibit the lap-only belt design in all cars sold in the United States since September 2007 (NHTSA 2005).

Fig. 6 Flexion-distraction injury (Chance fracture)

Intra-abdominal Injuries

Anatomy

The abdomen is conventionally described through nine regions: (R/L) hypochondriac, (R/L) lumbar, (R/L) iliac, epigastric, umbilical, and hypogastric (Fig. 7). This section will focus on anatomical features pertinent to traffic accident injury; for a detailed description of abdominal anatomy, the reader is encouraged to look to other sources (Hardy et al. 2015). The lower ribs offer protection from blunt trauma to the upper abdomen, and the anterolateral abdominal wall (skin, subcutaneous tissue, muscle, fascia, and parietal peritoneum) provides additional protection to the abdominal viscera. Soft tissues and organs (Fig. 8) are tethered through the mesentery, which is made up of two layers of peritoneum either between organs or tethering viscera to the abdominal wall. A certain amount of movable freedom is granted to abdominal
organs based on the length of the tethering mesentery, which protects them from robust vibrations and abrupt accelerations. The stomach, small intestine, large intestine, and gallbladder are classified as hollow or membranous organs, and these organs are especially prone to serosal tears and perforations in MVCs (Shinkawa et al. 2004). Abdominal organs classified as solid include the liver, spleen, kidneys, and pancreas. Solid organs tend toward increased injury severity in traffic crashes, and the liver and spleen are the most frequently injured abdominal organs in MVCs (Klinich et al. 2010; Hardy et al. 2015). The liver, located in the upper right region of the abdomen, is the largest internal organ in the body; it is highly vascular and fluid-filled. The liver is vulnerable to blunt trauma: bone fragments from a rib fracture can puncture it, and, depending on the magnitude of the impact, it can rupture or burst. The spleen is located in the upper left region of the abdomen, where the lower rib cage offers some protection from blunt impact. Spleen rupture is nevertheless among the most common abdominal injuries in traffic crashes.

Fig. 7 The abdominal region

Fig. 8 Soft tissues and organs of the abdomen

Background

The abdomen is a highly complex region of the body where the effects of different loads or impacts on individual organs are difficult to quantify (Yoganandan et al. 2000). A study that analyzed NASS data from 1993 to 1998 found that approximately half of 129,269 abdominal injuries resulted from front-end collisions
(Yoganandan et al. 2000). AIS-6 abdominal injuries were present in 94 vehicle occupants and occurred from right-side (22%) and frontal (78%) impacts. Of the abdominal trauma cases, 31% were spleen injuries, 30% were liver injuries, and ~33% of all injuries were coded AIS ≥ 3. Frontal-collision MVCs most commonly yielded injuries to the liver (39%), spleen (29%), and digestive organs (11%). Other common injuries (>AIS-2) involved the kidneys, diaphragm, arteries, and urogenital system. The abdomen is especially vulnerable to injury during side and oblique impacts, and automobile safety has progressed to curtail abdominal injuries through side air bags and side curtain air bags (Baur et al. 2000; Yoganandan et al. 2007). Due to their anatomical locations, the spleen and the liver have been the most seriously injured organs in far-side impacts (Augenstein et al. 2000).

Mechanisms

Ball et al. (2000) point out that there is a high probability of intra-abdominal injury requiring laparotomy in patients injured while wearing only a lap belt compared to a 3-point restraint (~60% vs. 25%) (Ball et al. 2000). Life-threatening intra-abdominal trauma from wearing a lap belt during a vehicle collision has been reported in approximately 50% of flexion-distraction injury patients (Gertzbein and
Court-Brown 1988; LeGay et al. 1990; Green et al. 1991). Anderson et al. (1991) reported a high rate of abdominal trauma associated with crashes where passengers were wearing only lap belts (Anderson et al. 1991). Thirteen of 20 patients (65%) required laparotomy, and eight of nine younger patients (<16 y.o.) had life-threatening intra-abdominal injuries. Anderson et al. note that children appear more likely to experience abdominal injury from lap belts in the event of a collision. Other literature is in agreement with the study by Ball et al. (2000), which found that the most common injuries were small-bowel perforations and serosal tears. A PMHS study from 2015 found that specimens experienced lateral abdominal deflections >50 mm in a vehicle with side and curtain airbags exposed to an oblique side-impact collision at approximately 25 km/h (Yoganandan et al. 2015b). These abrupt deformations of the thorax resulted in fractures of the four lowest ribs, which increase the risk of liver or spleen puncture and dramatically increase the internal pressure of abdominal organs, facilitating rupture. Arm placement modestly influences the nature of injury during side impact to vehicle occupants, but it seems mainly to affect the number of rib fractures and their locations (Kemper et al. 2008).

Injury criteria for the abdomen have been extensively studied: almost every measurable mechanical response has been investigated for use as a predictor of abdominal injury (Yoganandan et al. 2001). First and perhaps foremost, correlations have been identified between the presence (and severity) of injury and the amount of abdominal compression (Melvin et al. 1973; Stalnaker et al. 1973; Viano et al. 1989; Lamielle et al. 2008; Hardy et al. 2015). The organs (or regions) of the abdomen most commonly injured during abdominal compression are the upper and lower abdomen, liver, spleen, jejunum-ileum, and pancreas. Second, both the nature of the injury and its severity have been found to be considerably sensitive to the change in velocity during impact, or impactor speed (Melvin et al. 1973; Kroell et al. 1981; Yoganandan et al. 2000). This velocity criterion was then taken a step further and combined with compression (V*C), yielding the abdominal injury criterion (AIC) (Rouhana et al. 1985). The severity of abdominal injury was found to correlate well with the AIC. Other variations of this criterion have been proposed, such as Vmax × Cmax (Stalnaker and Ulman 1985) and VC(t)max (called the viscous tolerance criterion for thoracic impact) (Viano and Lau 1983), each having good correlation with injury severity under various impact conditions. Kroell et al. (1986) investigated the relationships between velocity, compression, and heart rupture trauma from blunt impact to porcine subjects, and they found that Vmax × Cmax correlated better than VC(t)max with the probability of injury at AIS ≥ 4 (Kroell et al. 1986). Lastly, force, or the manner in which force is applied, influences the nature and severity of injury (Stalnaker et al. 1973; Trollope et al. 1973; Haffner et al. 1996). Talantikite et al. (1993) performed 25 pendulum impact experiments on excised human livers at various speeds, and they found that peak forces (as measured from the pendulum) greater than 500 N yielded injuries of AIS = 3 or greater (Talantikite et al. 1993). In the same study, whole-body experiments indicated a tolerance threshold of 4.4 kN, which caused deflection of half the abdomen.
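To make the compression- and velocity-based criteria above concrete, the sketch below computes Cmax, Vmax, and the viscous criterion VC(t)max = max[V(t)·C(t)] from a sampled deflection time history, where C(t) is deflection normalized by the undeformed body depth. The half-sine pulse and the 250 mm body depth are hypothetical illustration values, not data from any of the cited studies:

```python
import numpy as np

def viscous_criterion(deflection_m, dt_s, body_depth_m):
    """Return (VCmax, Cmax, Vmax) for a sampled deflection history.

    deflection_m : body deflection D(t) [m]
    dt_s         : sampling interval [s]
    body_depth_m : undeformed depth D0 [m], so C(t) = D(t) / D0
    """
    velocity = np.gradient(deflection_m, dt_s)  # V(t) = dD/dt [m/s]
    compression = deflection_m / body_depth_m   # C(t), dimensionless
    vc = velocity * compression                 # viscous response [m/s]
    return vc.max(), compression.max(), velocity.max()

# Hypothetical half-sine deflection pulse: 60 mm peak over 30 ms,
# on a body with an assumed 250 mm undeformed depth
t = np.arange(0.0, 0.030, 1e-4)
deflection = 0.060 * np.sin(np.pi * t / 0.030)
vc_max, c_max, v_max = viscous_criterion(deflection, 1e-4, 0.250)
print(f"VC(t)max = {vc_max:.2f} m/s; Vmax x Cmax = {v_max * c_max:.2f} m/s")
```

Note that the two metrics differ: VC(t)max takes the peak of the instantaneous product, while Vmax × Cmax multiplies peaks that generally occur at different times.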
Studies have also been carried out using variants of force-based injury criteria, such as impact energy, Fmax × Cmax, and pressure.
Organ pressure correlations to injury seem to be limited to the liver (Foster et al. 2006), kidney (Rouhana et al. 1985), and lower abdomen (Miller 1989). Sparks et al. (2007) were able to show that internal pressure is a reasonable predictor of abdominal injury but found that the product of peak internal pressure and the peak time-derivative of pressure was the best predictor of liver injury (Sparks et al. 2007). These data suggest a 50% risk of injury at a vascular pressure of 64 kPa (9.28 psi). The abdomen exhibits a wide variety of injury mechanisms, which vary with organ tissue. Consequently, mechanisms of abdominal injury have been simplified through individual injury criteria.
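A minimal sketch of a pressure-based predictor in the spirit of Sparks et al. (2007) follows: it forms the product of peak internal pressure and the peak time-derivative of pressure. The exact published formulation, units, and risk mapping are not reproduced here; the pulse shape and sampling below are hypothetical:

```python
import numpy as np

def liver_pressure_predictor(pressure_kpa, dt_s):
    """Product of peak pressure and peak pressure rate [kPa^2/s]."""
    rate = np.gradient(pressure_kpa, dt_s)  # dP/dt [kPa/s]
    return pressure_kpa.max() * rate.max()

# Hypothetical 10 ms half-sine pulse peaking at 64 kPa, the level the
# study associates with ~50% injury risk
t = np.arange(0.0, 0.010, 1e-5)
p = 64.0 * np.sin(np.pi * t / 0.010)
print(f"predictor = {liver_pressure_predictor(p, 1e-5):.3g} kPa^2/s")
```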

Pelvis

Pelvic fractures are considered severe injuries, and pelvic injury is often accompanied by damage to other organs. There is a high incidence of pelvis injuries among traffic crash victims (Ooi et al. 2010). The mortality rate of MVC occupants with pelvic fractures suggests that pelvis fractures are typically accompanying injuries; i.e., pelvis injuries seem to be correlated with, rather than the direct cause of, death (Gokcen et al. 1994; Petrisor and Bhandari 2005). However, evidence suggests that fracture-related hemorrhage is associated with a high risk of death following pelvic disruption (Gabbe et al. 2011).

Anatomy

The pelvis is a ring structure that supports the flexible spinal column and transmits the stress of weight bearing to the lower extremities. It is surrounded by a complex arrangement of muscles providing a thick, compliant covering over the pelvis. The pelvis is stronger in the vertical and longitudinal directions to bear loads during walking and running. Conversely, it is weaker along the lateral direction due to the small pubis bones at the front of the pelvic girdle. Four bones form the pelvis. Two innominate bones form the anterior and lateral walls of the ring, and the sacrum and coccyx make up the posterior wall. Each innominate bone consists of three fused segments: the ilium, ischium, and pubis. The fusion occurs around a cup-shaped articular cavity called the acetabulum, the socket part of the hip joint. The ilium is divided into the large wing-like ala and the body of the ilium, which forms the superior part of the acetabulum. The anterior superior iliac spine (ASIS) and the posterior superior iliac spine (PSIS) are bony prominences at the anterior and posterior extremities of the iliac crest. The sacroiliac joints attach the sacrum to the ilia through weight-bearing synovial joints. The articular surfaces of the sacroiliac joint contain irregular depressions that interlock the two bones, allowing limited joint motion.

The ischium forms the lower lateral part of the innominate bone, and it constitutes the posterior third of the acetabular cup. The lowest portion of the ischial body is the ischial tuberosity, which supports the upper torso in a seated posture. The pubic bone is irregularly shaped and contains a body and two rami: the superior and inferior pubic rami. The superior ramus extends from the body to the mid-sagittal plane, where it articulates with the corresponding ramus on the opposite side. The joint formed by the two superior rami is called the pubic symphysis, a slightly movable joint containing a cartilaginous disc between the two bones. The inferior pubic rami also join each other through the pubic symphysis. There are two major ligament groups in the pelvic region: (1) the ligaments surrounding the vertical load-bearing sacroiliac joints and (2) the ligaments passing between the sacrum and ischium. The former consists of the anterior sacroiliac, posterior sacroiliac, and interosseous ligaments. The latter consists of the sacrospinous and sacrotuberous ligaments.

Fracture Types

Fracture types are classified according to the energy input to the pelvis as well as the direction of the force impact (Linnau et al. 2007). Pennal and Tile (1980) classified pelvic fractures based on the presumed force vector direction (Pennal et al. 1980), and Young and Burgess (1990) refined these classifications (Burgess et al. 1990). Pelvis fractures have generally been characterized according to the presumed loading vector as anteroposterior compression (APC), lateral compression (LC), vertical shear (VS), or combined mechanical injury (CM) (Pennal et al. 1980). Each of these injury types has a commonly used severity scale from I to III, where III is considered unstable or most severe (e.g., APC-III or LC-III).

APC fractures in the anterior aspect typically demonstrate pubic symphysis diastasis or a vertical fracture pattern of the rami, with posterior injuries defined by subsets. APC-I shows slight widening of the pubic symphysis and one SI joint but has intact anterior and posterior SI joint ligaments. APC-II widens the SI joint anteriorly, affecting the anterior SI ligaments and the ipsilateral sacrotuberous and sacrospinous ligaments while the posterior SI ligaments remain intact; the stability of an APC-II injury depends on the ligaments involved. APC-III represents complete separation of the hemipelvis from the pelvic ring with rupture of the anterior and posterior SI ligaments; it is treated as severely unstable since it involves disruption of all sacroiliac joint ligaments (Young and Resnik 1990).

LC pelvic injuries are caused through impact to the proximal femur that transfers a load to the iliac crest. Lateral compression fractures occur as the pelvis rotates toward the midline along the impact vector. The ilium rotates medially while its posterior aspect is attached to the sacrum through the posterior sacroiliac (SI) ligaments (Burgess et al. 1990). A secondary crush is common within the hemipelvis on the impact side, which causes an internal rotation; i.e., the contralateral hemipelvis rotates externally, causing an “open-book fracture.” This injury is considered LC-III or
severely unstable since the compressive force also affects the contralateral side by way of an external rotation of the anterior pelvis (Young and Resnik 1990). VS injury results in a diastasis of the symphysis or a vertical fracture pattern of rami anteriorly, and it vertically displaces the hemipelvis (Burgess et al. 1990; Manson et al. 2010). Most of these fractures result from a fall from height and are uncommon in MVCs.

Mechanisms

The majority of experimental PMHS work on pelvis injury mechanisms has involved direct lateral or frontal impacts to the pelvis (Yoganandan et al. 2013a). Therefore, this discussion of injury mechanisms focuses primarily on those from side, oblique, and frontal impacts in MVCs.

Lateral Impact

Lateral impact injury mechanisms are pertinent to side or oblique impacts in MVCs. During side-impact MVCs, the impact is often delivered directly by the intruding door. Historically, PMHS studies have exposed whole cadavers to impacts along the lateral aspect of the greater trochanter of the femur. The pubic rami were found to experience substantial strain during lateral impact (Molz et al. 1997). According to Bouquet et al. (1998), a 50% risk of AIS ≥ 2 injury is incurred when the pelvis deflection (the change in maximum lateral width) reaches 46 mm (Bouquet et al. 1998). While the risk of injury can be developed from test measurements, the nature of the injury depends on the properties of the impactor. Using 20 unembalmed male cadavers, Nusholtz and Kaiker (1986) found that the nature of the injury depends on the rigidity of the impactor (Nusholtz and Kaiker 1986): in the absence of padding, pelvic damage occurred at or around the acetabulum, and the pubic area was prone to fractures from impact with a rigid device. Schiff et al. (2008) studied 728 lateral impact crashes that resulted in pelvic fractures and compared them to 5710 control cases involved in lateral impacts without any pelvic injuries (Schiff et al. 2008). The authors found statistically significant occupant and vehicle factors: (1) occupants 65 years and older had a 70% increased risk, (2) nonpregnant females had a 60% increased risk, (3) underweight occupants had an 80% increased risk, and (4) occupants of vans had an 80% decreased risk. The magnitude of lateral intrusion had the strongest relation to pelvic fracture. Cavanaugh et al. (1990) laterally impacted 12 whole unembalmed cadavers such that the energy was directed through the greater trochanter (Cavanaugh et al. 1990). The impact velocities ranged from 6.7 to 10.5 m/s. Injuries occurred to the inferior and superior left pubic rami and the left sacroiliac joint. Pelvic injury tolerance was found to lie between 8 kN and 10.6 kN: a peak force of 7.98 kN yielded a 25% probability of injury, and a peak compression of 32.6% corresponded to a 25% risk of fracture. As in the intra-abdominal injury studies above, the best injury criterion was related to Vmax × Cmax (peak velocity times peak compression).
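The compression metric used in these pelvic studies is simply peak lateral deflection normalized by pelvic width. The minimal sketch below, using a hypothetical test measurement, compares such a value against the ~32.6% level that Cavanaugh et al. (1990) associated with a 25% fracture risk:

```python
def pelvic_compression(deflection_mm: float, pelvic_width_mm: float) -> float:
    """Peak lateral pelvic compression ratio C = deflection / width."""
    return deflection_mm / pelvic_width_mm

# Hypothetical test: 95 mm peak deflection on a 300 mm wide pelvis
c = pelvic_compression(95.0, 300.0)
risk_level = "above" if c > 0.326 else "below"
print(f"C = {c:.1%}, {risk_level} the ~32.6% (25%-risk) level")
```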

Likewise, Viano et al. (1989) performed a series of pendulum (23.4 kg) impact tests on unembalmed cadavers at impact velocities of 4.5, 6.7, and 9.4 m/s, with the force vector directed through the torso (Viano et al. 1989). They found that peak pelvic acceleration and pelvic deformation were not reliable measures, but the ratio of pelvic deformation to pelvic width correlated well with pubic rami fracture, which was the only type of injury observed. The tolerance level for a 25% probability of serious injury to the pelvis was found to be 27% pelvic compression, based on the entire width of the pelvis. Impact energy delivered through the greater trochanter may impart substantial kinetic energy, which moves the contralateral hemipelvis against the center console of the vehicle. Tencer et al. (2007) analyzed pelvis accelerations and reviewed CIREN for pelvic injuries of occupants in vehicles with and without a center console; they found a higher incidence of pelvis fracture in MVCs with center consoles (Tencer et al. 2007). The abrupt deceleration from the center console contributes to injuries in the contralateral hemipelvis. The maximum and minimum accelerations for (1) a fixed seat without console, (2) a fixed seat with console, and (3) a movable seat without console were (1) 28.5 g and 3.3 g, (2) 24.8 g and 10.5 g, and (3) 15.3 g and 3.8 g, respectively. A 50% reduction in primary pelvic acceleration was observed for vehicles without a center console.

Frontal/Rear Impact

Occupants involved in head-on MVCs are exposed to a high risk of a frontal load on the pelvis. Compared to lateral impacts, frontal impacts load the pelvis indirectly: energy is transmitted from the knee, which is impacted by the dashboard (King 2001). Generally, injury mechanisms differ between front and rear vehicle occupants. A patella fracture is often accompanied by posterior hip dislocation in front-seat passengers, who have a tendency to sit with their knees together and hips flexed at 90° when the knee impacts the dashboard (Markham 1972). Rear-seat passengers thrown forward on impact may suffer anterior dislocation of the hip joint, where higher energy loads can give rise to sacroiliac joint separation. This injury pattern for rear-seat passengers is due to their tendency to sit with knees and hips flexed at 90° and with hips in abduction with a slight external rotation (Markham 1972). Salzar et al. (2006) conducted a frontal impact study using an isolated hemipelvis model to study the posture dependency of resulting fractures of the acetabulum and proximal femur (Salzar et al. 2006). Peak forces varied with the acetabular support area: the highest sustained forces occurred under abduction and extension of the specimen (i.e., with the largest contact area), while specimens in adduction and flexion failed under lesser forces (i.e., with the least contact area). Interestingly, Masson et al. (2005) found the critical quasistatic load to depend strongly on sex, testing seven male and five female isolated pelves from embalmed cadavers (Masson et al. 2005). Quasistatic loads were applied in the anterior-posterior direction to the symphysis through a rectangular rigid plate. The force sustained before collapse ranged from 556 to 3981 N, with average peak loads of 1053 N for females and 2501 N for males. The force-displacement
corridors indicated that the female pelvic ring was more fragile due to the larger retro-pubic angle when controlling for bone density. Though a majority of pedestrians are struck laterally, frontal loading to the pelvis can occur when a pedestrian is hit from the front or back: in an analysis of 1014 cases, 20.7% had injuries from anteroposterior compression (Eastridge and Burgess 1997). Pedestrian injury patterns are primarily influenced by front-end vehicle design. Simms et al. (2006) reviewed accident data and simulated pedestrian-vehicle interactions for sport utility vehicles (SUVs) and cars (Simms and Wood 2006). Pelvis injury was 2.5 times as likely in pedestrians struck by a vehicle with an elevated bumper, as is common in SUVs.

Lower Extremities

Anatomy

The lower extremities include the femoral head, acetabular joint, thigh, femur, knee, patella, tibia, fibula, ankle, and foot. They comprise 29 distinct bones, 72 articulating surfaces, 30 synovial joints, more than 100 ligaments, and 30 muscle attachments (Crandall et al. 1996). The knee is made up of the patella, femoral condyles, and knee ligaments. The thigh comprises the supracondylar, shaft, and subtrochanteric regions of the femur. The hip consists of the femoral head, neck, and acetabulum (the hip socket); femur anatomy is thus included in the knee, thigh, and hip. The patella (or knee cap) hovers over the anterior knee and acts as a mechanism to aid in leg extension and knee protection. Major knee ligaments include the posterior cruciate ligament (PCL), anterior cruciate ligament (ACL), medial collateral ligament (MCL), and lateral collateral ligament (LCL). The reader is encouraged to seek out other literature for in-depth explanations of ligament anatomy and functionality (Rupp 2015). The femur is a strong bone that spans the length of the thigh; it consists of medial and lateral condyles at its distal head that articulate along the superior end of the tibia. The proximal end of the femur forms the trochanteric region, from which the femoral neck projects, allowing rotation about the hip. In a CIREN database study covering 1997 to 2003, the femoral shaft (31.5%) and the acetabulum (21.6%) were the most commonly reported injury sites within the knee-thigh-hip arrangement (Rupp 2006). The tibia and fibula are attached by articulating surfaces at their proximal and distal ends. The ankle contains the talocrural, talocalcaneal, talocalcaneonavicular, and transverse tarsal joints. The hindfoot rotates in three planes of motion through internal and external rotation (transverse plane), dorsiflexion and plantarflexion (sagittal plane), and inversion and eversion (coronal plane) (Salzar et al. 2015). A common misconception is that the hindfoot is attached to the leg through hinge joints; in fact, the motion of the hindfoot is not contained within any single plane. This
particularly applies to flexion of the hindfoot joints, which is not an isolated movement but is linked to inversion and eversion.

Mechanisms

The knee-thigh-hip complex appears to be the most commonly injured region of the lower extremities in MVCs, and its injuries are also the most severe lower extremity injuries due to the high risk of arterial rupture (Kuppa et al. 2003). The lower extremities are the most frequently injured body region in MVCs, and 25% of these injuries involve the knee due to the presence of knee bolsters. Knee injuries are often associated with high DALY values due to long recovery times. Knee bolster stiffness varies across vehicles; many are sufficiently padded to protect the knee itself, transmitting the critical loads to the femur and hip instead (Salzar et al. 2015).

Knee Injury

Knee injuries make up about 16% of all injuries to the lower extremities (Chang et al. 2008). Database studies indicate that few knee injuries during lateral impacts are AIS > 2, and as a result, little attention has been given to lateral loading mechanisms in the literature (Kuppa and Wang 2001; Rupp et al. 2002). Patellar fractures are the most common knee injury and frequently occur in frontal impacts when the knee interacts with the dashboard or, in the case of rear-seat passengers, the seats in front of them. Fracture occurs once a critical compressive force develops between the external impactor and the femoral condyles. Evidence suggests that the risk of injury increases especially in drivers who have contracted the knee extensor muscle for braking (or bracing) immediately prior to the MVC (Atkinson et al. 1998). The patella is also susceptible to a tension load when it impacts a surface obliquely, which subjects the patella to a sliding motion and can cause a patellar ligament avulsion.

Femur Injury

About 39% of all knee-thigh-hip injuries are thigh fractures (Chang et al. 2008). Injury to the knee is often accompanied by fractures of the femoral condyles because the patella is pushed into the intercondylar notch, which disunites the condyles (Powell et al. 1975). Femoral shaft fractures are the most frequent and are commonly the product of axial compression. The curvature of the femoral shaft has a weakening effect on its axial strength: axial loads tend to induce a posterior-to-anterior bending moment, especially from knee loading applied medially to the femoral condyle (Viano and Khalil 1976; Viano and Stalnaker 1980; Rupp et al. 2003; Ivarsson et al. 2009). The mechanisms of injury of the trochanteric femur due to axial loading remain somewhat elusive, but it has been hypothesized that the mechanism is a combination of tension within the greater trochanter, the anchoring of the femoral head to the acetabulum, and the moment arm of the femoral neck (Rupp 2006). In fact, evidence suggests that fractures within the femoral neck occur through similar mechanisms (Rupp et al. 2003).

Occupants often brace for the impending collision, causing muscle contraction. PMHS experiments designed to reproduce the loads caused by MVCs have produced injuries that are somewhat inconsistent with data collected by CIREN: the relaxed state of PMHS specimens yields knee, distal femur, and hip injuries in frontal MVCs, while the CIREN database indicates that the femoral midshaft is the most common injury site (Rupp 2006). Chang et al. (2008) proposed that muscle activation preloads the femur prior to impact, increasing the bending moment applied to the femur (Chang et al. 2008).
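The bending-moment argument can be illustrated with a simple beam-column idealization: because the shaft is bowed, an axial load P acting along the knee-hip axis is offset from the section centroid by an eccentricity e, adding a bending stress M·c/I = P·e·c/I to the direct stress P/A. The sketch below uses hypothetical geometry and loads, not published femur properties:

```python
import math

def femur_shaft_stress(axial_n, eccentricity_m, outer_d_m, inner_d_m):
    """Peak compressive stress in an idealized hollow femoral shaft.

    The shaft bow offsets the load line from the section centroid by
    `eccentricity_m`, so the axial force also produces a bending moment
    M = P * e (a beam-column idealization, not a validated femur model).
    """
    area = math.pi / 4 * (outer_d_m**2 - inner_d_m**2)       # cross-section
    inertia = math.pi / 64 * (outer_d_m**4 - inner_d_m**4)   # second moment
    moment = axial_n * eccentricity_m                        # M = P * e
    sigma_axial = axial_n / area                             # P / A
    sigma_bending = moment * (outer_d_m / 2) / inertia       # M * c / I
    return sigma_axial + sigma_bending  # [Pa]

# Hypothetical values: 5 kN knee load, 10 mm bow, 28 mm / 14 mm shaft
print(f"{femur_shaft_stress(5e3, 0.010, 0.028, 0.014) / 1e6:.1f} MPa")
```

Under these illustrative numbers the bending term is roughly twice the direct axial term, which is consistent with the text's point that muscle preload (raising P) disproportionately raises the bending moment on the shaft.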

Hip Injury

Forty-five percent of knee-thigh-hip injuries are hip fractures and dislocations from frontal impacts, where fracture of the acetabulum is most frequent (Rupp et al. 2003; Chang et al. 2008). Many parameters influence hip response and susceptibility to injury, including hip posture during knee loading as well as interactions between the trochanter and components of the vehicle interior. Differences in the geometry of the acetabulum between men and women have also been shown to affect the extent and nature of fracture (Wang et al. 2004; Holcombe et al. 2011). Similarly, body mass and the center of gravity of the upper body prior to impact have a considerable effect on the hip response, since the hip is coupled to the abdomen and lower extremity masses (Rupp et al. 2008). Approximately 70% of AIS > 2 lower extremity injuries from lateral impacts in nearside occupants are hip or pelvis injuries, compared to the 9% of injuries to either the thigh or knee (Banglmaier et al. 2003). Far-side occupants are exposed to different loading conditions, and only 38% of their AIS > 2 injuries are to the hip or pelvis.

Lower Leg Injury

The vast majority of lower extremity injuries are to the knee-thigh-hip complex, and injuries to the lower leg rarely result in life-threatening circumstances. As vehicular safety standards have improved and the number of airbag-equipped vehicles has increased, the relative share of severe injuries to the lower leg has increased, since critical body regions are now better protected than the lower leg. Additionally, prior to the advent of airbags, lower leg injuries were less relevant and generally undocumented in the event of a life-threatening or fatal injury (Salzar et al. 2015). The effects of bending moments on the lower leg have been extensively studied; tolerance depends on the loading rate, direction, and magnitude (Schreiber et al. 1998; Kerrigan et al. 2004; Ivarsson et al. 2005; Yoganandan et al. 2014a). Focal lateral loads directed toward the midshaft region of the tibia generated injuries indicating that the general location of the fracture was relatively independent of the force direction (e.g., lateral-medial vs. anterior-posterior) (Rabl et al. 1996).

Ankle Injury

Ankle injuries are the most severe lower leg injuries and are common in front-end MVCs. Interactions between the lower leg and the brake/accelerator pedal have been attributed to approximately 25% of inversion or eversion (Xversion: rotation about
the X-axis, i.e., the anterior-posterior axis) related injuries (Morris et al. 1997). Lower leg injuries depend upon axial preloads or preflexion (e.g., through the Achilles tendon), and injuries are likely more severe when lower leg muscles are activated to hurriedly and forcefully depress the brake pedal, which preloads the foot in dorsiflexion (Funk et al. 2001). It is commonly understood that injuries to the ankle occur when the foot is forced into Xversion. PMHS experiments that employed axial preloads followed by forced Xversion produced (medial and lateral) ligament tears, malleolar fractures, and tibial osteochondral fractures (Funk et al. 2002). Petit et al. (1997) were able to simulate tension in the Achilles tendon while quasistatically loading the foot into dorsiflexion and consistently observed medial malleolus fractures and calcaneofibular ligament tears (Petit et al. 1997). Rudd et al. (2004) conducted PMHS experiments that simulated the force applied to the foot under dorsiflexion when the driver was depressing the brake pedal (Rudd et al. 2004). This setup provided an appropriate simulation of the leg in the event of a front-end MVC. Bony fractures at the ankle joint were present in 11 of 20 specimens, ligament ruptures in 4 specimens, and osteochondral damage in almost all specimens. Rudd et al. (2004) used surface acoustic sensors to detect fracture timing, and they point out that fractures of the ankle do not necessarily align with the peak force of the dynamic load. Instead, the ankle seems to continue to bear higher load after the initial fractures until the fracture is catastrophic.

Future Directions

Although injury biomechanics has largely evolved into an applied discipline, considerable debate still surrounds many accepted mechanisms. This debate calls for further validation and repeated experiments. However, objections to accepted injury mechanisms are difficult to validate, since funding opportunities for experimental replication are scarce. Albert King comments on this problem: “. . . [those] who review research proposals . . . are disinclined to reverse a previous opinion and tend to disapprove any proposal that will reverse this opinion” (King 2015). Nevertheless, advancements in instrumentation will continue to allow researchers to examine the biomechanical response in more detail, as long as funding opportunities in applied biomechanics remain available.

As transportation technology evolves, first-world economies will continue to demand more sophisticated tests from the biomechanics community. However, the complexity of tests often precedes scientific understanding. As a result, injury explanations can contain much speculation, but PMHS tests will indeed indicate which injuries are likely to occur under defined loading conditions. Finally, anthropomorphic test devices (ATDs) have provided insight into the biomechanical response with acceptable biofidelity (Baudrit and Trosseille 2015). The Hybrid III ATD was developed and has been used since the early 1990s (Mertz 1993). As vehicle and traffic technology advances, the body will be exposed to new and variable loads in the event of an MVC. Recently, Danelson et al. evaluated the
biofidelity of the Hybrid III during vertical loading, and they concluded that there is a need for further development to improve biofidelity (Danelson et al. 2015). The Hybrid III was designed specifically to mimic a seated automobile occupant, but its divergence from the PMHS response was sizable. Similarly, Foreman et al. point out the need for a biofidelic ATD response in pedestrian impacts (Foreman et al. 2015). Injuries with complex mechanisms (especially soft tissue injuries) are difficult to predict from the ATD response. Future work is necessary to better estimate the human response through ATDs, and a vast amount of work has been carried out by way of biofidelity studies, response corridor generation techniques, and PMHS experimentation to pave the way for future improvements.

Cross-References

▶ Data Analytics for Biomechanics
▶ Machine Learning Techniques for Data Analytics of Human Motion
▶ Normalization Techniques
▶ Pedestrians in Traffic Accidents
▶ Special Pediatric Consideration in Traffic Accidents
▶ Traffic Accidents with Two-Wheel Vehicles
▶ Vehicle Occupants in Traffic Accidents

References

Allen B Jr, Ferguson R, Lehmann TR, O'Brien RP (1982) A mechanistic classification of closed, indirect fractures and dislocations of the lower cervical spine. Spine (Phila Pa 1976) 7:1–27
Anderson PA, Henley MB, Rivara FP, Maier RV (1991) Flexion distraction and Chance injuries to the thoracolumbar spine. J Orthop Trauma 5:153–160
Anderson RWG, Brown CJ, Blumbergs PC et al (2003) Impact mechanics and axonal injury in a sheep model. J Neurotrauma 20:961–974. doi:10.1089/089771503770195812
Anuta PE (1970) Spatial registration of multispectral and multitemporal digital imagery using fast Fourier transform techniques. IEEE Trans Geosci Electron 8:353–368
Atkinson P, Atkinson T, Haut R, Eusebi C, Maripudi V, Hill T, Sambatur K (1998) Development of injury criteria for human surrogates to address current trends in knee-to-instrument panel injuries (No. 983146). SAE Technical Paper
Augenstein J, Diggs K (2003) Performance of advanced air bags based on data from the William Lehman Injury Research Center and new NASS PSUs. Annu Proc Assoc Adv Automot Med 47:99–101
Augenstein J, Perdeck E, Martin P et al (2000) Injuries to restrained occupants in far-side crashes. Ann Adv Automot Med 44:57–66
Ball ST, Vaccaro AR, Albert TJ (2000) Injuries of the thoracolumbar spine associated with restraint use in head-on motor vehicle accidents. J Spinal Disord 13:297–304
Balogh ZJ, Offner PJ, Moore EE, Biffl WL (2000) NISS predicts postinjury multiple organ failure better than the ISS. J Trauma Acute Care Surg 48:624–628
Balogh ZJ, Varga E, Tomka J et al (2003) The new injury severity score is a better predictor of extended hospitalization and intensive care unit admission than the injury severity score in patients with multiple orthopaedic injuries. J Orthop Trauma 17:508–512. doi:10.1097/00005131-200308000-00006

Banglmaier RF, Rouhana SW, Beillas P, Yang KH (2003) Lower extremity injuries in lateral impact: a retrospective study. Ann Proc Assoc Adv Automot Med 47:425–444
Barnsley L, Lord S, Wallis B, Bogduk N (1995) The prevalence of chronic cervical zygapophysial joint pain after whiplash. Spine (Phila Pa 1976) 20:20–26
Baudrit P, Trosseille X (2015) Proposed method for development of small female and midsize male thorax dynamic response corridors in side and forward oblique impact tests. Stapp Car Crash J 59:177–202
Baur P, Lange W, Messner G et al (2000) Comparison of real world side impact/rollover collisions with and without thorax airbag/head protection system: a first field experience study. Ann Proc Assoc Adv Automot Med 44:187–201
Begeman PC, King AI, Prasad P (1973) Spinal loads resulting from -Gx acceleration. In: Proceedings of the 17th Stapp car crash conference, Coronado, CA, pp 343–360
Begonia MT, Dallas M, Vizcarra B et al (2015) Non-contact strain measurement in the mouse forearm loading model using digital image correlation (DIC). Bone 81:593–601. doi:10.1016/j.bone.2015.09.007
Bergen GS, Chen LH, Warner M (2008) Injury in the United States; 2007 chartbook
Bertrand S, Cuny S, Petit P et al (2008) Traumatic rupture of thoracic aorta in real-world motor vehicle crashes. Traffic Inj Prev 9:153–161. doi:10.1080/15389580701775777
Bogduk N, Marsland A (1988) The cervical zygapophysial joints as a source of neck pain. Spine (Phila Pa 1976) 13:610–617
Bouquet R, Ramet M, Bermond F et al (1998) Pelvis human response to lateral impact. In: Proceedings of the 16th international technical conference on the enhanced safety of vehicles, Windsor, pp 1665–1686
Burgess AR, Eastridge BJ, Young JWR, Ellison TS, Ellison PS Jr, Poka A et al (1990) Pelvic ring disruptions: effective classification system and treatment protocols. J Trauma Acute Care Surg 30:848–856
Cammack K, Rapport RL, Paul J, Baird WC (1959) Deceleration injuries of the thoracic aorta. AMA Arch Surg 79:244–251. doi:10.1001/archsurg.1959.04320080080010
Carroll LJ, Holm LW, Hogg-Johnson S et al (2009) Course and prognostic factors for neck pain in Whiplash-Associated Disorders (WAD). Results of the Bone and Joint Decade 2000–2010 Task Force on neck pain and its associated disorders. J Manip Physiol Ther 32:S97–S107. doi:10.1016/j.jmpt.2008.11.014
Cavanaugh JM, Yoganandan NA (2015) Thorax injury biomechanics. In: Yoganandan N, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 332–334
Cavanaugh JM, Walilko TJ, Malhotra A, Zhu Y, King AI (1990) Biomechanical response and injury tolerance of the pelvis in twelve sled side impacts (No. 902305). SAE Technical Paper
Chance GQ (1948) Note on a type of flexion fracture of the spine. Br J Radiol 21:452
Chancey VC, Nightingale RW, Van Ee CA et al (2003) Improved estimation of human neck tensile tolerance: reducing the range of reported tolerance using anthropometrically correct muscles and optimized physiologic initial conditions. Stapp Car Crash J 47:135–153
Chang C-Y, Rupp JD, Kikuchi N, Schneider LW (2008) Development of a finite element model to study the effects of muscle forces on knee-thigh-hip injuries in frontal crashes. Stapp Car Crash J 52:475
Cheng R, Yang KH, Levine RS, King AI, Morgan R (1982) Injuries to the cervical spine caused by distributed frontal load to the chest. SAE Paper #821155
Chu TC, Ranson WF, Sutton MA (1985) Applications of digital-image-correlation techniques to experimental mechanics. Exp Mech 25:232–244
Crandall J, Martin P, Bass C et al (1996) Foot and ankle injury: the roles of driver anthropometry, footwear, and pedal controls. In: 40th annual proceedings of the Association for the Advancement of Automotive Medicine
Culver RH, Bender M, Melvin JW (1978) Mechanisms, tolerances, and responses obtained under dynamic superior-inferior head impact. Ann Arbor 7:103

Danelson KA, Kemper AR, Mason MJ et al (2015) Comparison of ATD to PMHS response in the under-body blast environment. Stapp Car Crash J 59:445–520
Davceva N, Janevska V, Ilievski B et al (2012) The occurrence of acute subdural haematoma and diffuse axonal injury as two typical acceleration injuries. J Forensic Legal Med 19:480–484. doi:10.1016/j.jflm.2012.04.022
Deng B, Begeman PC, Yang KH et al (2000) Kinematics of human cadaver cervical spine during low speed rear-end impacts. Stapp Car Crash J 44:171–188
Depreitere B, Van Lierde C, Vander SJ et al (2006) Mechanics of acute subdural hematomas resulting from bridging vein rupture. J Neurosurg 104(6):950. doi:10.3171/jns.2006.104.6.950
Eastridge BJ, Burgess AR (1997) Pedestrian pelvic fractures: 5-year experience of a major urban trauma center. J Trauma Acute Care Surg 42:695–700
Fice JB, Cronin DS (2012) Investigation of whiplash injuries in the upper cervical spine using a detailed neck model. J Biomech 45:1098–1102. doi:10.1016/j.jbiomech.2012.01.016
Foreman JL, Joodaki H, Forghani A et al (2015) Whole-body response for pedestrian impact with a generic sedan buck. Stapp Car Crash J 59:401–444
Foster CD, Hardy WN, Yang KH et al (2006) High-speed seatbelt pretensioner loading of the abdomen. Stapp Car Crash J 50:27–51
Funk JR, Crandall JR, Tourret LJ, MacMahon CB, Bass CR, Khaewpong N, Eppinger RH (2001) The effect of active muscle tension on the axial injury tolerance of the human foot/ankle complex (No. 2001-06-0074). SAE Technical Paper
Funk JR, Srinivasan SCM, Crandall JR et al (2002) The effects of axial preload and dorsiflexion on the tolerance of the ankle/subtalar joint to dynamic inversion and eversion. Stapp Car Crash J 46:245–265. doi:2002-22-0013 [pii]
Gabbe BJ, De Steiger R, Esser M et al (2011) Predictors of mortality following severe pelvic ring fracture: results of a population-based study. Injury 42:985–991
Gabler HC, Weaver AA, Stitzel JD (2015) Automotive field data in injury. In: Yoganandan N, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 33–47
Gennarelli TA, Thibault LE (1982) Biomechanics of acute subdural hematoma. J Trauma 22:680–686
Gertzbein SD, Court-Brown CM (1988) Flexion-distraction injuries of the lumbar spine. Clin Orthop Relat Res 227:52–60
Gokcen EC, Burgess AR, Siegel JH et al (1994) Pelvic fracture mechanism of injury in vehicular trauma patients. J Trauma Acute Care Surg 36:789–796
Green DA, Green NE, Spengler DM, Devito DP (1991) Flexion-distraction injuries to the lumbar spine associated with abdominal injuries. J Spinal Disord Tech 4:312–318
Gurdjian ES, Webster JE, Lissner HR (1955) Observations on the mechanism of brain concussion, contusion, and laceration. Surg Gynecol Obstet 101:680–690
Haagsma JA, Graetz N, Bolliger I et al (2016) The global burden of injury: incidence, mortality, disability-adjusted life years and time trends from the Global Burden of Disease study 2013. Inj Prev 22:3–18. doi:10.1136/injuryprev-2015-041616
Haffner MP, Sances S, Kumaresan S et al (1996) Response of human lower thorax to impact. Ann Proc Assoc Adv Automot Med 40:33–43
Hardy WN, Foster CD, Mason MJ, Yang KH, King AI, Tashman S (2001) Investigation of head injury mechanisms using neutral density technology and high-speed biplanar x-ray. Stapp Car Crash J 45:337–368
Hardy WN, Shah CS, Kopacz JM et al (2006) Study of potential mechanisms of traumatic rupture of the aorta using in situ experiments. Stapp Car Crash J 50:247–266
Hardy WN, Shah CS, Mason MJ et al (2008) Mechanisms of traumatic rupture of the aorta and associated peri-isthmic motion and deformation. Stapp Car Crash J 52:233–265. doi:2008-22-0010 [pii]

Hardy WN, Howes MK, Kemper AR, Rouhana SW (2015) Impact and injury response of the abdomen. In: Yoganandan N, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 373–434
Holcombe S, Kohoyda-Inglis C, Wang L et al (2011) Patterns of acetabular femoral head coverage. Stapp Car Crash J 55:479–490
Ivarsson BJ, Kerrigan JR, Lessley DJ, Drinkwater DC, Kam CY, Murphy DB, … Kent RW (2005) Dynamic response corridors of the human thigh and leg in non-midpoint three-point bending (No. 2005-01-0305). SAE Technical Paper
Ivarsson BJ, Genovese D, Crandall JR et al (2009) The tolerance of the femoral shaft in combined axial compression and bending loading. Stapp Car Crash J 53:251
Jefferson G (1919) Fracture of the atlas vertebra. Report of four cases, and a review of those previously recorded. Br J Surg 7:407–422
Katyal D, McLellan BA, Brenneman FD et al (1997) Lateral impact motor vehicle collisions: significant cause of blunt traumatic rupture of the thoracic aorta. J Trauma 42:769–772
Kauvar DS, Wade CE (2005) The epidemiology and modern management of traumatic hemorrhage: US and international perspectives. Crit Care 9(Suppl 5):S1–S9. doi:10.1186/cc3779
Kemper AR, McNally C, Kennedy EA et al (2008) The influence of arm position on thoracic response in side impacts. Stapp Car Crash J 52:379–420. doi:10.4271/811007
Kerrigan JR, Drinkwater DC, Kam CY et al (2004) Tolerance of the human leg and thigh in dynamic latero-medial bending. Int J Crashworthiness 9:607–623
King AI (2001) Fundamentals of impact biomechanics: part 2 – biomechanics of the abdomen, pelvis, and lower extremities. Annu Rev Biomed Eng 3:27–55
King AI (2015) Introduction to and applications of injury biomechanics. In: Yoganandan N, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 7–14
King AI, Yang KH (1995) Research in biomechanics of occupant protection. J Trauma 38:570–576
Kirk A, Morris A (2003) Side airbag deployments in the UK – initial case reviews. In: Proceedings of the 18th international technical conference on enhanced safety of vehicles, Nagoya, pp 1–8
Klinich KD, Flannagan CAC, Nicholson K et al (2010) Factors associated with abdominal injury in frontal, farside, and nearside crashes. Stapp Car Crash J 54:73–91
Kroell CK, Pope ME, Viano DC et al (1981) Interrelationship of velocity and chest compression in blunt thoracic impact. In: 25th Stapp car crash conference proceedings. SAE Technical Paper 811016, Warrendale, PA, pp 547–580
Kroell C, Allen SD, Warner CY, Perl T (1986) Interrelationship of velocity and chest compression in blunt thoracic impact to swine II. In: 30th Stapp car crash conference proceedings. SAE Technical Paper 861881, Warrendale, PA, pp 99–121
Kuppa S, Wang J, Haffner M, Eppinger R (2001) Lower extremity injuries and associated injury criteria. In: 17th ESV conference, Paper No. 457
Kuppa S, Fessahaie O et al (2003) An overview of knee-thigh-hip injuries in frontal crashes in the United States. Natl Highw Traffic Saf Adm. doi:10.1017/CBO9781107415324.004
Lamielle S, Vezin P, Verriest JP et al (2008) 3D deformation and dynamics of the human cadaver abdomen under seatbelt loading. Stapp Car Crash J 52:267
Lee BB, Cripps RA, Fitzharris M, Wing PC (2014) The global map for traumatic spinal cord injury epidemiology: update 2011, global incidence rate. Spinal Cord 52:110–116. doi:10.1038/sc.2012.158
LeGay DA, Petrie DP, Alexander DI (1990) Flexion-distraction injuries of the lumbar spine and associated abdominal trauma. J Trauma 30:436–444
Leitgeb J, Mauritz W, Brazinova A et al (2012) Outcome after severe brain trauma due to acute subdural hematoma. J Neurosurg 117:324–333. doi:10.3171/2012.4.JNS111448
Lessley DJ, Riley P, Zhang Q et al (2014) Occupant kinematics in laboratory rollover tests: PMHS response. Stapp Car Crash J 58:251
Leucht P, Fischer K, Muhr G, Mueller EJ (2009) Epidemiology of traumatic spine fractures. Injury 40:166–172. doi:10.1016/j.injury.2008.06.040

Linnau KF, Blackmore CC, Kaufman R et al (2007) Do initial radiographs agree with crash site mechanism of injury in pelvic ring disruptions? A pilot study. J Orthop Trauma 21:375–380
Lu Y, Chen C, Kallakuri S et al (2005) Neurophysiological and biomechanical characterization of goat cervical facet joint capsules. J Orthop Res 23:779–787. doi:10.1016/j.orthres.2005.01.002
Manson T, O'Toole RV, Whitney A et al (2010) Young-Burgess classification of pelvic ring fractures: does it predict mortality, transfusion requirements, and non-orthopaedic injuries? J Orthop Trauma 24:603–609
Markham DE (1972) Anterior dislocation of the hip and diastasis of the contralateral sacro-iliac joint – the rear-seat passenger's injury? Br J Surg 59:296–298
Masson C, Baque P, Brunet C (2005) Quasi-static compression of the human pelvis: an experimental study. Comput Methods Biomech Biomed Engin 8:191–192
Matsui Y, Oikawa S (2015) Risks of serious injuries and fatalities of cyclists associated with impact velocities of cars in car-cyclist accidents in Japan. Stapp Car Crash J 59:385–400
McCormick N, Lord J (2010) Digital image correlation. Mater Today 13:52–54. doi:10.1016/S1369-7021(10)70235-2
McKay BJ, Bir CA (2009) Lower extremity injury criteria for evaluating military vehicle occupant injury in underbelly blast events. Stapp Car Crash J 53:229–249
Melvin JW, Stalnaker RL, Roberts VL, Trollope ML (1973) Impact injury mechanisms in abdominal organs. In: 17th Stapp car crash conference. Society of Automotive Engineers, Oklahoma City
Mertz HJ (1993) Anthropomorphic test devices. In: Accidental injury. Springer, New York, pp 66–84
Middleton JW, Dayton A, Walsh J et al (2012) Life expectancy after spinal cord injury: a 50-year study. Spinal Cord 50:803–811
Miller M (1989) The biomechanical response of the lower abdomen to belt restraint loading. J Trauma Acute Care Surg 29:1571–1584
Molz FJ IV, George PD, Go LS et al (1997) Simulated automotive side impact on the isolated human pelvis: Phase I: development of a containment device. Phase II: analysis of pubic symphysis motion and overall pelvic compression. In: Stapp car crash conference proceedings, Warrendale, PA, pp 75–89
Morris A, Thomas P, Taylor AM, Wallace WA (1997) Mechanisms of fracture in ankle and hindfoot injuries to front seat car occupants – an in-depth accident data analysis. Stapp Car Crash Conf 41:181–192. doi:10.4271/973328
Myers BS, Winkelstein BA (1995) Epidemiology, classification, mechanism, and tolerance of human cervical spine injuries. Crit Rev Biomed Eng 23(5–6):307–409
NHTSA (2005) Federal motor vehicle safety standards; occupant crash protection. Docket No. NHTSA-04-18726
Nightingale RW, McElhaney JH, Richardson WJ et al (1996a) Experimental impact injury to the cervical spine: relating motion of the head and the mechanism of injury. J Bone Jt Surg Am 78:412–421
Nightingale RW, McElhaney JH, Richardson WJ, Myers BS (1996b) Dynamic responses of the head and cervical spine to axial impact loading. J Biomech 29:307–318
Nightingale RW, McElhaney JH, Camacho DL, Kleinberger M, Winkelstein BA, Myers BS (1997) The dynamic responses of the cervical spine: buckling, end conditions, and tolerance in compressive impacts (No. 973344). SAE Technical Paper
Nightingale RW, Myers BS, Yoganandan NA (2015) Neck injury biomechanics. In: Yoganandan NA, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 259–308
Nirula R, Pintar FA (2008) Identification of vehicle components associated with severe thoracic injury in motor vehicle crashes: a CIREN and NASS analysis. Accid Anal Prev 40:137–141. doi:10.1016/j.aap.2007.04.013
Nusholtz GS, Kaiker PS (1986) Pelvic stress. J Biomech 19:1003–1014

Nusholtz GS, Melvin JW, Huelke DF, Alem NM, Blank JG (1981) Response of the cervical spine to superior-inferior head impact (No. 811005). SAE Technical Paper
Nusholtz GS, Huelke DE, Lux P, Alem NM, Montalvo F (1983) Cervical spine injury mechanisms (No. 831616). SAE Technical Paper
Ooi CK, Goh HK, Tay SY, Phua DH (2010) Patients with pelvic fracture: what factors are associated with mortality? Int J Emerg Med 3:299–304
Panzer MB, Fice JB, Cronin DS (2011) Cervical spine response in frontal crash. Med Eng Phys 33:1147–1159. doi:10.1016/j.medengphy.2011.05.004
Parr JC, Miller ME, Pellettiere JA, Erich RA (2013) Neck injury criteria formulation and injury risk curves for the ejection environment: a pilot study. Aviat Sp Environ Med 84:1240–1248. doi:10.3357/ASEM.3722.2013
Pennal GF, Tile M, Waddell JP, Garside H (1980) Pelvic disruption: assessment and classification. Clin Orthop Relat Res 151:12–21
Petit P, Portier L, Foret-Bruno J-Y et al (1997) Quasistatic characterization of the human foot-ankle joints in a simulated tensed state and updated accidentological data. In: Proceedings of the international research council on the biomechanics of injury conference, Hannover, pp 363–376
Petitjean A, Trosseille X (2011) Statistical simulations to evaluate the methods of the construction of injury risk curves. Stapp Car Crash J 55:411
Petitjean A, Trosseille X, Yoganandan N, Pintar FA (2015) Normalization and scaling for human response corridors and development of injury risk curves. In: Yoganandan N, Nahum AM, Melvin J (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 769–792
Petrisor BA, Bhandari M (2005) (i) Injuries to the pelvic ring: incidence, classification, associated injuries and mortality rates. Curr Orthop 19:327–333
Pintar FA, Yoganandan N, Voo L, Cusick JF, Maiman DJ, Sances A (1995) Dynamic characteristics of the human cervical spine (No. 952722). SAE Technical Paper
Pintar FA, Yoganandan NA, Voo L (1998) Effect of age and loading rate on human cervical spine injury threshold. Spine (Phila Pa 1976) 23:1957–1962
Pintar FA, Yoganandan NA, Maiman DJ et al (2012) Thoracolumbar spine fractures in frontal impact crashes. Ann Adv Automot Med 56:277–283
Powell WR, Ojala SJ, Advani SH, Martin RB (1975) Cadaver femur responses to longitudinal impacts (No. 751160). SAE Technical Paper
Rabl W, Haid C, Krismer M (1996) Biomechanical properties of the human tibia: fracture behavior and morphology. Forensic Sci Int 83:39–49
Robertson A, Branfoot T, Barlow IF, Giannoudis PV (2002a) Spinal injury patterns resulting from car and motorcycle accidents. Spine (Phila Pa 1976) 27:2825–2830. doi:10.1097/01.BRS.0000035686.45726.0E
Robertson A, Giannoudis PV, Branfoot T et al (2002b) Spinal injuries in motorcycle crashes: patterns and outcomes. J Trauma 53:5–8. doi:10.1097/00005373-200207000-00002
Rouhana SW, Lau IV, Ridella SA (1985) Influence of velocity and forced compression on the severity of abdominal injury in blunt, nonpenetrating lateral impact. J Trauma 25:490–500
Rudd R, Crandall J, Millington S et al (2004) Injury tolerance and response of the ankle joint in dynamic dorsiflexion. Stapp Car Crash J 48:1–26. doi:2004-22-0001 [pii]
Rupp JD (2006) Biomechanics of hip fractures in frontal motor vehicle crashes. PhD dissertation, The University of Michigan, Ann Arbor
Rupp JD (2015) Knee, thigh, and hip injury biomechanics. In: Yoganandan NA, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 471–497
Rupp JD, Reed MP, Van Ee CA et al (2002) The tolerance of the human hip to dynamic knee loading. In: 46th Stapp car crash conference, Ponte Vedra Beach, FL
Rupp JD, Reed MP, Jeffreys TA, Schneider LW (2003) Effects of hip posture on the frontal impact tolerance of the human hip joint. Stapp Car Crash J 47:21

Rupp JD, Miller CS, Reed MP et al (2008) Characterization of knee-thigh-hip response in frontal impacts using biomechanical testing and computational simulations. Stapp Car Crash J 52:421
Salzar RS, Bass CR, Kent R et al (2006) Development of injury criteria for pelvic fracture in frontal crashes. Traffic Inj Prev 7:299–305
Salzar RS, Lievers BW, Bailey AM, Crandall JR (2015) Leg, foot, and ankle injury biomechanics. In: Yoganandan NA, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 499–547
Santschi M, Echavé V, Laflamme S et al (2005) Seat-belt injuries in children involved in motor vehicle crashes. Can J Surg 48:373–376
Schiff MA, Tencer AF, Mack CD (2008) Risk factors for pelvic fractures in lateral impact motor vehicle crashes. Accid Anal Prev 40:387–391
Schreiber P, Crandall J, Hurwitz S, Nusholtz GS (1998) Static and dynamic bending strength of the leg. Int J Crashworthiness 3:295–308
Sekhon LH, Fehlings MG (2001) Epidemiology, demographics, and pathophysiology of acute spinal cord injury. Spine (Phila Pa 1976) 26:S2–12. doi:10.1097/00007632-200112151-00002
Services UDoHaH (2007) Injury in the United States: 2007
Shinkawa H, Yasuhara H, Naka S et al (2004) Characteristic features of abdominal organ injuries associated with gastric rupture in blunt abdominal trauma. Am J Surg 187:394–397
Shkrum M, McClafferty K, Green R et al (1999) Mechanisms of aortic injury in fatalities occurring in motor vehicle collisions. J Forensic Sci 44:44–56
Siegmund GP, Myers BS, Davis MB et al (2001) Mechanical evidence of cervical facet capsule injury during whiplash: a cadaveric study using combined shear, compression, and extension loading. Spine (Phila Pa 1976) 26:2095–2101
Siegmund GP, Davis MB, Quinn KP et al (2008) Head-turned postures increase the risk of cervical facet capsule injury during whiplash. Spine (Phila Pa 1976) 33:1643–1649. doi:10.1097/BRS.0b013e31817b5bcf
Simms CK, Wood DP (2006) Pedestrian risk from cars and sport utility vehicles – a comparative analytical study. Proc Inst Mech Eng Part D J Automob Eng 220:1085–1100
Somers RL (1983a) The probability of death score: a measure of injury severity for use in planning and evaluating accident prevention. Accid Anal Prev 15:259–266. doi:10.1016/0001-4575(83)90050-7
Somers RL (1983b) The probability of death score: an improvement of the injury severity score. Accid Anal Prev 15:247–257. doi:10.1016/0001-4575(83)90049-0
Sparks JL, Bolte JH, Dupaix RB et al (2007) Using pressure to predict liver injury risk from blunt impact. Stapp Car Crash J 51:401–432
Stalnaker RL, Ulman MS (1985) Abdominal trauma – review, response, and criteria. In: 29th Stapp car crash conference proceedings, pp 1–16
Stalnaker RL, McElhaney JH, Roberts VL, Trollope ML (1973) Human torso response to blunt trauma. In: King WF, Mertz HJ (eds) Human impact response: measurement and simulation. Springer US, Boston, pp 181–199
Stemper BD, Yoganandan NA, Pintar FA (2004) Validation of a head-neck computer model for whiplash simulation. Med Biol Eng Comput 42:333–338
Stemper BD, Pintar FA, Baisden JL (2015) Lumbar spine injury biomechanics. In: Yoganandan N, Nahum AM, Melvin JW (eds) Accidental injury: biomechanics and prevention, 3rd edn. Springer, New York, pp 451–470
Takhounts EG, Craig MJ, Moorhouse K et al (2013) Development of brain injury criteria (BrIC). Stapp Car Crash J 57:243
Talantikite Y, Brun-Cassan F, Le Coz J, Tarriere C (1993) Abdominal protection in side impact. Injury mechanisms and protection criteria. Proc Int Res Counc Biomech Inj Conf 21:131–144
Tencer AF, Kaufman R, Huber P et al (2007) Reducing primary and secondary impact loads on the pelvis during side impact. Traffic Inj Prev 8:101–106
Tile M, Pennal GF (1980) Pelvic disruption: principles of management. Clin Orthop Relat Res 151:56–64

Torg JS (1985) Epidemiology, pathomechanics, and prevention of athletic injuries to the cervical spine. Med Sci Sports Exerc 17:295–303
Trollope ML, Stalnaker RL, McElhaney JH, Frey CF (1973) The mechanism of injury in blunt abdominal trauma. J Trauma 13:962–970
Vandevord PJ, Bolander R, Sajja VSSS et al (2012) Mild neurotrauma indicates a range-specific pressure response to low level shock wave exposure. Ann Biomed Eng 40:227–236. doi:10.1007/s10439-011-0420-4
Viano DC, Khalil TB (1976) Investigation of impact response and fracture of the human femur by finite element modeling (No. 760773). SAE Technical Paper
Viano DC, Lau VK (1983) Role of impact velocity and chest compression in thoracic injury. Aviat Sp Environ Med 54:16–21
Viano DC, Stalnaker RL (1980) Mechanisms of femoral fracture. J Biomech. doi:10.1016/0021-9290(80)90356-5
Viano DC, Lau IV, Asbury C et al (1989) Biomechanics of the human chest, abdomen, and pelvis in lateral impact. Accid Anal Prev 21:553–574. doi:10.1016/0001-4575(89)90070-5
Wang SC, Brede C, Lange D et al (2004) Gender differences in hip anatomy: possible implications for injury tolerance in frontal collisions. Annu Proc Assoc Adv Automot Med 48:287–301
Wang MC, Pintar F, Yoganandan N, Maiman DJ (2009) The continued burden of spine fractures after motor vehicle crashes. J Neurosurg Spine 10:86–92. doi:10.3171/SPI.2008.10.08279
Wilberger JE, Harris M, Diamond DL (1991) Acute subdural hematoma: morbidity, mortality, and operative timing. J Neurosurg 74:212–218. doi:10.3171/jns.1991.74.2.0212
Yang KH, Begeman PC (1996) A proposed role for facet joints in neck pain in low to moderate speed rear end impacts part I: biomechanics. In: 6th injury prevention through biomechanics symposium, Wayne State University, Detroit, MI, pp 59–63
Yoganandan NA, Sances Jr A, Maiman DJ et al (1986) Experimental spinal injuries with vertical impact. Spine (Phila Pa 1976) 11:855–860
Yoganandan N, Haffner M, Maiman DJ, Nichols H, Pintar FA, Jentzen J, … Sances A (1989) Epidemiology and injury biomechanics of motor vehicle related trauma to the human spine (No. 892438). SAE Technical Paper
Yoganandan NA, Pintar FA, Sances Jr A et al (1991) Strength and kinematic response of dynamic cervical spine injuries. Spine (Phila Pa 1976) 16:S511–S517
Yoganandan NA, Pintar FA, Gennarelli TA, Maltese MR (2000) Patterns of abdominal injuries in frontal and side impacts. Ann Proc Assoc Adv Automot Med 44:17–36
Yoganandan N, Pintar FA, Maltese MR (2001) Biomechanics of abdominal injuries. Crit Rev Biomed Eng 29(2)
Yoganandan NA, Pintar FA, Zhang J, Gennarelli TA (2007) Lateral impact injuries with side airbag deployments – a descriptive study. Accid Anal Prev 39:22–27. doi:10.1016/j.aap.2006.05.014
Yoganandan NA, Gennarelli TA, Zhang J et al (2009) Association of contact loading in diffuse axonal injuries from motor vehicle crashes. J Trauma 66:309–315. doi:10.1097/TA.0b013e3181692104
Yoganandan NA, Humm JR, Pintar FA (2013a) Force corridors of post mortem human surrogates in oblique side impacts from sled tests. Ann Biomed Eng 41:2391–2398. doi:10.1007/s10439-013-0847-x
Yoganandan N, Stemper BD, Pintar FA et al (2013b) Cervical spine injury biomechanics: applications for under body blast loadings in military environments. Clin Biomech 28:602–609. doi:10.1016/j.clinbiomech.2013.05.007
Yoganandan NA, Arun MWJ, Pintar FA, Szabo A (2014a) Optimized lower leg injury probability curves from postmortem human subject tests under axial impacts. Traffic Inj Prev 15(Suppl 1):S151–S156. doi:10.1080/15389588.2014.935357
Yoganandan N, Arun MWJ, Pintar FA (2014b) Normalizing and scaling of data to derive human response corridors from impact tests. J Biomech 47:1749–1756. doi:10.1016/j.jbiomech.2014.03.010

Yoganandan NA, Arun MWJ, Pintar FA, Banerjee A (2015a) Lower leg injury reference values and risk curves from survival analysis for male and female dummies: meta-analysis of postmortem human subject tests. Traffic Inj Prev 16(Suppl 1):S100–S107. doi:10.1080/15389588.2015.1015118
Yoganandan NA, Humm JR, Pintar FA et al (2015b) Oblique loading in post mortem human surrogates from vehicle tests using chestbands. Stapp Car Crash J 59:1–22
Young JW, Resnik CS (1990) Fracture of the pelvis: current concepts of classification. AJR Am J Roentgenol 155:1169–1175
Zhu F, Dong L, Jin X et al (2015) Testing and modeling the responses of Hybrid III crash-dummy lower extremity under high-speed vertical loading. Stapp Car Crash J 59:521–536

Vehicle Occupants in Traffic Accidents
Garrett A. Mattos

Abstract

Occupant motion and injury response in motor vehicle crashes are dictated by the forces applied to the human body in combination with the relative motion between occupant and vehicle. These responses are complex and depend on many factors, including crash, vehicle, and occupant characteristics. The study of occupant response is an important step in improving vehicle design and crashworthiness. This chapter provides an overview of the important aspects of occupant response for the four main crash modes: frontal, side, rollover, and rear. The general characteristics of occupant kinematic response are discussed with respect to each crash mode. Typical injury patterns and mechanisms are identified, and their relationship to occupant motion and interaction with the crash environment is highlighted.

Keywords

Crash mode • Injury patterns • Occupant kinematics • Serious injury

Contents
Introduction .......................................................................... 2
State of the Art ...................................................................... 3
Frontal ................................................................................. 3
  Injury Patterns .................................................................... 4
Side ..................................................................................... 7
  Injury Patterns .................................................................... 8
Rollover ............................................................................... 10
  Injury Patterns .................................................................... 11
Rear .................................................................................... 13

G.A. Mattos (*) Transport and Road Safety (TARS) Research Centre, University of New South Wales, Sydney, NSW, Australia e-mail: [email protected] # Springer International Publishing AG 2017 B. Müller, S.I. Wolf (eds.), Handbook of Human Motion, DOI 10.1007/978-3-319-30808-1_94-1

  Injury Patterns .................................................................... 13
Countermeasures .................................................................. 16
Summary ............................................................................. 16
References ........................................................................... 17

Introduction

Traffic accidents generally involve one or more discrete impact events that result in global acceleration and local deformation of the vehicle structure. The specific characteristics of the vehicle response ultimately affect the motion of its occupants. This chapter explores the relationships between the characteristics of vehicle crashes and occupant kinematics and injury patterns.

Significant work has been accomplished over the years to study occupant motion and predict injury response in motor vehicle crashes. Early testing programs utilized live animals or human volunteers to investigate occupant motion under subinjurious loading conditions. The use of post mortem human subjects (PMHS) to explore the response of the human body goes back to the nineteenth century and continues to this day. In the mid-1900s, anthropomorphic test devices (ATDs) began to be developed that could replicate human kinematic response under specific loading conditions. More recently, finite element (FE) modeling has been used to investigate the effects of a wide range of crash and vehicle factors on occupant response by performing large numbers of simulations that could not practically be performed in a physical setting. Over the past 10 years, significant efforts have been made to create highly accurate FE models of the human body; these models allow even greater control and variability of occupant anthropometry and morphology.

Crash investigations and epidemiological studies provide the road safety community with the evidence base necessary to proceed with safety recommendations, design changes, or research projects that investigate the causes and mechanisms of pertinent injuries. These studies utilize large, nationally representative databases or small sets of very detailed crash reconstructions to identify the characteristics and distribution of injuries occurring on the road, and their findings translate into work focused on remedying deficiencies. This process is continuous and ongoing, as the patterns of injury shift over time with changes in vehicle and road design and the implementation of improved safety features.

The following sections provide an overview of the current state of knowledge of the general characteristics of occupant kinematics, injury patterns, and injury sources for the four main crash configurations. Current capabilities in mitigating injury severity through the use of countermeasures and design choices are also discussed. Much of what is presented below is available because of the types of research described above and reported in the literature. For brevity and clarity, this chapter focuses on adult occupants of light passenger vehicles who are restrained with three-point seatbelts and involved in single-event crashes, unless otherwise noted.

Injuries are described in general terms, and their severity is defined using the Abbreviated Injury Scale (AIS), which rates injuries from 1 to 6 based on the risk of mortality (AAAM 2008).
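For readers less familiar with the AIS convention, the severity levels and the "AIS2+"/"AIS3+" shorthand used throughout this chapter can be summarized in a short illustrative sketch. The level labels below follow common AIS usage (see AAAM 2008 for the authoritative definitions), and the helper function is hypothetical, included only for illustration.

```python
# Illustrative sketch of AIS severity levels (labels per common AIS usage;
# consult AAAM (2008) for the authoritative definitions).
AIS_LEVELS = {
    1: "minor",
    2: "moderate",
    3: "serious",
    4: "severe",
    5: "critical",
    6: "maximum (currently untreatable)",
}

def at_least(ais_code: int, threshold: int) -> bool:
    """Hypothetical helper: True if an injury of severity `ais_code` falls in
    the 'AIS{threshold}+' category (e.g., AIS3+ means serious or worse)."""
    return ais_code >= threshold

# Example: a lung contusion coded AIS 3 counts as an AIS3+ (serious) injury.
print(AIS_LEVELS[3], at_least(3, 3))  # -> serious True
```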

State of the Art

Improvements in experimental methods, human surrogates, and data acquisition technologies continue to provide opportunities to study occupant response in greater detail than before. For example, with regard to experimental methods, crash testing facilities are continuing to push the envelope in crash severity. Frontal crash tests are being conducted at reduced amounts of overlap, and consideration is being given to evaluating the effect of automated emergency braking on occupant response prior to and during the crash. Recently, for the first time, full-scale rollover tests have been performed using PMHS to study occupant kinematics and injury response. Improvements in FE model design, such as the definition of active musculature, together with increases in computing power, have allowed more complex and realistic replication of occupant behavior. Physical testing has similarly improved with the development of advanced acquisition systems that allow investigators to visualize and accurately measure occupant movement. These include high-speed x-ray systems that can record the motion of the skeletal system during impacts and multicamera motion tracking systems that provide three-dimensional occupant motion measurements accurate to a fraction of a millimeter.

Frontal

Frontal crashes account for over half of all crashes involving passenger vehicles. They occur when the front plane of a forward-traveling vehicle strikes another vehicle or a fixed object. These impacts can involve the entire front plane of the struck vehicle or only a fraction of it. The vehicle and occupant kinematics depend on the amount of overlap, defined as the proportion of the frontal plane of the struck vehicle that is impacted by the striking vehicle or object.

In full frontal crashes, the vehicle decelerates almost purely along its longitudinal axis as the crush-zone structures of the vehicle frame deform and absorb the crash energy, while generally preventing intrusion into the occupant compartment. This impact scenario results in the occupants moving directly forward into their seatbelts and forward airbags. The combination of seatbelt, airbags, and crush tubes aims to allow the occupant to ride down the crash event under a survivable deceleration.

As the amount of overlap decreases, greater lateral and rotational motions come into play, especially when the impact is applied at an oblique angle. Small-overlap crashes, in which less than one-quarter of the front structure is impacted, often do not engage the crush-zone structure of the vehicle frame. In this scenario, the vehicle sustains major damage that can extend from the forward wheel to the passenger footwell, and even to the B-pillar in severe crashes.
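The ride-down idea described above can be made concrete with elementary kinematics. The sketch below assumes an idealized full frontal stop at constant deceleration over the combined distance provided by vehicle crush and restraint excursion; the specific speeds and distances are assumed for illustration only and are not drawn from any particular test.

```python
# Minimal ride-down sketch: mean deceleration required to stop from an impact
# speed v over an available stopping distance d, assuming constant deceleration
# (a = v^2 / (2 d)). All numbers below are assumed, for illustration only.
G = 9.81  # standard gravity, m/s^2

def mean_decel_g(impact_speed_kmh: float, ride_down_distance_m: float) -> float:
    v = impact_speed_kmh / 3.6  # km/h -> m/s
    return v ** 2 / (2.0 * ride_down_distance_m) / G

# A ~50 km/h stop spread over ~0.6 m of combined crush and restraint excursion:
print(round(mean_decel_g(50.0, 0.6), 1))  # ~16.4 g

# The same stop over only ~0.2 m (little crush-zone engagement, as in a
# small-overlap crash that bypasses the crush structures):
print(round(mean_decel_g(50.0, 0.2), 1))  # ~49.2 g
```

In this simplified picture, the mean deceleration scales inversely with the available ride-down distance, which is one way to see why failing to engage the crush-zone structures is so consequential for occupant loading.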

Fig. 1 Occupant and vehicle motion in small-overlap frontal crash

While the vehicle decelerates along its longitudinal axis, it also moves laterally and rotates, as shown in Fig. 1. The multidirectional vehicle response in small-overlap and oblique frontal crashes causes the occupants to move toward the impacted front corner of the vehicle. Moving in this angled direction, forward and lateral relative to the vehicle, reduces the effectiveness of the airbags or prevents engagement altogether.

A common phenomenon in frontal crashes, especially for rear-seated occupants, is known as submarining. Submarining occurs when the lap belt fails to effectively restrain the pelvis, typically due to a combination of poor belt geometry and improper placement. For ideal performance, the lap belt should lie across the pelvis, which is able to withstand the forces applied by the belt in a frontal crash. During submarining, the occupant's pelvis moves forward while the lap belt rides up onto the abdomen. This can result in severe loading of the abdomen and lumbar spine, increased excursion of the lower extremities, and decreased displacement of the head.

Injury Patterns

Injuries sustained in frontal crashes are primarily due to direct impact with the vehicle interior or loading of the body by the seatbelt or seatpan. The pattern of injuries depends on the crash scenario and the occupant's seat position, both relative to the location of impact and differentiated by front or rear seating, as demonstrated in Fig. 2. Small-overlap crashes tend to be more severe in terms of injury outcome, vehicle structural deformation, and occupant response. Children seated in front seats experience significantly increased injury risk due to the mismatch between their size and injury tolerance and the performance of front seat restraints and airbags, which are optimized for the 50th percentile male.

Fig. 2 Distribution of seriously (AIS3+) injured occupants in large-overlap (left) and small-overlap (right) frontal impact crashes by body region sustaining serious injury (Hallman et al. 2011)

The use of seatbelts greatly reduces the risk of injury to all body regions, especially the head and thorax, in low- and high-speed frontal crashes (Viano and Parenteau 2010).

Head injury patterns differ between large- and small-overlap frontal crashes. With a decrease in the amount of overlap comes an increase in the likelihood of the head impacting components not protected by airbags, such as the A-pillar or center instrument panel. This results in a disparity of head injuries that is evident in field data. Occupants in large-overlap frontal crashes generally experience impacts between their head and the steering wheel or dash panel airbag directly in front of their seat. Impacts with the frontal airbag generally produce a lower incidence of minor head injuries, such as bruising, and a greater incidence of moderate (AIS2+) injuries involving a loss of consciousness (Hallman et al. 2011). The interaction between an occupant's head and rigid interior components is more common in small-overlap frontal crashes and results in a higher rate of skull fractures and brain tissue damage. Moderate and severe head injuries are relatively rare for rear seat occupants. This is due to the combined effect of the vehicle's structural performance and the rear seat occupant's kinematics. First, structural intrusion rarely reaches the rear seat occupants and thus limits their exposure to impacts with intruding objects. Second, rear seat occupants are more likely to experience submarining, resulting in reduced upper torso and head excursion and limiting their exposure to direct impact.

Spinal injuries, specifically those involving the posterior aspects of the cervical (i.e., upper) vertebrae, occur at twice the rate in small-overlap versus large-overlap frontal crashes due to the oblique nature of the event (Hallman et al. 2011). The mechanism of these fractures is head impact with the A-pillar resulting in compression-extension of the cervical spine. Injuries to the lumbar (i.e., lower) spine are likely caused by seatpan loading of the pelvis that compresses the spine (Pintar et al. 2014).

Chest injuries at the lower end of the severity spectrum frequently sustained in frontal crashes include skin contusions and abrasions and sternum fractures directly related to seatbelt loading. The most common serious (AIS3+) thoracic injury, also most often attributed to belt loading, is a unilateral lung contusion in higher severity crashes. Small-overlap impacts have a higher rate of bilateral lung contusions than large-overlap crashes, while both crash configurations have similar rates of injuries involving multiple uni- and bilateral rib fractures (Hallman et al. 2011). Thoracic injury risk, specifically to the lungs and heart, is increased in severe impact events in which the occupant bottoms out the airbag and directly impacts the steering column (Chen and Gabler 2014). For rear-seated occupants in frontal crashes, the thorax is the most commonly injured body region, and these injuries occur almost exclusively due to seatbelt loading (Beck et al. 2016). The balance between practicality and performance in seatbelt design has resulted in the almost universal use of three-point seatbelt systems, which create asymmetric chest loading in frontal impacts that is suboptimal for thoracic injury mitigation. However, the overall benefits of seatbelt use far outweigh any minor deficiencies in their design. Unbelted drivers are significantly more likely than belted drivers to impact the steering wheel or front dash in a frontal crash. Such impacts greatly increase the risk of serious thoracic injuries such as aortic, heart, and liver lacerations (Chen and Gabler 2014). A factor that can increase the severity of injuries to belt-restrained front seat occupants in a frontal impact is the presence of an unrestrained rear seat occupant, who can load the rear of the front seat and increase the deformation of the driver's or passenger's thorax.

Abdominal injuries are relatively rare in frontal crashes for seatbelt-restrained front seat occupants and account for approximately 5% of all AIS3+ injuries. For front seat occupants, they are primarily caused by interaction between the occupant and the steering wheel or lap belt (Reichert et al. 2013). Rear seat occupants, especially adolescents, are at a much higher risk of sustaining abdominal injuries due to their increased risk of submarining. Occupants who experience submarining are subjected to loading of the abdomen by the lap belt, which can result in upper abdominal injuries as well as concomitant fractures of the lower rib cage.

Upper extremity injuries in frontal crashes commonly consist of fractures of the hand, radius, and ulna. The incidence of injury to the outboard extremity is increased in small-overlap crashes due to its interaction with intruding components and exposure to crash forces. Airbag deployment increases the risk of upper extremity injury, especially to the hand and forearm of drivers who have their hands on the steering wheel (Jernigan et al. 2005).

The lower extremities constitute the most commonly injured body region in frontal crashes, and for large-overlap crashes they are the most frequently injured body region at the moderate (AIS2+) and serious (AIS3+) levels.

Moderate (AIS2+) lower extremity injuries sustained in frontal crashes most often involve fractures of the pelvis, femur, knee, and tibia (Hallman et al. 2011). The oblique occupant kinematics resulting from small-overlap frontal crashes increases pelvis loading while decreasing loading of the feet, as compared to large-overlap crashes. These injuries are sustained due to footpan loading caused by intrusion or through interaction between the knee and the instrument panel as the occupant moves forward relative to the vehicle.

Side

Side impact crashes are generally defined as planar crashes between two vehicles, or between one vehicle and a fixed object, in which the primary direction of force is within 45° of the lateral axis of the vehicle. While multivehicle crashes, in which the front of one vehicle impacts the side of another, are more common than single-vehicle (i.e., fixed object) collisions, the resulting occupant kinematics is similar. This crash configuration is dominated by a lateral acceleration of the vehicle but generally also involves a longitudinal acceleration component. Typical impact scenarios include intersection crashes, in which both the struck and striking vehicles are moving, and single-vehicle crashes, in which loss-of-control events lead to a side impact with a fixed object.

Occupants involved in side impact crashes can be classified by their seat position relative to the impacted side of the vehicle. Those seated on the struck side are identified as near-side occupants, while those seated on the nonstruck side are identified as far-side occupants. The kinematic and injury responses differ for near- and far-side occupants.

The kinematic response of the near-side occupant is dominated by the intrusion of the adjacent door and B-pillar. The term "intrusion" is used here to describe relative motion between the vehicle's nominal shape and its deformed shape. In the case of a side impact against a stationary fixed object such as a tree, the intruding structure does not necessarily move relative to the global reference frame (e.g., the surrounding environment), since it is held stationary against the fixed object. In a vehicle-to-vehicle side impact crash, the intruding structure moves relative to both the global and the local vehicle reference frames. At impact, the struck side of the vehicle begins to intrude inward while the vehicle is simultaneously accelerated laterally. This results in a compounded relative lateral motion between the occupant and the vehicle interior, and in impact between the occupant and the intruding structure at essentially the preimpact speed. This impact typically begins at the pelvis and progresses with time upward through the upper thorax and head (Yoganandan et al. 2015).
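The "compounded relative lateral motion" described above can be written out explicitly. The following is a simplified one-dimensional sketch, considering only lateral motion early in the event and neglecting restraint forces; the symbols are introduced here purely for illustration:

$$
v_{\mathrm{rel}}(t) \;=\; v_{\mathrm{door}}(t) - v_{\mathrm{occ}}(t) \;\approx\; \dot{\delta}(t) + \Delta v_{\mathrm{veh}}(t),
$$

where $\dot{\delta}$ is the intrusion rate of the struck-side structure measured in the vehicle frame and $\Delta v_{\mathrm{veh}}$ is the lateral velocity the vehicle has acquired since impact. Because the occupant is initially essentially at rest in the ground frame ($v_{\mathrm{occ}}(0) \approx 0$) while the intruding door is driven inward at close to the striking vehicle's speed, the door-to-occupant closing speed at first contact is approximately the preimpact speed, as noted above. The decomposition also makes explicit why near-side loading is so severe: the intrusion and the vehicle's own lateral motion add rather than cancel.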

Near-side (also referred to as struck-side) occupant responses may be modulated by the deployment of torso side airbags and side air curtains and can be affected by the position of the upper extremity. As the event proceeds, the occupant will unload the vehicle's struck-side interior and begin to move toward the center of the vehicle. At this point, the occupant's inboard pelvis will decelerate against the lap belt and the center console, if one exists, and the upper torso may displace toward the far side of the vehicle, depending on the performance of the shoulder belt.

Unlike the response of the near-side occupant, the initial response of the far-side occupant is driven by the vehicle's lateral acceleration, rather than its deformation, and is heavily influenced by the geometry and performance of the restraint system. As the vehicle accelerates laterally, the occupant is displaced toward the struck side. The displacement of the pelvis is more effectively mitigated than that of the torso. The geometry of the lap belt and the additional support provided by the center console, if one exists, limit pelvis displacement to 100–300 mm. The shoulder belt is the sole source of torso restraint, and its effectiveness depends on its ability to remain in position over the shoulder. Increased belt slip, defined as the amount the shoulder belt moves off its nominal position, reduces the effectiveness of the restraint. Depending on the amount of belt slip, the head can displace laterally by up to 732 mm (Forman et al. 2013). As the head and torso move across the center of the vehicle, they interact with the near-side occupant, if there is one, and with the deformed vehicle structure. By the time the far-side occupant reaches the struck side, the deformation event is generally over. The upper extremities, which are essentially unrestrained, often flail toward the struck side.

Injury Patterns

Injury characteristics, risks, and patterns differ between near- and far-side occupants, as indicated in Fig. 3. The proximity of near-side occupants to the impact zone increases their risk of injury in a given crash due to the direct loading that occurs between their body and the door. Head injuries have been found to result almost exclusively from direct contact, most often with the roof rail, B-pillar, or striking vehicle/fixed object (Yoganandan et al. 2010). Serious head injuries more frequently involve the brain than the skull, which is likely the result of an airbag's ability to mitigate skull fractures more effectively than brain injuries (Yoganandan et al. 2010). Depending on the severity of the crash and the performance of the side air curtain, if present, the near-side occupant's head may come into direct contact with the impacting face of the striking vehicle or fixed object. The head of the far-side occupant, due to its relatively unrestrained nature, is prone to impacting a wide range of interior components, often on the struck side of the vehicle, and the adjacent occupant, if there is one (Gabler et al. 2005).

Spine injuries are somewhat rare in side impact crashes, though they comprise a significant portion of severe injuries and are often cited as a main cause of death. They can result from head impacts that load the neck axially and laterally (McIntosh et al. 2007). Soft tissue and joint injuries are also possible under inertial loading in side impacts.

Thoracic injuries sustained by far-side occupants typically result from impact with the seatback, seatbelt, and vehicle interior (Gabler et al. 2005).

Fig. 3 Distribution of seriously (AIS3+) injured near-side (left) and far-side (right) occupants in side impact crashes by body region sustaining serious (AIS3+) injury (Rupp et al. 2013)

Far-side occupants are likely to impact near-side seats that have deformed into the far-side occupant's trajectory. For near-side occupants, structural intrusion into the occupant space is believed to be a causal factor in producing chest injuries (Pintar et al. 2007). As the impact direction becomes more oblique, chest injury severity tends to increase due to the reduced injury tolerance of the ribcage to oblique loading. Rib fractures and lung contusions dominate the injury profile, with the vast majority of injuries located on the struck side of the body. While aortic injury is rare in side impact crashes, sustained by less than