Analysing voice quality and pitch in interactions of emergency care simulation
  1. Frank Coffey1,
  2. Keiko Tsuchiya2,
  3. Stephen Timmons3,
  4. Bryn Baxendale4,
  5. Svenja Adolphs5,
  6. Sarah Atkins5
  1. 1 Department of Research and Education in Emergency Medicine, Acute Medicine and Major Trauma, Nottingham University Hospitals NHS Trust, Nottingham, UK
  2. 2 International College of Arts and Sciences, Yokohama City University, Yokohama, Kanagawa, Japan
  3. 3 Centre for Health Innovation, Leadership and Learning, University of Nottingham, Nottingham University Business School, Nottingham, UK
  4. 4 The Trent Simulation and Clinical Skills Centre, Nottingham University Hospitals NHS Trust, Nottingham, UK
  5. 5 School of English, University of Nottingham, Nottingham, UK
  1. Correspondence to Dr Keiko Tsuchiya, International College of Arts and Sciences, Yokohama City University, 2360027, Yokohama, Japan; ktsuchiy{at}


Background/aims In emergency care, healthcare professionals (HCPs) interact with both a patient and their colleagues at the same time. How HCPs regulate the two distinct interactions is our central interest. Focusing on HCPs’ use of their voice quality and pitch, a multimodal analysis of the interaction in a simulation training session was conducted. Our aims are (1) to compare the use of HCPs’ voice quality and pitch in HCP–patient and HCP–HCP interactions, (2) to examine how different voice quality and pitch function in interaction, and (3) to develop the research methodology so as to integrate multimodal features in emergency care interaction for analysis.

Methods Three HCPs performed a scripted acute care scenario (chest pain) at the simulation centre. The multimodal corpus-based approach was applied to analyse the varying voice pitch and quality of the HCPs, in interactions with a simulated patient (SP) and with two other HCPs, in emergency care training.

Results The HCPs tended to use a clear voice when they talked to the SP and a ‘shattered’ voice when they talked to colleagues in the team. Helen (a nurse) and Mike (a doctor) raised their pitch when talking to the SP.

Conclusion This indicates that the HCPs strategically change their voice quality and pitch according to the addressees, regulating the interaction.

  • emergency care interaction
  • simulation training
  • voice quality
  • pitch
  • multimodal analysis


In a prototypical dyadic medical consultation, a doctor talks to a patient, moving from history taking to diagnosis. In emergency care, however, several healthcare professionals (HCPs) deal with several events simultaneously as a team, talking to the patient (front stage) and to the other HCPs in the team (backstage) in the presence of the patient.1 Thus, HCPs need to regulate interactions, signalling their intention and whom they are addressing, both verbally and non-verbally. Voice quality and pitch appear to be crucial devices for doing this. In family interactions, for instance, parents use a low-pitched voice in directive utterances, while holding a child, to get him/her to go to bed.2 In medical consultations, general practitioners show their understanding of patients’ problems with a particular ‘prosodic contour’, using higher pitch in emphatic expressions, for example, ‘I do do you know what I really do understand that’.3 How HCPs use their voice quality and pitch in emergency care interaction has not been explored. This preliminary study investigates HCPs’ voice quality and pitch by applying a multimodal corpus analysis, which enables us to obtain finer descriptions of the complex interactions in emergency care. Two research aims are addressed here:

  1. To compare the use of HCPs’ voice quality and pitch in HCP–patient and HCP–HCP interactions

  2. To develop the research methodology in order to integrate analysis of multimodal features in emergency care interaction.


Multimodal corpus analysis

Corpus analysis has traditionally been used to identify linguistic patterns by observing concordance lines in a database of written texts or transcribed spoken interaction. More recently, multimodal aspects have been integrated into corpora, for example, touch in emergency care training.4 The methodology of multimodal corpus analysis has been significantly influenced by studies in two fields: multimodal discourse analysis,5 which places emphasis on agency and action in discourse; and ethnomethodological conversation analysis,6 which, on the other hand, uncovers underlying social order through detailed observation of interactions.

Phonetic analysis

In multimodal discourse analysis, voice quality is described with five features: pitch range, loudness, roughness/smoothness, articulation and resonance.7 A seminal study of therapeutic discourse found that a psychiatrist increased the pitch level in response tokens, which are short responses from a listener such as ‘mhm’, in order to show active participation.8 We examined the voice quality and pitch of HCPs and the dynamics of group interactions in emergency care training.

Research data

The preliminary study analysed one 10-minute simulated training session. A medical student (Mike) and two experienced HCPs (Helen and David) performed a scenario that involved a simulated patient, ‘Ken’ (KSP), a 62-year-old man, who presented with chest pain.i The recording took place at the simulation centre as part of training for final-year medical students, so, for convenience, medical students rather than qualified physicians were involved. Each participant carried a portable microphone for the recording, and recording equipment was also installed in the simulation room. The recording was stored in a miniature multimodal corpus, which included transcribed data sets of all the participants’ verbal utterances.ii Using a multimodal annotation tool (ELAN)9 and a phonetic analysis tool (Praat),10 the voice quality and mean pitch of the three HCPs’ utterances were annotated on a timeline along with their functions and addressees. For voice quality, three measures were applied: jitter (ppq5), which is ‘a measure of the periodic deviation in the voice signal’; shimmer (apq5), which ‘measures the difference in amplitude from cycle to cycle’; and harmonicity (HNR), which is a measure of ‘the amount of periodic noise compared to the amount of irregular, aperiodic noise in the voicing signal’.11 A voice sounds harsh when the values of jitter and shimmer are large, and a shattered voice has a small HNR value.
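To make the jitter and shimmer measures more concrete, the following is a minimal sketch of a five-point perturbation quotient, following the definition commonly given for ppq5 and apq5 (the average absolute difference between a cycle's value and the mean of that cycle and its four closest neighbours, divided by the overall mean). It is an illustration only: the function name and the cycle values are hypothetical, and the study's figures were obtained with Praat, whose exact cycle detection is not reproduced here.

```python
from statistics import mean


def perturbation_quotient5(values):
    """Five-point perturbation quotient, returned as a percentage.

    For each interior cycle, take the absolute difference between its value
    and the mean of the five-cycle window centred on it. The average of
    those differences is divided by the mean of all values.
    """
    if len(values) < 5:
        raise ValueError("need at least five cycles")
    deviations = [
        abs(values[i] - mean(values[i - 2:i + 3]))
        for i in range(2, len(values) - 2)
    ]
    return 100.0 * mean(deviations) / mean(values)


# Hypothetical glottal cycle measurements (NOT data from the study).
periods_s = [0.00450, 0.00455, 0.00448, 0.00452, 0.00451, 0.00449, 0.00453]
amplitudes = [0.80, 0.82, 0.78, 0.81, 0.79, 0.83, 0.80]

# Jitter applies the quotient to cycle durations, shimmer to cycle amplitudes.
jitter_ppq5 = perturbation_quotient5(periods_s)
shimmer_apq5 = perturbation_quotient5(amplitudes)
print(f"jitter (ppq5): {jitter_ppq5:.3f} %")
print(f"shimmer (apq5): {shimmer_apq5:.3f} %")
```

A perfectly regular voice would give 0% on both measures; larger values correspond to the harsher voice described above.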


The analysis identified differences in mean pitch and HNR between the HCPs’ utterances to KSP and those to their colleagues, especially for Helen and Mike. We focus on Helen’s and Mike’s utterances here since David produced only a limited number of utterances in the interaction. Table 1 summarises the number of utterances each of the three HCPs addressed to each participant, together with the averages and SDs of the mean pitch (Hz) and the voice quality measures (jitter (ppq5, %), shimmer (apq5, %) and HNR (dB)). Helen, for instance, talked to KSP 13 times, to David 9 times and to Mike 44 times. On average, her pitch was approximately 220 Hz and her HNR approximately 10 dB when she talked to the patient, whereas her pitch dropped to about 170 Hz and her HNR to about 7 dB when she talked to her colleagues. Thus, when Helen talked to KSP, she tended to raise her pitch and to use a clearer voice. At 00:08.2, for example, just after Helen came into the room and greeted KSP, she asked him, “can you tell me why you have come into hospital today?” with a higher pitch level. The mean pitch of this utterance to KSP was 291.6 Hz and the HNR was 11.9 dB (see table 2). After KSP’s problem presentation (chest pain), Helen turned to David (another nurse) and showed him the ECG document, saying, “right, here’s the ECG,” at 00:40.8 with a lower voice. The mean pitch of this utterance to David was 149.7 Hz and the HNR was 6.7 dB (see table 3). Similarly, Helen lowered her pitch when she talked to Mike. Helen invited Mike into the room while informing him of the patient’s condition. At 02:09.2, after Mike had listened to KSP’s account of the medical problem, Helen, who was standing on the other side of the bed and listening to the conversation between Mike and KSP, asked Mike, “can I get some pain relief?” with a lower pitch. The mean pitch of this utterance to Mike was 165.6 Hz and the HNR was 7.5 dB (see table 4).
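As a rough aid to reading the HNR values, the sketch below converts an HNR in dB into the share of signal energy that is periodic. It assumes the usual decibel definition, HNR = 10·log10(periodic energy / noise energy); the helper name is hypothetical, and the two inputs are the approximate averages reported above for Helen.

```python
def periodic_energy_fraction(hnr_db):
    """Fraction of signal energy in the periodic part for a given HNR in dB.

    Assumes HNR(dB) = 10 * log10(periodic energy / noise energy).
    """
    ratio = 10 ** (hnr_db / 10)  # periodic-to-noise energy ratio
    return ratio / (1 + ratio)


# Approximate average HNR values reported for Helen in the text.
for label, hnr_db in [("to the patient", 10.0), ("to colleagues", 7.0)]:
    share = periodic_energy_fraction(hnr_db)
    print(f"{label}: {hnr_db} dB -> {share:.1%} of energy is periodic")
```

Under this assumption, about 10 dB corresponds to roughly 91% periodic energy and about 7 dB to roughly 83%, which gives a sense of how much noisier the voice addressed to colleagues was.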

Table 1

Summary of the HCPs’ mean pitch and voice quality

Table 2

Mean pitch and voice quality in the utterances of Helen to KSP

Table 3

Mean pitch and voice quality in the utterances of Helen to David

Table 4

Mean pitch and voice quality in the utterances of Helen to Mike

To see whether Helen’s pitch and voice quality differed significantly between addressees, the Wilcoxon rank-sum test was conducted with the statistical computing program R,12 comparing the mean pitches and the voice quality measures of Helen’s utterances to KSP with those of her utterances to her colleague Mike, to whom Helen spoke more frequently than to David during the interaction. The Wilcoxon rank-sum test was chosen here rather than parametric approaches, such as t-tests or analyses of variance, since the samples are not paired in our data. We set the significance level to 0.0125 instead of 0.05 (0.05 divided by the four measures compared), applying the Bonferroni correction to avoid the problem of multiple comparisons. The p value for the difference in mean pitch is 0.0002, for HNR 0.0001 and for shimmer 0.001, which indicates that these differences are significant, although there is no significant difference in jitter (p=0.357) (see table 5). Thus, Helen seemed to strategically change the pitch, the loudness and the HNR of her voice according to the addressee, regulating ingroup (HCPs–HCPs) and outgroup (HCPs–patient) interactions. The z scores were also added to the table as a reliability check, and they support the results from the p values for Helen.
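The testing procedure can be sketched as follows. This is not the authors' R code: it is a plain-Python illustration of a two-sided rank-sum test using the normal approximation (without tie or continuity corrections, so it can differ slightly from R's output), and the two pitch samples are hypothetical. Only the p values in `reported_p` are taken from the text.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive average ranks.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


# Hypothetical mean-pitch samples in Hz (NOT the study's data).
to_patient = [291.6, 230.1, 215.4, 208.9, 242.3, 199.7]
to_colleague = [165.6, 172.0, 158.3, 181.2, 169.9, 175.5, 160.8]
z, p = rank_sum_test(to_patient, to_colleague)

# Bonferroni correction: four measures (pitch, jitter, shimmer, HNR).
threshold = 0.05 / 4  # 0.0125

# p values as reported in the text.
reported_p = {
    "Helen": {"pitch": 0.0002, "HNR": 0.0001, "shimmer": 0.001, "jitter": 0.357},
    "Mike": {"pitch": 0.1296, "jitter": 0.0184, "shimmer": 0.0466, "HNR": 0.0341},
}
significant = {
    speaker: [measure for measure, pv in ps.items() if pv < threshold]
    for speaker, ps in reported_p.items()
}
print(f"hypothetical samples: z={z:.2f}, p={p:.4f}")
print(significant)
```

Applying the adjusted threshold to the reported p values reproduces the pattern described in the text: pitch, HNR and shimmer pass for Helen, and none of the measures pass for Mike.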

Table 5

Mean pitch and voice quality in the utterances of Helen to David

However, this tendency is not shared by Mike: none of the differences between his utterances to KSP and those to Helen is significant at the adjusted level of 0.0125 (pitch, p=0.1296; jitter, p=0.0184; shimmer, p=0.0466; HNR, p=0.0341). Mike’s z scores are within the critical z score values (±2.58).

The participants’ awareness of the change in Helen’s pitch and HNR was observed on one occasion in the interaction, shown in the extract below.

Extract: A deviant case at 00:01:53

Mike: All right sir, so you’re in a lot of pain.


Mike: Shall we get some erm =


Helen: Right, will you mind doing the cannula and some bloods <$G?>for me? Okay? and ECG.

Mike: (.) we’ll carry on sir, how long=how long has this pain been going on for?

In this conversation, all three HCPs were standing close to the patient and Mike was asking KSP about the pain. Then, Helen asked David, “Right, will you mind doing the cannula and some bloods <$G?>for me? Okay? and ECG.” in line 5 with a slightly higher pitch and a clear voice (mean pitch 197.3 Hz and HNR 9.8 dB). At that moment, Mike and KSP paused their conversation and both looked at Helen to see whether she was talking to them. They immediately realised that the utterance was addressed to David, and Mike resumed the conversation in line 6, saying, “we’ll carry on sir, how long=how long has this pain been going on for.” This is a deviant case in which Helen raised her voice to talk to her colleague, David, so that KSP and Mike mistakenly thought that the utterance was addressed to them.


This pitch analysis indicates that the mean pitch and the HNR of HCPs in an emergency care interaction function to regulate ingroup (HCP–HCP) and outgroup (patient–HCP) interactions. It also suggests that a finer description of emergency care interaction is possible when phonological and non-verbal features are added to verbal utterances, although the data set analysed here is too small for the results to be generalised. Applying the methodology developed here, we are planning to replicate the study with a larger set of data.



  • i All names are anonymised. The 62-year-old male patient is fictional; he is the character played by the simulated patient in the scenario.

  • ii The equal symbol = signals an unfinished sentence; <$G?> indicates inaudible sounds; <$E>… shows extralinguistic information including laughter and cough; and <$H>… appears where the accuracy of the transcription is uncertain. (.) indicates a very short untimed pause.

  • Contributors FC (emergency care consultant) and KT (linguist) initiated the research project. FC planned the project and arranged the data collection session and the scenarios. KT discussed the research plan with FC, attended the data collection and analysed the data. BB (consultant, director of the simulation centre) supported the data collection and provided valuable advice on collecting the simulation training data. ST (sociologist), SvA (linguist) and SaA (linguist) contributed to the data analysis and reviewed and improved the manuscript.

  • Funding This project was funded by the Great Britain Sasakawa Foundation (No. 4111). We thank the reviewers for their insightful comments.

  • Competing interests None declared.

  • Ethics approval The Ethics Committee of Nottingham University.

  • Provenance and peer review Not commissioned; externally peer reviewed.
