Cochlear Implants: My Perspective

By William F. House, D.D.S., M.D. (Edited by David House)

To overturn orthodoxy
is no easier in science
than in philosophy, religion, economics,
or any of the other disciplines
through which we try to comprehend the world
and the society in which we live

~Ruth Hubbard, U.S. biologist
"Have Only Men Evolved?" (1979)

Chapter III, Fitting Implants and The Articulation Index

THE WHOLE FIELD OF IMPLANTS is in a rapid state of flux, which in itself is one of the characteristics of immaturity: the plant grows fastest, relative to its size, when very young.

Beyond the controversy surrounding the issues of electrode design and its effect on the ability of the patients to perceive sounds and thereby recognize speech, lies another controversy regarding how we can determine whether implants will allow a given patient to understand speech. It seems to me that the whole discussion regarding evaluating implants has focused inappropriately on final results rather than on measurable inputs.  

Studied speech

In my mind, there is a confusion, and also a bias, feeding this controversy. The confusion concerns the precise role of hardware in the complex chain of speech. The bias concerns long electrodes, which we have discussed in some detail above.

With regard to this bias, I have shown, I believe conclusively, that long electrodes do not and cannot add any significant benefit to implants. However, beyond the fact that this bias first arose in the absence of any facts whatsoever, it is also true that some studies of the speech skills of implant patients have shown that there are some reasons to believe that patients more recently implanted (virtually all of whom use multiple electrode implants) do better on some speech tests.

If my premise is true, how can this be explained?

Many of these studies are certainly valid and well done, when the conclusions are restricted to those areas where their data allow statistically valid conclusions to be made. However, with regard to reaching the conclusion that multiple electrode designs are inherently superior to single electrode designs, these studies cannot help us. The problem with each of these studies in that context is that they have ignored or failed to report many significant factors, such as the sound encoding scheme of the external processor or the educational background of the patients. Absent data which allow these factors to be examined, how can anyone authoritatively assert that the differences revealed have to do with electrode design, rather than with one or more other factors?

Clearly, no one can.

A sound by any other name

The confusion regarding the role of hardware arises because we have failed to realize that regardless of how it is that implants actually work, there are only a few vital parameters of sound: intensity, pitch and timing.

It seems almost as if some of us believe that implants are offering the brain some unknown stimulus, as contrasted with giving a known stimulus in an unknown way. Implants provide access to sound, do they not? To say no is to engage in a semantic dispute which begins in words and ends in words, and which has no pragmatic consequence. Come, let us admit the matter until we have some useful reason to deny it: implants provide access to sound. [40]

Boothroyd [41] offers a clear statement about what prosthetic devices do:

The immediate purpose of hearing aids, tactile aids, cochlear implants, and visual aids is to enhance sensory evidence. This point cannot be emphasized strongly enough. Prosthetic assistance does not directly change the perceiver's knowledge or skills. It may do so eventually, in combination with training, maturation, and experience, but its immediate effect is at the sensory level. If, therefore, we wish to assess prosthetic needs and results, we should do so with tests that are maximally sensitive to the adequacy of sensory evidence and minimally sensitive to the adequacy of the perceiver's ability to take advantage of contextual evidence.

Therefore it is inappropriate to evaluate implants on the basis of the patient's ability to recognize speech: doing so ignores almost everything we have learned in more than 50 years of fitting hearing aids, it is impractical, and it is completely unnecessary. Consider each of these points:

Fitting by speech results ignores history

Many of us are young enough that we have little perspective on how past controversies mirror current ones.

In 1946 Carhart [42] introduced the comparative method of hearing aid fitting, which was to select a hearing aid based on the percentage of correctly identified monosyllabic words. It was reasoned that since the right true end of fitting a hearing aid is to enable the patient to understand speech, the shortest distance between fitting and goal was to present the patient with speech, and evaluate which is the best aid accordingly... as is now done with implants. This thought held sway for many years, but by the mid-sixties it had been concluded, as Jerger et al. [43] pointed out in 1966:

Hearing aid performance measures based on single monosyllabic word lists are sufficiently contaminated by error that they do not necessarily reflect meaningful differences between various hearing aids.

These contaminating errors occurred because there are so many factors, far beyond the reach and ken of hearing aids, which combine to allow any patient to correctly report speech, or conversely which cause the patient to fail to do so.

Let's be very clear about this: what the aid does, indeed all the aid does, is provide better access to sound, and that benefit is best measured by standard pure-tone audiometric and similar techniques, aimed at discovering how well the aid assists the patient in perceiving intensity, pitch, and timing.

On the other hand, when we test speech recognition or understanding, we have introduced a whole set of factors far beyond the well-focused question of how well the aid helps the patient hear the sounds: we are then testing the patient, as well as the aid.

Further, as this is true of hearing aids, it is also true of implants. They may not be hearing aids, but they are certainly aids to hearing.

However, we should draw an important distinction. By saying that the measurement of speech is not useful or necessary for the evaluation of hearing aids or implants, we do not mean that speech is no longer the primary goal of their use: certainly it is. As such, speech testing and evaluation have a vital part in working with a given patient as they strive to enhance their speech and language skills. [44]

Thus for the purpose of evaluation, we must test how well the implant provides intensity, pitch, and timing — access to sound — and we must focus on this. We very likely will not otherwise develop a body of data which is directly comparable among implants, and will thus seriously retard progress in this field. Returning to the differences found in studies of implant patients tends to illustrate this, because where complete information on potentially relevant factors is missing, we are left only with a series of unanswered questions. This often renders the whole study unusable.

The history of our field shows that, as entrenched as the comparative method once was, times change, and today most hearing aids are fitted by the prescriptive method as defined by Northern: [45]

The prescriptive hearing aid evaluation method is based on the assumption that given a patient's pure-tone auditory thresholds, most-comfortable listening levels, and/or loudness discomfort levels, the appropriate amount of gain for each frequency can be calculated mathematically and optimum aided speech intelligibility can be obtained through a predetermined formula.

In other words, the assumption of the prescriptive method is that if the auditory gain supplied by the hearing aid allows the patient to detect the sounds of speech, intelligibility will follow. This is further emphasized by Preves, [46] who states:

...that the highest level of 'realism' attainable in hearing aid measurement techniques is that of sound field audiometry for obtaining functional gain — that is, the amount by which the hearing aid improves the patient's hearing threshold levels.

For the same reasons, we must also fit implants by "obtaining functional gain — that is, the amount by which the [implant] improves the patient's hearing threshold levels."
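
As a minimal sketch of the quantity Preves describes, functional gain at each test frequency is simply the unaided threshold minus the aided threshold. The function name and the threshold values below are hypothetical, chosen only to illustrate the arithmetic; they are not patient data from any study cited here.

```python
# Test frequencies commonly reported on a sound field audiogram.
FREQUENCIES_HZ = [250, 500, 1000, 2000, 4000]


def functional_gain(unaided_db, aided_db):
    """Return per-frequency functional gain: unaided minus aided threshold (dB)."""
    if len(unaided_db) != len(aided_db):
        raise ValueError("threshold lists must have the same length")
    return [u - a for u, a in zip(unaided_db, aided_db)]


# Made-up thresholds in dB HL, for illustration only.
unaided = [90, 95, 100, 105, 110]
aided = [40, 35, 35, 40, 45]

gains = functional_gain(unaided, aided)
for freq, gain in zip(FREQUENCIES_HZ, gains):
    print(f"{freq} Hz: {gain} dB functional gain")
```

A larger value means the device lowers the threshold by more at that frequency; the same subtraction applies whether the device is a hearing aid or an implant.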

Fitting by speech results is impractical

Many of those who get implants can neither speak nor understand speech, either because they are too young, or because the vital stimulus of sound was denied them at a critical period.

This being the case, the only possible way for these patients to be fitted with an implant is to measure their sound reception, while they are wearing an implant, in some standard manner. Beyond that, even those who have either good speech reception or well-developed language skills will require time before they can be comfortable with any new implant and use it to the fullest in speech reception.

If the results gained by testing speech reception "are sufficiently contaminated by error that they do not necessarily reflect meaningful differences between various [implants]," what sense does it make to have to wait for the patient to learn enough that we can do such tests, only to obtain a result which — if we assume we have measured something about the implant alone or primarily — is prone to multiple errors?

The extreme case (patients who have very poor speech and language skills, or who lack any such skills) points out the difficulties experienced in all cases: what we are measuring when we measure speech reception is, to a large degree, the patient's skills, talents, native intelligence as regards speech, her or his previous exposure to speech, educational background, and the like.

All of these factors (i.e. contaminating errors) tend to mask the contribution of the implant, which, precisely like the hearing aid, is simply to provide access to the three main parameters of sound: intensity, pitch, and timing. Why not measure these more directly?

Fitting by speech results is unnecessary

Finally, it is clear that we have the tools and the experience to evaluate implants without any reference to speech results. As such, the evaluation of speech results — for the purpose of fitting an implant — is unnecessary.

We have a well-explored and well-understood "predetermined formula": the articulation index (AI), which I will explain momentarily.

In sum, the points above show again that we should not, and indeed cannot, evaluate implants by measuring speech results: doing so ignores history, is impractical, and is unnecessary.

Better sound detection leads to better sound perception. Whereas the former is more mechanical, the latter has to do with bringing the sounds to consciousness, and more particularly with recognizing patterns within complex sounds, such as the pattern of sound which represents our name. Finally we have speech recognition, which is a further internal step: it involves assigning meaning to the perceived sounds and sound patterns. (Consider the difficulties we face when learning a foreign language later in life!) Surely then the emphasis when fitting should be in measuring sound detection: measuring any of the other steps offers us insight into what the patient is able to make of the detected sounds, as contrasted with what the implant is able to supply as far as the quality and quantity of those sounds.

When we want to measure the patient, let us measure the patient; when we want to measure the implant, let us measure the implant.

Approving new implants

At present, of course, we do not have a wide choice among commercially-available implants.

Therefore one of the most important near-term implications of this approach to fitting implants has to do with progress in this field generally. Because we have such poor knowledge, at this point, about how implants work, and because (therefore) we have such a poor understanding of what will work better, we must, as the Chinese saying has it, "let a thousand flowers bloom." That is, we must, within the limits imposed by the safety of our patients, examine how we can allow for a significant number of new implants to be tested and approved.

Of course, the processes involved will vary on a per-country basis, but it is also true that in many respects, the U.S. breaks ground in this area, and my concern is primarily with the process by which implants gain approval in this country.

At present, that approval process seems predicated on a statistical demonstration that a given implant will provide patients the ability to develop speech recognition.

Certainly, as far as it goes, the goal is right and true; our patients must gain access to speech. But there is a world of difference between insisting that a significant number of patients wear a given implant long enough to demonstrate speech capacity, and demonstrating that patients who have that implant system can hear all the relevant sounds of speech.

In the former case (where it is required to have a number of patients who demonstrate speech reception), the approval process will require a long and expensive series of studies which are, as we have demonstrated above and will further demonstrate below, contaminated by errors, because we are testing the patients much more than the implant. Further, such studies will necessarily take several years, greatly slowing the process. [47]

In the latter case — where what is required is to demonstrate that patients have gained access to the relevant parameters of speech sounds — the approval process will be greatly shortened, less expensive, and further much more accurate as regards providing information about the implant, as opposed to information about the patients implanted and their course of training.

The consequence, clearly, of taking the latter course will be a flowering of the field, and much better and more directly comparable information about implants.

A measured response

Of course, in any scientific endeavor it is necessary not only to state the logic, but as well to offer the demonstration and proofs. Do studies which bear on the relationship between audiometric and speech studies bear out the thought that good hearing makes for good speech skills, all else being equal? A related and equally important question is: how do we draw out from the audiometric data some meaningful numbers, so that we can compare one audiogram with another as regards the patient's chances at gaining speech? As I mentioned above, the answer is yes and the number is the articulation index, or AI.

The articulation index

The AI offers an easily used method of quantifying the possibility of understanding speech, based on the audiogram. [48] [49] It allows us to make broad comparisons between rather different looking audiograms, and to have a convenient indication of the expectation [50] we may have for a given patient.

The articulation index was developed for and has proven very useful in predicting hearing aid success, and should prove equally useful in predicting cochlear implant success.

As the reader can see (in Figure 8), the patient-specific information used in the calculation of the articulation index is precisely the same as that used in a standard audiogram. The difference is that in plotting this information on what is in essence an enhanced audiogram form, interpretation is facilitated.

Figure 8. A standard audiogram form enhanced with numbers to facilitate the calculation of the articulation index. After drawing the patient's aided measures, adding all of the numbers on or below the line will yield the articulation index. A patient with a flat 30 dB loss, for example, will have an AI, as estimated by this form, of 65 (1 + 1 + 2 + 3 + 4 + 4 + 5 + 6 + 8 + 8 + 8 + 8 + 4 + 3). The articulation index as calculated mathematically for such a loss is 63, demonstrating a good correspondence between form and formula. Please note that the form is deliberately printed large enough that it can be copied and used.

This version of the articulation index employs a series of 100 numbers placed on a conventional audiogram, in a shaded area that represents the average speech spectrum (i.e. the "speech banana"). The value of the numbers is related to the importance of that particular frequency for understanding speech. [51] The numbers are of much higher density in the 1 to 4 kHz region, thus emphasizing the importance of high frequencies (which apparently carry most of the information regarding consonants) in understanding speech. Thus, by plotting a patient's warble-tone thresholds on the articulation index form, one sees at a glance which frequencies the hearing aid or implant user can detect and, more significantly, their relative importance.

Thus one can either use this form, calculating the AI based on the sum of the highest numbers on or below the decibel rating at each of the critical frequencies, or it can be calculated mathematically. [52]
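
The form-based estimate in the Figure 8 caption can be checked with a few lines of arithmetic. This is only a sketch: the list holds the numbers quoted in the caption for a flat 30 dB loss, the figure of 63 is the caption's stated result of the mathematical calculation, and the helper name form_ai is a hypothetical one introduced for illustration.

```python
# Numbers quoted in the Figure 8 caption for a flat 30 dB loss,
# read off the form on or below the plotted threshold line.
flat_30_db_numbers = [1, 1, 2, 3, 4, 4, 5, 6, 8, 8, 8, 8, 4, 3]


def form_ai(numbers):
    """Sum the numbers read from the form to estimate the AI (0 to 100)."""
    total = sum(numbers)
    if not 0 <= total <= 100:
        raise ValueError("an AI read from the form should lie between 0 and 100")
    return total


ai_from_form = form_ai(flat_30_db_numbers)
ai_from_formula = 63  # value stated in the caption, not recomputed here

print("AI from form:", ai_from_form)
print("Difference from formula:", ai_from_form - ai_from_formula)
```

The two estimates differ by two points on a 100-point scale, which is the "good correspondence between form and formula" the caption refers to.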

This is not the whole story, of course. It must be said that neither the standard audiogram nor by extension the AI — all by themselves — can tell us if the patient can discriminate between frequencies, only that the patient can detect them. That is, it may be possible for a patient to hear that a tone is present, without being able to tell if it is high or low. (This would be the auditory equivalent of color blindness, if you will.)

In order to discover if the patient can discriminate between frequencies, some fairly simple tests can be done, but generally are not. However, we have quite good indirect evidence that the majority of implant patients can and do discriminate between frequencies, based on the degree of open-set discrimination which they eventually develop. (It should also be said that because we apparently have few if any "frequency blind" patients, it has been a good working assumption that a patient who can detect a given frequency can also tell if it is high or low.)

We will cover this in greater detail below, but the thought concerning indirect evidence is based on the premise that a good deal of the information encoded in speech is available primarily or only to those who can discriminate between frequencies. Of course, the relationship between the parameters of sound (intensity, pitch, and timing) and speech discrimination turns out to be a surprisingly complex area, and has been difficult to quantify. Among other things, it may depend on the language being spoken, the degree of ambient noise, as well as a whole wilderness of non-auditory parameters. Speech as a code also turns out to have a good deal of redundant information, so that where one might get the best information about a particular feature of speech from a particular frequency range, it is rarely the only range where information about the phoneme in question is found.

However, taken from the point of view of the phonemes involved (the consonants, vowels, diphthongs and modifiers), work has been done which demonstrates many of the relationships between the components of sound and the components of speech. [53] While the matter is more complex than this, one pertinent fact is that the frequency range between 2 and 4 kHz is critical to the understanding of consonant sounds. Further, because many consonants are easily confused when lip-reading, hearing these sounds remains important for greatest ease of communication.

The central fact remains: better access to sound means better access to speech.

It should be re-emphasized, however, that the questions which exist with regard to implants are the same as the questions which exist with regard to hearing aids, and the broadly-based conclusions which we reach as a professional group, I believe, will in the end be very much the same for implants as for hearing aids, if they are not identical.

AI literature

Of course, I am not the first to suggest that we use the AI to quantify the chances for a patient to develop speech detection.

Moog and Geers state: [54]

For a child who has not yet developed any verbal ability, the aided articulation index (AI) may be used to predict speech perception ability. This value represents the percentage of the amplified speech spectrum available to a child when wearing a hearing aid. Calculation of the aided AI requires only aided and unaided thresholds from the child. A comparison between the aided AI and speech perception scores resulted in sufficient agreement to cautiously recommend the aided AI as a reasonable measure that can be used to categorize the speech perception ability of profoundly hearing-impaired children who are not able to respond to speech tests. This approach can be used until it is possible to measure speech perception directly.

It should be emphasized, as Moog and Geers have indicated, that speech tests are necessary as the prime benchmark of the ultimate success of a given patient, using one or more implants. Again, however, this is a very different thing than testing to properly fit or evaluate an implant initially. It is difficult to see how one implant will differ from another in its ability to offer a given patient a foundation for the development of speech skills unless there is, at some point, a measurable difference between the implants with regard to their ability to deliver information about the intensity, pitch, or timing of sound to that patient. Consider that if our testing cannot reveal a difference (because the patient cannot discern and report a difference), then it can be said, clinically speaking, that no difference exists.

Moog and Geers [55] have devised four speech perception categories (which they describe as a measure of the potential to develop normal or near normal language and academic achievement), and have found rough correlations between these and the patient's AI scores:

  • Category 1: No pattern perception (AI score of 0-20%)

  • Category 2: Pattern perception (AI score of 21-49%)

  • Category 3: Some word identification (AI score of 50-69%)

  • Category 4: Consistent word identification (AI score of 70-100%)

Children who attain categories 3 or 4 with the help of their auditory prosthesis are considered capable of attaining a good level of speech recognition.
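
The four categories listed above amount to a simple lookup from AI score to category, which might be sketched as follows. The function name is hypothetical, and the handling of fractional scores that fall between the published whole-number bands (for example 20.5) is an assumption of this sketch, not something Moog and Geers specify.

```python
def speech_perception_category(ai_percent):
    """Map an aided AI score (0 to 100%) to a speech perception category (1 to 4).

    Bands follow the list above: 0-20, 21-49, 50-69, 70-100.
    Assumption: a fractional score between two bands is placed in the higher band.
    """
    if not 0 <= ai_percent <= 100:
        raise ValueError("AI score must be between 0 and 100")
    if ai_percent <= 20:
        return 1  # no pattern perception
    if ai_percent <= 49:
        return 2  # pattern perception
    if ai_percent <= 69:
        return 3  # some word identification
    return 4      # consistent word identification


# The flat 30 dB loss from Figure 8 has a form-based AI of 65.
print(speech_perception_category(65))
```

Because the correlations reported are rough, a lookup of this kind is best read as a starting expectation for a patient rather than a prediction.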

I might mention that from this point to the end of the chapter, I will assume a good many things about the background of the reader. The narrative should make as much sense to you, if you are reading primarily for concepts, whether you read through the end of this chapter or not. More casual readers, therefore, may wish to skip to the next chapter.

Some numbers

Moog and Geers found a correlation between the AI and the ability of the patient to develop speech. Have other studies found a similar correlation? Of course they have. As well, such correlations can be demonstrated where sufficient information is available (audiograms and consequent speech scores) even from studies where this particular question was not asked.

For example, through the courtesy of Richard Tyler [56] I was given the sound field thresholds and the phoneme composite scores for 48 cochlear implant adults who were up to 5+ year users of their implants. Twenty-five of the patients used the Nucleus 22-electrode F0/F1/F2 pulsatile strategy and 23 used the Ineraid 4-electrode analog strategy. (There were no major differences in the performance of these two groups, so in this report their data are pooled together.)

The phoneme composite score is the percent correct of the sum of three open-set phoneme tests, namely: the Iowa Laser Videodisc Medial Consonant Test, the Iowa Laser Videodisc Vowel Test, and the Iowa NU-6 Test. This composite score is called the NU-6p score. In the study referenced above it was found that the composite scores of the two different hardware groups of patients improved steadily and strongly for the first 9 months of implant use, and more slowly and to a smaller degree for 18 months or longer.

I asked Karen Berliner, Ph.D., to analyze the data from these 48 patients, and her report is as follows:

Procedures:

From the data provided by Dr. Richard Tyler, thresholds and NU-6p scores for three intervals were selected for each subject. These intervals were the test session closest to one month since connection, 18 months (+5/-1 months), and last follow-up. Simple t-tests were performed to examine pair-wise differences between intervals, and Pearson correlations were used to examine the relationships between thresholds and discrimination.

Results:

The mean numbers of months post-connection for the three intervals were 1.3, 18.2, and 62.4, respectively.

There were no statistically significant changes in thresholds (250 Hz, 500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz) between one month and 18 months. Interestingly, the correlation between threshold at a given frequency for the two test periods was not large, although it was statistically significant at 250, 1000, and 2000 Hz. None of the changes from 18 months to last follow-up were significant either. However, the difference between one month and last follow-up did achieve statistical significance at 500 and 1000 Hz, by averages of 5.3 and 4.4 dB, respectively.

Discrimination scores, on the other hand, did improve from one month to 18 months, but did not increase significantly between 18 months and the last follow-up.

From the last follow-up threshold data I was able to calculate the articulation indices (AIs) of these 48 patients. I also asked Dr. Berliner to look at the AIs in relation to the NU-6p scores. She reports:

There is a significant correlation between the AI calculated from the last follow-up and the one-month, 18-month and last follow-up NU-6p scores. The correlation coefficient is moderate (r = 0.4, p < .003 for the 1-month NU-6p; r = 0.4, p < .007 at 18 months; r = 0.5, p < .001 at the last follow-up). The correlations are positive; that is, the larger the AI, the better the NU-6p score.

The implication of this data is that sound field thresholds appear essentially stable over time and that the 1 month thresholds and AI can be used as a reasonable measure of the potential for patients to develop good speech reception skills.

The small long-term improvement in the 500 and 1000 Hz thresholds, I would speculate, represents the positive effect of long term electrical stimulation on the spiral ganglion cells and auditory CNS pathways.

Correlations imply that there is a relationship of the factors under study, as there is in this study between the AI and NU-6p scores. Of course, these correlations do not tell us the cause of such an effect, but I believe it is clear that a better AI represents better sound detection, which in turn leads to better sound perception and word identification as demonstrated by the NU-6p scores. Given the many factors which work together to produce the ability to perceive speech, these correlations are, in my opinion, quite significant.
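
For readers who wish to see how a Pearson correlation of this kind is computed, a minimal sketch follows. The paired values are invented purely for illustration and are NOT the data supplied by Dr. Tyler; the function and variable names are likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented AI scores and NU-6p scores for six imaginary patients.
ai_scores = [20, 35, 45, 50, 60, 70]
nu6p_scores = [30, 25, 50, 40, 65, 55]

r = pearson_r(ai_scores, nu6p_scores)
print(f"r = {r:.2f}")
```

A positive r means that patients with a larger AI tend to have a higher NU-6p score, which is the direction of the relationship reported above.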

Regarding a similar situation, Boothroyd states: [57]

The conclusion that the results reflect adequacy of sensory data is supported by two observations. First, in experiments with hearing-impaired subjects, test scores are significantly correlated with pure tone threshold. Second the relative scores on the various contrasts are explicable on the basis of acoustic cues that are believed to underlie their perception.

The importance of the audiogram is also emphasized by Watson: [58]

In general, the one non-speech measure that does a fair job of predicting speech processing ability is the audiogram itself. (If the critical acoustic features of a phoneme cannot be heard, they cannot be discriminated or identified: hardly a surprise).

Interestingly, some reported failures of the correlation between AI and the development of speech recognition may provide further strong evidence for depending on it to classify and fit implants.

For instance, Tye-Murray et al make a report of two cases: [59]

The first example, labeled in their study Iowa Symbion 14 (IS14), was a 61-year-old male with progressive hearing loss starting at age 16 and becoming profound at age 39. He was profoundly deaf for 22 years before being implanted. His NU-6p score was 0 at 1 month, 26 at 9 months and 28 at 18 months, indicating he was a fairly low implant performer. The striking thing is that his AI was 74, the highest of the entire study group.

The second example is IS12, a 33-year-old female with progressive hearing loss starting at age 7 and becoming total 1 year before being implanted. Her NU-6p score was 51 at 1 month, 71 at 9 months and 75 at 18 months, which was the highest NU-6p score attained by any of the subjects. Her AI was 22, which is in the low range for these patients.

Note that in the first case the patient had been deaf a long time, starting from an early age. We can surmise that this patient had poorly developed speech and language skills, but in any case, he apparently had very little experience in connecting sound with language. In the second case, the patient was recently totally deafened, and, it seems, had spent years with gradually diminishing hearing, perhaps enabling her to hone her skills in discerning speech in circumstances where she was getting progressively fewer sound cues.

However, the value of the AI is implicit in the example, even though the result in the first case is not what we might hope. The remarkable thing about the first case is that, with so much information, the patient was able to do so little with it; in the latter case, what is remarkable is precisely the opposite. That is, we accept that the AI is the measure of what is being provided through the implant, and the speech score represents what the patient can do with that information. We can clearly see that the speech results have "contaminating errors," if we take these results as being indicative of what the implant is offering the patient. Further, it becomes even clearer that, with different populations of patients, different training methods and so on, the resulting speech scores between two implant programs cannot be taken as a valid comparison of the two different hardware sets.

Thus at present it appears that the best method of measuring the potential for future success is the one month post-implant AI. It is as accurate as any other tool we have available, it offers great convenience, and it will give us an excellent guide to the sound quality the implant is offering the patient.

If we want to evaluate the implant, we have no better tool than the AI; if we want to evaluate what the patient has done with the implant, speech tests are the means of choice.  
