Inventor Ray Kurzweil talks with Electronic Design Editor-In-Chief Mark David about the Kurzweil/National Federation of the Blind portable text-to-speech reader. In discussing the engineering challenges of integrating a digital camera and a PDA into a new device, Kurzweil also considers timing inventions by predicting the future, the future of portable object recognition, and his "law of accelerating returns."
Mark David: Ray, thank you very much for taking the time to talk to me.
I've been interested in the contributions you've made over the course of my career, in that I spent a lot of years at a magazine called Automatic I.D. News, which covered alternatives to keyboard data entry, optical character recognition (OCR), bar code, and so forth. So, I was aware of a lot of your work in the OCR field. It's a pleasure and an honor to talk to you.
And then, I'm also a musician. So I'm certainly aware of your contributions there on the keyboard side, too. Was it the first sampling keyboard? Is that what your initial invention was there?
Ray Kurzweil: Well, it's the first electronic keyboard that could accurately recreate the grand piano and other orchestral instruments.
Mark David: So, it was really the level of sampling and...
Ray Kurzweil: Yeah; it was more than sampling, although it did incorporate sampling. It really modeled the response of a piano. Because if you just sample a piano, it doesn't convincingly recreate it. For example, samplers will loop the last waveform because they don't really have enough memory to have the note sustain for 30, 40 seconds. When you loop the last waveform of a piano, all the overtones become perfect multiples of the fundamental. And this begins to sound like an organ.
Mark David: Right.
Ray Kurzweil: One of the things that make a piano sound unique is that the partials are actually slightly off the perfect multiple. They are called inharmonic. And there's a lot of other details like that [which] samples fail to capture. If you hit middle C harder, it's not just louder. There's high-frequency partials [which] attack more quickly and die off in a different pattern.
So we captured all of those subtle differences, really modeled the acoustic response of a piano, and it really sounded like a piano and felt like a piano. And we did A/B tests with concert pianists and they were successful. Other samplers just didn't come close.
So, it was the first really successful recreation of complex acoustic instruments like the piano in an electronic instrument.
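The difference between a looped waveform and a real piano string can be put in numbers. The sketch below is only an illustration, not Kurzweil's model: it uses the textbook stiff-string approximation f_n = n * f0 * sqrt(1 + B * n^2), and the inharmonicity coefficient B = 0.0004 is an assumed value chosen for the example. With B = 0 the partials are exact multiples of the fundamental, as in a looped sample; with B > 0 each partial sits slightly sharp of its multiple.

```python
import math

def partial_frequencies(fundamental_hz, count, inharmonicity=0.0):
    """Frequencies of the first `count` partials of a stiff string.

    Textbook approximation: f_n = n * f0 * sqrt(1 + B * n^2).
    B = 0 gives the perfectly harmonic series of a looped waveform.
    """
    return [n * fundamental_hz * math.sqrt(1.0 + inharmonicity * n * n)
            for n in range(1, count + 1)]

def cents(frequency_hz, reference_hz):
    """Interval between two frequencies in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(frequency_hz / reference_hz)

middle_c = 261.63  # Hz
looped = partial_frequencies(middle_c, 8)                       # harmonic
piano = partial_frequencies(middle_c, 8, inharmonicity=0.0004)  # assumed B

for n, (h, p) in enumerate(zip(looped, piano), start=1):
    print(f"partial {n}: harmonic {h:8.2f} Hz, stiff string {p:8.2f} Hz, "
          f"{cents(p, h):5.2f} cents sharp")
```

The offset grows with the partial number, which is why the upper partials are where a looped sample most audibly departs from a struck string.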
Mark David: Great contribution. That's great.
Ray Kurzweil: Thanks.
Mark David: So today, the reason for the interview is to talk about the reader for the blind. But, I'd like to have you give me a little bit of context for that.
I know you've been working on it for decades, and it seems like a natural confluence for somebody who's working in OCR to be involved in working with the blind. But, I wonder if you could talk a little bit about which came first, how you got involved with the National Federation of the Blind, and how that dovetailed with the early work you were doing in OCR?
Ray Kurzweil: Sure. I mean, my connection to this project originally comes from my interest in pattern recognition, not from a personal relationship to blindness. Which is relative.
Back in the '70s, Omni Font, any-font character recognition, was a classical problem, an unsolved problem. And the state of the art at that point was called template matching, which literally just had a pixel-for-pixel picture of what an A and a B and all the other letters and characters looked like. It couldn't even normalize for things like size, and it was tripped up by letters touching each other, and all kinds of vagaries of real-world print.
And, in fact, they were used in something called "type and scan," where people would take an original document and retype it using either Courier or OCR-A type font. It had to be unit width; it couldn't be proportionally spaced. And then they could scan those typed documents.
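The weakness of pixel-for-pixel template matching is easy to reproduce. The sketch below is a toy illustration, not any historical system: the 5x5 bitmaps and the names `TEMPLATES`, `pixel_distance`, and `classify` are all hypothetical. A glyph is assigned to whichever stored template differs from it in the fewest pixels, with no normalization for position or size.

```python
# Hypothetical 5x5 bitmaps; '#' is ink, '.' is background.
TEMPLATES = {
    "I": ("..#..", "..#..", "..#..", "..#..", "..#.."),
    "L": ("#....", "#....", "#....", "#....", "#####"),
    "T": ("#####", "..#..", "..#..", "..#..", "..#.."),
}

def pixel_distance(glyph, template):
    """Count the pixels where the glyph and the template disagree."""
    return sum(g != t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))

def classify(glyph):
    """Return (label, distance) for the closest stored template."""
    return min(((label, pixel_distance(glyph, template))
                for label, template in TEMPLATES.items()),
               key=lambda pair: pair[1])

centered_i = TEMPLATES["I"]
shifted_i = ("#....",) * 5  # the same vertical stroke, two pixels to the left

print(classify(centered_i))  # ('I', 0)
print(classify(shifted_i))   # ('L', 4)
```

A two-pixel shift is enough to turn the "I" into an "L", which suggests why such systems needed fixed-width, carefully retyped input.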
And you might wonder what's the point of that since it's not eliminating the manual keyboarding step? But actually, in those days terminals were very expensive and an electric typewriter was a lot cheaper, so, it was actually worth it to retype the document...
Mark David: To retype to get it scanned in? Now, that's interesting....
Ray Kurzweil: Using an inexpensive typewriter, rather than these very expensive computer terminals. So I developed, with my team at Kurzweil Computer Products, the first Omni Font [OCR]. And it could also deal with broken letters, letters touching each other, proportionally spaced print, photocopies, and things like that. It was a bit of a solution in search of a problem.
We were aware of the blind reading problem and also commercial applications, and we were reviewing these different markets. I happened to sit next to a blind gentleman on an airplane, and he was telling me how blindness is really not a handicap. And he represents his company. He flies all over the world, which he was doing right there, and conducts business all around the world. But then he said, 'Actually there's one handicap I do have, which is [that] I can't read ordinary printed material.'
And Braille is only 3 percent of the books, and most of the stuff he reads isn't books anyway. And same thing with tape recordings and recorded books.
[He said,] 'If I could read my inter-office memos and other printed material on my own, that would overcome this handicap.' And that was inspiring enough for us to decide that would be the focus of the project. This was back in '74.
We went looking for organizations we could work with that would support the project. A lot of other organizations were interested, [but it was this] 'let us know how it goes' kind of thing. But the National Federation of the Blind was immediately enthusiastic and wanted to work with me.
And they were, of course, very helpful with funding. We raised money from foundations and from the government. But they really wanted to work with me on every facet of the project. And that's what we did. They organized seven scientists and engineers who worked with my development team, and they really...
Mark David: And that, again...
Ray Kurzweil: ...got very close...
Mark David: ...That's going all the way back to the beginning of the....
Ray Kurzweil: ...It's going back to the '70s.
And so, they worked with me on the development and the user interface and the testing, as well as things like marketing and manufacturing.
So it was a really close collaboration on the first print-to-speech reading machine, the Kurzweil Reading Machine, which we introduced January 13, 1976. And actually demonstrating it was Jim Gashel, who is having the same role now....
Mark David: ...Oh, that's neat.
Ray Kurzweil: He demonstrated it with me. It was on [TV]...Walter Cronkite used it that evening for his signature signoff.
Mark David: Cool.
Ray Kurzweil: He had the reading machine read, "And that's the way it was; January 13, 1976..."
Mark David: ...Oh, that's neat.
Ray Kurzweil: I ran into him, actually, recently on Martha's Vineyard, and he said, yes, that was the first time he didn't personally read the signature signoff himself.
So, I've stayed closely involved through a whole series of different models of the Kurzweil Reading Machine, working closely with the National Federation of the Blind for 30 years prior to the introduction of this product. Because [of all of the] National Federation of the Blind readers...the principal reading machine is the Kurzweil 1000. And there are probably 100,000 users.
And as I said, that's the base system.
Mark David: ... Desk-based, PC-based, or...?
Ray Kurzweil: ...It's PC-based. You have a scanner and a PC....
Mark David: ...Right....
Ray Kurzweil: ...And Kurzweil 1000 software. But you have to bring reading material to your desk.
It's hard to bring the bank ATM display to your desk. And it's hard to bring a street sign to your desk or a sign on the wall...or a package in a supermarket. And there's a lot of things you can't put on the scanner like a soup can, etc.
And so the reading machine, while it does read ordinary printed material, doesn't allow blind people to read printed material as they go through the day.
And as a sighted person, if you think how often you read something that might just be a few words on a sign or label on clothing in a clothing store, a menu and so on, we're reading all the time. It's part of the visual information we take in. And blind people really have not been able to access that other than to get someone to read it to them.
Now, for many...one of the things I do, actually, is project technology trends, and I have a reputation as a futurist. I'd actually developed this because of realizing that it's important to be able to time your inventions, that it's key to being successful as an inventor. Most inventors fail not because they can't get their inventions to work, but because the timing is wrong.
And I realized that 25 years ago. And I've had a series of six books I've written on this topic. The latest is "The Singularity is Near."
I give a lot of speeches. And in my speeches to the blind community and, for example, to the National Federation of the Blind, I'd routinely say, 'Well, someday we'll have a reading machine you can just take out of your pocket and read printed material as you go through the day, signs on the wall, etc.' And I kept projecting that this would be feasible in terms of the hardware around 2006.
So in 2002, I was talking with the leadership of the National Federation of the Blind and they were saying, 'Well, Ray, how long do you think it would take to actually do this?' And I said, 'Well, the software would probably take about four years.' And \[they said\] 'Okay, you're predicting the hardware will be available in four years. So, why don't we get started?' So, we started this project about four years ago.
Ray Kurzweil: And the timing has worked out exactly right...
Mark David: ...Yeah, it's one thing to be a futurist. It's another thing to be an inventor who can actually make their predictions come true.
Ray Kurzweil: ...Yeah. That is actually why I'm a futurist; the original reason was to have the timing work out....
Mark David: ...Right....
Ray Kurzweil: ...Because if you wait until things are feasible and then you start working on them, you'll miss the window of opportunity and so on.
Now, what we needed to do...well, obviously, we needed to [was to] squeeze Omni Font OCR and speech synthesis and user interface into a pocket computer. But we needed to develop a substantive piece of new software that was intelligent image enhancement. Because if you just take the images that a digital camera captures, with a blind person capturing print in the real world, it doesn't work. We tried that.
We can identify seven or eight types of distortion. For example: the images will be curved from the book. There's three different degrees of freedom of tilts and rotation. There's uneven illumination. In a scanner the illumination is controlled and it's perfectly even. The images are very often fuzzy because they're not quite in the right focus or the right angle or they're taken at a tilt, etc.
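Uneven illumination is the easiest of these distortions to illustrate. Kurzweil does not describe the algorithms in the product, so the sketch below uses a generic textbook technique, local-mean thresholding, on a made-up scanline; the pixel values, window radius, and offset are all assumptions. A single global cutoff mistakes the dim side of the page for ink, while comparing each pixel with its own neighborhood does not.

```python
def global_threshold(scanline, cutoff=128):
    """Mark a pixel as ink if it is darker than one fixed cutoff."""
    return [i for i, value in enumerate(scanline) if value < cutoff]

def local_threshold(scanline, radius=3, offset=20):
    """Mark a pixel as ink if it is clearly darker than its neighborhood mean."""
    ink = []
    for i, value in enumerate(scanline):
        window = scanline[max(0, i - radius): i + radius + 1]
        if value < sum(window) / len(window) - offset:
            ink.append(i)
    return ink

# Synthetic scanline: paper brightness falls from 200 to 90 across the page,
# and ink is 60 levels darker than the paper underneath it.
background = [200 - 10 * i for i in range(12)]
true_ink = [2, 3, 8, 9]
scanline = [b - 60 if i in true_ink else b for i, b in enumerate(background)]

print(global_threshold(scanline))  # [2, 3, 8, 9, 10, 11]
print(local_threshold(scanline))   # [2, 3, 8, 9]
```

On a flatbed scanner the lighting is even, so the global cutoff would have been adequate; it is the handheld capture that makes the local comparison necessary.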
Mark David: Right, right. And I mean, this is compounded by the fact that the person capturing the image can't see what they're capturing. So....
Ray Kurzweil: ...Right. So that's actually a whole other issue. For that we developed this whole field of view report, where it actually reports back to the blind person what it sees and what to do. It will say, 'Well, I can see the left and top edges, but I can't see the right edge, and you should move to the right.'
Mark David: But that's separate from the intelligent image enhancement?
Ray Kurzweil: Right, but that's actually another piece of reasonably intelligent software....
Mark David: ...Oh. Two totally separate pieces. Okay.
Ray Kurzweil: Yes, we developed that also.
But, the biggest challenge was this intelligent image enhancement.
We can show before and after pictures. It really does clean up the images nicely...And then, the OCR is able to work on it. And it works very well. Really, it's the top end of our expectations. If it sees some print at all it generally reads it quite well.
So, we've been working together.
Mark David: So, tell me a bit more about that second piece of it, then. I'd seen some of the things in the notes, but [why don't you talk about] the part of the software that tells you the percentage and the field of view and those sorts of things.
Ray Kurzweil: Right. We do this field of view report. The National Federation of the Blind was very helpful in helping us frame what needed to be articulated and how best to convey that information auditorily. Because it's not just reading the document, it's giving you feedback on where the document is.
And so it has various cues, and it will tell you what it's seeing and give you enough information to move the camera away from the print, or you're missing the right edge, and things like that. We ultimately plan to make this even more sophisticated...
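The kind of spoken feedback described here can be sketched as a small rule table. This is a hypothetical illustration, not the product's logic: the function name `field_of_view_report`, the wording of the messages, and the rule that two missing opposite edges mean the camera is too close are all assumptions. It takes the set of page edges that some upstream detector found in the frame and returns the sentence to be spoken.

```python
EDGES = ("left", "top", "right", "bottom")
# If an edge is out of frame, the page extends past that side of the image,
# so the camera should move toward that side.
MOVE_TOWARD = {"left": "to the left", "right": "to the right",
               "top": "up", "bottom": "down"}

def _join(words):
    """Join words as 'a', 'a and b', or 'a, b and c'."""
    if len(words) <= 1:
        return "".join(words)
    return ", ".join(words[:-1]) + " and " + words[-1]

def _edges(names):
    return _join(names) + (" edge" if len(names) == 1 else " edges")

def field_of_view_report(visible_edges):
    """Turn the set of detected page edges into a spoken instruction."""
    visible = [e for e in EDGES if e in visible_edges]
    missing = [e for e in EDGES if e not in visible_edges]
    if not missing:
        return "I can see all four edges of the page. Hold still."
    if not visible:
        return "I cannot see any edge of the page. Move the camera away from the print."
    report = f"I can see the {_edges(visible)}, but not the {_edges(missing)}."
    # Two opposite edges missing: the page is wider or taller than the frame.
    if {"left", "right"} <= set(missing) or {"top", "bottom"} <= set(missing):
        return report + " Move the camera away from the print."
    return report + f" Move the camera {_join([MOVE_TOWARD[e] for e in missing])}."

print(field_of_view_report({"left", "top", "bottom"}))
print(field_of_view_report({"left", "top"}))
```

Keeping the rules in a table makes it straightforward to reword the prompts after user testing without touching the edge-detection code.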
Mark David: ...And then, obviously, you have a lot of speech prompts that go out as part of that as the output.
Ray Kurzweil: That's right.
Ray Kurzweil: Completely interactive system...
We'll ultimately be adding object recognition. It'll be recognizing things besides print.
Mark David: Oh, cool.
Ray Kurzweil: So, not just directing a blind person as to where the print is in the room, but they can just point this in a room and it will actually recognize cats and lamps and...
Mark David: ...Right.
Ray Kurzweil: So, that's part of the future development....
Mark David: ...Oh, that's really neat. That's a great idea.
Ray Kurzweil: So, we worked closely with the National Federation of the Blind. I mean, their team kind of guided the priorities and the features and the user interface from the beginning. More recently, they've been testing the unit with a rather extensive testing program involving 400 or 500 users. And we're getting a tremendous amount of feedback. I mean, it's basically very seemingly positive feedback, but we also get lots of detailed issues that have been guiding...
Mark David: ...Right. Ways to fine tune....
Mark David: So, our readers of Electronic Design ... and I don't know if you see the magazine or know it, but...
Ray Kurzweil: ...Yeah, I'm familiar with it.
Mark David: Right, yeah. Of course, we've been around for 52 years, so you've probably been a reader at some point in your career. And you know they're mainly hardware designers.
It's interesting, this particular project. One of the things that caught my eye about it from that perspective was the fact that you're merging together a couple of technologies that are mainstream consumer technologies, and creating something new.
What can you say about the challenges in doing that, if there were any, in terms of bringing two products together to create something new?
And also, you had mentioned that you could envision that these technologies would be at a certain level in terms of cameras and PDAs. Were there any specifics in terms of the functionality of those individual units?
Ray Kurzweil: Yeah. I'd like to make a few comments on that. First of all, we recognized right away that it was very desirable, if not imperative, to do it this way as opposed to trying to develop some special purpose computer and camera for this blindness application. Because by the time you did that, you'd have something obsolete and you wouldn't be able to take advantage of the tremendous price performance of consumer electronics.
And I'm sure that as you at your magazine recognize, that is pretty phenomenal price performance, and it keeps moving. In fact it doubles every year. So, we definitely wanted to take advantage of that. And that is key to consumer electronics. Otherwise this would cost $20,000 and....
Mark David: ...Right, right. Just the processor design alone and so forth would be eating up your whole budget. But, yeah, if it costs $20,000 for the user, it's not gonna get out to a very wide audience.
Ray Kurzweil: ...Right. And the challenge is that these devices are not designed—their particular specifications, including all the intricacies of how information is communicated within the devices—with our application in mind at all.
They're designed for mass use and the conventional applications. And for example, most digital cameras don't allow you to control them other than with a human photographer snapping pictures.
So, to get inside these units and be able to actually control the camera, remote control...in a remote controlled fashion to take over the computer and not be just a Windows device, but be a dedicated application where the user interface, which is really designed for different applications works in a responsive manner to blind people, and...
So, there were a lot of nitty-gritty engineering issues. They were made difficult because these devices are really not designed to be effective OEM devices, to be built into products like this. We had to really defeat a lot of their engineering, which is targeted at a different kind of application and different kind of audience.
So, we were able to do that, but that was definitely a challenge. I mean, the intelligent image processing is a scientific challenge. This was an engineering challenge. But, it was necessary in order to achieve this cost effectiveness of using consumer electronics.
Mark David: Was there any one particular hurdle that was particularly challenging in terms of that engineering work?
Ray Kurzweil: Well, I mean, we do...it may not seem like the hardest problem in the world, but actually developing this case that holds the two units together is an electronic case. It actually sends the signals between the camera and the computer, and having the computer control the camera. And having it built such that a blind person can take it apart and put it back together and be reliable....
Ray Kurzweil: ...You know that was challenging engineering.
Mark David: And what kind of protocol did you use to have the two devices communicate? Was that taking advantage of the protocols they already had, or was that a new communication—Is it using IR, or how are they coupling?
Ray Kurzweil: It's using the standard interfaces....
Ray Kurzweil: ...But, we had to really reverse-engineer the APIs of these devices, which are not published.
Mark David: Right. Okay. And what about the polarization? I saw in some of the press material information on the different filters and so forth. Was that sort of a hardware aspect of the image cleanup issues that you were tackling?
Ray Kurzweil: Well, part of the image enhancement is to use a polarizing filter that gets rid of glare. So, that was a bit of engineering that we did.
But mostly, the intelligence of the image enhancement is in the software. There's seven or eight different types of image degradation that occur with a handheld camera, and with three different degrees of freedom of tilt and rotation, uneven illumination, curved lines, out of focus images, and things like that.
Mark David: You talked a bit already—this is one of the questions I was going to ask—and it was really interesting what you were already saying about what's next in the vision for the future where the device would be used to recognize things beyond print.
Sort of two questions coming out of that. Having worked on this project, do you see any new sorts of applications for OCR outside of the blind reading field? And then within devices for the blind, do you have additional comments based on the experiences you've gained in this project; does it change your vision for where things are going?
Ray Kurzweil: Well, we're focused on this application.
There's a whole world of print out there, and....
Mark David: So, there's so much more for this. Right?
Ray Kurzweil: Products like the Kurzweil 1000 are optimized for printed documents like books and magazines. Print in the real world actually has much more variety in terms of formats and can exist...amidst the trees and other vagaries of real world images. And trying to figure out where the print is in a real world scene is different than the kind of assumptions you can make if this was "a document." We're not just reading documents. And it actually works quite well, but we have a series of improvements we're making on that. This is a software-based product, so users will be able to easily update their software on an SD card.
And in terms of a more advanced direction, I mentioned we are seriously pursuing object recognition so this can be more than a reading machine. It can really be like a sighted assistant. It ultimately could be quite sophisticated in recognizing people and objects and describing real world scenes. Well, a seeing eye dog gives you some information, but this could actually be more like a sighted person describing what's in a room to you. So, that's where we're [heading]....
Mark David: ...Getting into coupling, what has been machine vision, but object recognition, and then also some biometrics, even, if you wanted to get specific about face recognitions and so forth?
Ray Kurzweil: Right. I mean there is face recognition software that actually works quite well, and we're working on integrating that into this application.
Mark David: But again, the challenge to bring it into the portable unit and to make it work for these unique applications....
Ray Kurzweil: ...Yeah....
Mark David: ...Gives you a lot to work on, for sure.
Again, considering our audience of engineers and designers, and your quite impressive track record in your career, do you have any sort of more general advice you might want to share? You talked about timing the market right. And that's very interesting that you got into being a futurist because it allowed you to help time your inventions....
Ray Kurzweil: ...I mean, it's more than a casual observation....
Mark David: ...Right....
Ray Kurzweil: ...And I've spent a lot of time on that. And actually, I have a group of ten people that gather data methodically in different fields, and...
Ray Kurzweil: ... I've written a whole series of books on it. And so, it's actually gone beyond just touting my inventions. Although I will say that the primary application I have for this technology forecasting is to time my inventions. And this project's a good example of that.
Ray Kurzweil: But, it does enable us to, I think, get very realistic ideas of what computation, communication, and even biological technologies will be like in 2020 or 2030. And it's...I mean, this gets me into a whole different area about which I could say a lot.
Ray Kurzweil: But people say that you can't predict the future. And I maintain, actually, there's certain aspects of the future that you can reliably anticipate. And I say this now, not just looking backwards, but I've been making forward-looking predictions for 25 years.
My first book, which I wrote 20 years ago, predicted in the '80s the emergence of a worldwide communication network in the mid-'90s. And a computer taking the world's chess championship by '98, which happened in '97. The dominance of intelligent weapons in warfare. And a lot of other...hundreds of predictions about the 1990s, certainly the 2000 years, which have tracked very well.
I have a whole theory as to why this is. You know, specific projects are unpredictable, but the overall impact of information technology is predictable.
And another example of where you can get predictable results out of unpredictable events is thermodynamics. The path of any one molecule in a gas is completely unpredictable. And you have this whole gas made up of a large number of unpredictable, chaotically, randomly interacting particles. But the overall properties of the gas are very predictable, according to the laws of thermodynamics, to a very high degree of precision.
And the whole evolution of technology is a similarly dynamic, chaotic, rich system that has predictable outcomes. And the power of price performance, capacity, and bandwidth of information technologies doubles every year, which is pretty phenomenal. It's a factor of 1,000 in 10 years and 1 billion in 30 years. And we're also shrinking the size of technology at a predictable rate. And this also applies to biological technologies. The amount of genetic data we've sequenced has doubled every year. Cost has come down by half every year. So, what was $10.00 in 1990 is a penny today. I could actually go on for hours about this.
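The arithmetic behind these figures is simple compounding and can be checked directly. The sketch below is only a worked calculation under the stated assumption of one doubling (or one halving of cost) per fixed period; the function names `growth_factor` and `cost_after` are illustrative.

```python
import math

def growth_factor(years, doubling_months=12):
    """Multiplier after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

def cost_after(initial_cost, years, halving_months=12):
    """Cost after `years` if it halves every `halving_months`."""
    return initial_cost / growth_factor(years, halving_months)

print(growth_factor(10))        # 1024.0, the "factor of 1,000 in 10 years"
print(growth_factor(30))        # 1073741824.0, the "1 billion in 30 years"
print(cost_after(10.00, 10))    # 0.009765625, roughly a penny
print(math.log2(10.00 / 0.01))  # about 9.97 halvings from $10.00 to a penny

# The rate matters a great deal over a decade.
for months in (11, 12, 13):
    print(months, round(growth_factor(10, doubling_months=months)))
```

Going from $10.00 to a penny is a factor of 1,000, or about ten halvings, and the last loop shows how strongly the ten-year multiplier depends on whether the doubling period is 11, 12, or 13 months.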
Mark David: Well, it's fascinating observations. And particularly, I mean, for an audience of people who are... our readers are the ones who are working to keep that Moore's Law on track and to keep the technology moving forward. And so, you have to think about the future when things are moving as quickly as they are in this field.
Ray Kurzweil: A key point, though, is Moore's Law is really just one example of many of what I...of this broader phenomenon, which I call the law of accelerating returns.
Ray Kurzweil: You see it in areas that have nothing to do with chips.
Really, if you can measure the information content, it generally doubles in capacity or price performance—whatever it is you're measuring—every 11 months, 12 months, 13 months, depending on what you're measuring. It grows exponentially. And people don't factor that into consideration. People tend to think linearly. I call that the intuitive linear view. But, it's wrong.
The historically correct view is exponential. And if you [look at it]...and it's highly predictable. Not necessarily what society will do with these capabilities, but the capabilities are predictable.
So, I have pursued this quite seriously. I have a group of ten people that help me gather data and develop these mathematical models.
Mark David: Very interesting.
Okay. Well, I won't take up more of your time today because I promised you that we'd keep it around a half an hour. Thanks so much, Ray.
Ray Kurzweil: Yep. Yep....
Mark David: ...Pleasure talking with you....
Ray Kurzweil: ...Likewise.
Mark David: Bye.