Advanced technology in speech-based interfaces


Speech-based interfaces are not new to computing, but they have been comparatively underused as an efficient and effective method of human-computer interaction. The technology has been of great interest over the past few years, although there is still room for significant improvement and there are possibilities for the future. This paper investigates current uses and standards of the technology and what contributions are being made. The paper also identifies some possible future uses of speech-based interfaces, and the possible future benefits of this technology when compared to current methods and for certain types of users.

Speech-based interfaces are not new to computing, but they have been comparatively underused as an efficient and effective method of human-computer interaction. A background to the technology is included, describing how the demand for natural language and speech interfaces increased, how a need for standardization arose, and how the VoiceXML standard was released. From this standard other technologies were born, including a combination of XHTML and VoiceXML for developing Internet applications with a speech-based interface. These technologies, combined with web and automotive technologies, have provided an opportunity for voice-controlled motor vehicle functions in the near future. While this technology has been designed to help the average person be more efficient, with some small changes elderly and disabled users could benefit as well. Every new technology has problems, and these are also discussed, leading to a conclusion that summarizes the main points and justifies the benefits.

Natural language interfaces are an important part of Human-Computer Interaction, as telephones still outnumber computers in the world and hence natural language is more widely used than a mouse or keyboard. To smooth the progress of exchanges between humans and machines, the World Wide Web Consortium (W3C) has published a recommendation for a voice interaction language based on XML, which allows interaction on many interfaces, including Internet applications, by using XHTML combined with VoiceXML. Because VoiceXML uses the HTTP protocol to communicate, a VoiceXML telephone gateway can communicate with a web server. In this type of environment the web server provides a response to a user on a telephone, bridging the gap between phone and Internet. This is supported by the World Wide Web Consortium (2010):

The telephone was invented more than 150 years ago, and continues to be a very important means for us to communicate with each other. The Web by comparison is very recent, but has rapidly become a competing communications channel. The convergence of telecommunications and the Web is now bringing the benefits of Web technology to the telephone, enabling Web developers to create applications that can be accessed via any telephone, and allowing people to interact with these applications via speech and telephone keypads (p. 1).

VoiceXML is becoming a standard for human-computer audio dialogue, with speech synthesis and recognition of spoken input. This technology brings the ability to hold a natural conversation as an interface to the Internet and to content. An automated phone system with VoiceXML also has the ability to understand or translate multiple languages. Its popularity is increasing as major companies such as IBM, HP and Motorola now support and use VoiceXML. A major goal is to “bring the advantages of web-based development and content delivery to interactive voice response applications” (Rouillard, 2007, p. 27).
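As a rough illustration of what a VoiceXML document looks like, the following Python sketch builds a minimal dialog that speaks a single prompt. The element names (vxml, form, block, prompt) and the namespace are those of VoiceXML 2.0; the function name, the form id and the prompt wording are assumptions made for this example only. A telephone gateway of the kind described above would fetch a document like this from a web server over HTTP.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C VoiceXML 2.0 recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting_dialog(prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks one prompt.

    The function name and the form id "greeting" are hypothetical,
    chosen only for this illustration.
    """
    # Root element carries the version and the default namespace.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    # A form is the basic dialog unit; a block holds content to execute.
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    # The prompt text is what the speech synthesizer reads to the caller.
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting_dialog("Welcome to the weather service."))
```

A real application would add fields and grammars to collect spoken input; this sketch covers only the speech-synthesis half of the dialog.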

XHTML + Voice (X+V) is a technology for describing visual and audio web pages: visual interaction is described by XHTML and auditory interaction is described by VoiceXML. This gives users an HTML display of a website, with the ability to navigate and use the site by voice or by traditional methods of input. Until recently X+V functionality had not been implemented by the major Internet browser companies; instead it had been used by small companies with government grants and talked about as a possible future technology. Currently the Opera web browser offers native support for XHTML and VoiceXML, and it will also attempt voice interaction with standard XHTML pages. Internet Explorer and Firefox still do not have native support for XHTML and VoiceXML, although third-party extensions and add-ons have been created. Opera Software ASA say, “any ordinary browser command can be done by voice, such as navigating to, and following the next link in a document, going to the next slide in an Opera Show presentation, or logging on to a password protected Website” (p. 1). XHTML and VoiceXML offer an increased opportunity now that the Opera web browser is being installed in Ford vehicles, where a speech-based interface can enable eyes-free and hands-free computer interaction while driving. This technology could potentially control dashboard and computer systems via speech-based interfaces, giving users functionality ranging from changing the temperature of the heater to sending emails by voice while driving a car. Opera Software ASA say, “This solution will allow Ford truck and van owners to maintain a virtual work environment with access to all of the important files, information and applications they need on a daily basis” (p. 1).

Because XML is a flexible and universal language overseen by the W3C, XML-based technologies such as VoiceXML are not limited to Internet applications. The same piece of XML can be used for various applications and imported into other applications if they support it, and there is no reason why VoiceXML cannot be the same in the future as well. Mobile phones have for some time had the ability to read text messages and email messages aloud to the user, which can be beneficial for visually impaired persons and persons operating a vehicle. “Text-to-speech software reads the text on the screen aloud in a natural sounding voice, giving you convenient access to phone menus and functions, short messages, e-mail messages” (Nokia, n.d., p. 1). Using VoiceXML-based technology it is entirely possible for a user to speak a text message aloud to the mobile phone, which translates it into text and sends it via the SMS service. This may sound silly at first, given that the technology already exists to call someone and say it verbally without a computer translating the words into text for you. However, it would give businesses a greater ability to stay in contact while on the move, as text messaging is used extensively in business and is preferred in some cases depending on the message being sent. It could also provide a solution to a major problem with cellular phones, which is texting while driving. In principle, a technology that allows a user to drive and send text messages safely by talking to their cell phone could save lives and make lives easier. Talking to a passenger or singing along to the radio, which are very similar activities to dictating a text message, have not been noted as significant causes of crashes. “Government officials aren’t the only ones getting on the texting ban-wagon. Television talk show host Oprah Winfrey has launched a national television and Internet campaign to encourage people to commit to putting their cell phones away while driving” (Hattiesburg American, 2010, p. 1).

As technology has progressed, people have continuously sought smaller and smaller devices with greater detail and speed. Technology has reached the point where the input devices themselves are holding the device back from becoming any smaller. “Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller” (World Wide Web Consortium, 2010, p. 5).

With a global aging population it is important that we enable and assist elderly people to function and live as independently as technology will allow. Elderly people may be able to benefit from the advancement of speech-based technologies, but to understand how they could benefit, it is first important to understand their characteristics. “The human interfaces to most computer systems for general use have been designed, either deliberately or by default, for a ‘typical’, younger user” (Gregor & Newell, 2001, p. 1). Elderly people can be crudely generalised into three groups: fit older people, frail older people, and older people with long-term disabilities. Fit older people are those who do not appear, or do not consider themselves, disabled. Frail older people are those who would be considered disabled and have one or more difficulties, including at least one that impairs their functionality in some way. The third group consists of elderly people who have had a long-term disability throughout their life that has affected the aging process, and whose ability to function depends on other functions that are themselves declining. Another aspect to take into consideration is the variability in physical, sensory and cognitive abilities among the elderly, as one size does not fit all in this situation. A further aspect is the variation in ability to operate a computer system due to disabilities, impairments and learning capabilities. Gregor and Newell (2001) conclude:

In general, as people grow older their abilities change. This process of change includes a decline over time in the cognitive, physical and sensory functions, and each of these will decline at different rates relative to one another for each individual. This pattern of capabilities varies widely between individuals, and as people grow older, this variability increases. In addition, any given individual’s capabilities vary in the short term due, for example, to temporary decrease in, or loss of, function due to a variety of causes including illness, blood sugar levels and state of arousal (p. 2).

Interfaces for older people need a greater diversity of functionality than those for a younger group, to meet their wider range of needs. A speech-based interface offered as an option for operating a computer depends on a function that most people have used their entire lives and that is not considered to decline dramatically with age. It can also allow those who are intimidated by technology and by the thought of using a computer to use a computer system through a telephone, as described previously with VoiceXML capabilities. Finally, the interface needs to use general terms over technical terms, for example ‘going to the main section’ rather than ‘clicking on the home link’.

Most systems and interfaces are designed for typical healthy or high-functioning users, rather than for users with disabilities who can have difficulty using a standard keyboard or mouse. With the growth of the Internet and technology, it is important that disabled users are not left out, and that they are able to access these resources if they choose, or if it could benefit their lives. There may be situations where a computer application could benefit the life of someone with a disability, but they cannot use a computer due to motor-function restrictions. This demonstrates the need for hands-free or eyes-free computer access, which serves two main groups: visually impaired users and motor-impaired users. “The Web Accessibility Initiative (WAI) works with organizations around the world to develop strategies, guidelines, and resources to help make the Web accessible to people with disabilities” (Web Accessibility Initiative, 2009, p. 1). Many applications and web browsers have been developed to assist people with disabilities, although many of them have been quietly withdrawn, leaving broken links, and those still available for download may have been abandoned and no longer maintained. An important aspect of developing voice applications for disabled users is that they may want to use voice control in combination with other interfaces such as a joystick or other assistive devices. The aim of speech systems is generally naturalness, imitating the conversations that we have had our entire lives, but in the case of users with disabilities it may be more beneficial to aim for learnability over naturalness. For example, instead of saying ‘activate microphone’ or something technical to activate the microphone, “saying ‘Wake Up’: un-mutes the microphone and turns on the light in left side” (Brondsted & Aaskoven, 2005, p. 4).
Technology is currently heading toward eyes-free and hands-free access to systems, for purposes such as accessing a computer while driving a car or making us more productive. The same base technology is required to support speech-based services for disabled users, but the requirements for interaction are very different. We generally prefer to speak to a computer in turn-based communication, as when talking to other human beings, whereas as an aid or interface for disabled users it would be more beneficial to use command-driven voice systems with non-technical terms. These can still be human-to-human terms, such as ‘wake up’ and ‘sleep’, which even users with severe cognitive disabilities would be likely to understand. For people whose cognitive disabilities are so severe that they cannot understand ‘wake up’ or ‘sleep’, a computer is highly unlikely to be a need, as their more immediate concern is surviving day to day.
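A command-driven voice system of this kind can be sketched as a small lookup from spoken phrases to actions. The sketch below is illustrative only: it assumes the speech recognizer has already produced text, and the action names and the third phrase are hypothetical. Only the ‘wake up’ and ‘sleep’ commands come from the discussion above.

```python
from typing import Optional

# Hypothetical command vocabulary. Short, everyday phrases are preferred
# over technical ones; the action names on the right are invented here.
COMMANDS = {
    "wake up": "unmute_microphone",
    "sleep": "mute_microphone",
    "go to the main section": "open_home_page",
}


def normalize(utterance: str) -> str:
    """Lower-case the utterance, trim edge punctuation, collapse spaces."""
    words = utterance.lower().strip(" .!?").split()
    return " ".join(words)


def interpret(utterance: str) -> Optional[str]:
    """Return the action for a recognized command, or None if unknown."""
    return COMMANDS.get(normalize(utterance))


if __name__ == "__main__":
    for spoken in ["Wake Up", "sleep.", "activate microphone"]:
        print(repr(spoken), "->", interpret(spoken))
```

Because the vocabulary is small and fixed, a user only has to learn a handful of phrases, which reflects the trade-off of learnability over naturalness; anything outside the vocabulary is simply ignored rather than guessed at.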

The VoiceXML standard has provided a guideline for developing voice applications, but there are currently no standards for the development environments or interfaces. This means that the layout and functionality of development environments can be completely different, and the code generated by one development environment will not necessarily be compatible with another, as two different environments may produce completely different tags and formats. Building spoken applications from scratch can take a long time and requires several different frameworks and technologies. VoiceXML works with predefined grammars, which can be troublesome in the development of some applications, but by combining the VoiceXML platform with independent systems for voice recognition, it is possible to increase its capacity for understanding. VoiceXML is a great step toward speech- and voice-based interfaces, but it needs a lot of work to become a complete framework for developing speech applications. “Consequently, a great deal of emphasis has been placed on the development of toolkits and environments that hide some of this complexity and allow developers to rapidly prototype and deploy speech-based applications” (Bennett, Llitjod, Shriver, Rudnicky & Black, 2002, p. 1).
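The limitation of a preset grammar, and the idea of backing it with an independent recognizer, can be sketched as follows. This is a simplified model rather than how any particular VoiceXML platform works: the grammar is reduced to a fixed set of phrases, and the fallback is a hypothetical keyword spotter standing in for a more capable external recognition system. All names are assumptions made for the example.

```python
from typing import Callable, Optional, Set

# Hypothetical preset grammar: the only phrases the dialog accepts.
CITY_GRAMMAR: Set[str] = {"london", "paris", "tokyo"}

Recognizer = Callable[[str, Set[str]], Optional[str]]


def match_grammar(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Accept the utterance only if it is exactly a phrase in the grammar."""
    phrase = utterance.lower().strip()
    return phrase if phrase in grammar else None


def keyword_fallback(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Stand-in for an independent recognizer: spot a grammar word
    anywhere inside free-form speech."""
    for word in utterance.lower().split():
        cleaned = word.strip(".,!?")
        if cleaned in grammar:
            return cleaned
    return None


def understand(utterance: str, grammar: Set[str],
               fallback: Optional[Recognizer] = None) -> Optional[str]:
    """Try the preset grammar first, then the optional fallback."""
    result = match_grammar(utterance, grammar)
    if result is None and fallback is not None:
        result = fallback(utterance, grammar)
    return result


if __name__ == "__main__":
    sentence = "I would like to fly to Paris, please"
    print(understand(sentence, CITY_GRAMMAR))
    print(understand(sentence, CITY_GRAMMAR, keyword_fallback))
```

With the grammar alone, the full sentence is rejected because it is not one of the listed phrases; with the fallback attached, the same sentence yields a match, which is the kind of increase in understanding the combination is meant to provide.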

Natural speech-based interfaces can provide a known and familiar interface for interacting with computer systems, because we spend our lives conversing with other people and communicating over the telephone. Current technology makes it possible to interact with a website or computer application via a telephone, and to translate the spoken language for the system and translate a response back to the user. The ability to use a generic markup language like VoiceXML alongside XHTML is a leap forward in creating an Internet that is accessible via speech-based interfaces. This enables future technology such as voice-controlled functions of a motor vehicle and improved cell phone speech interfaces. One of the most significant impacts of this technology is the ability for elderly people to use, as a computing interface, a function that is not known for degenerating with age. It will also enable users who are new to computers but familiar with telephones to use a computer more easily. Many disabled people struggle to maintain their independence, with motor function restrictions that prevent them from using a computer effectively. The ability for disabled people to control programs and browse the Internet with a speech interface could help them maintain their freedom and independence. As with all new technologies, there are serious problems for which a solution must be found before this technology can take off; these include the need for a standard for a complete framework, rather than just a markup language, providing grammar and large vocabulary support. It is concluded that speech-based interfaces currently provide, and will continue to provide, benefits as the technology advances, provided that the right people get access to this technology and not just the average user who is happy to type.


