Show & Tell
Tuesday, August 27th, 2013 - 16:30-18:30, Salon Tête d'Or
Wednesday, August 28th, 2013 - 14:00-16:00, Salon Tête d'Or
Show & Tell is a special event organized during the conference, where participants are given the opportunity to demonstrate their most recent progress, developments or innovation tracks, and to interact informally with conference attendees through a talk, a poster, a mockup, a demo, or any other suitable format of their choice. These contributions usually (but need not) relate to a paper, and they must highlight the innovative side of the concept.
Show & Tell participants submitted a dedicated two-page written contribution, together with optional multimedia content. Each proposal for the Show & Tell sessions was peer-reviewed, and evaluation was based on the originality, significance, quality, and clarity of each submission.
A short preview video of the Show & Tell sessions is available in two resolutions:
IS13-S&T - 360x270
IS13-S&T - 1440x1080
The demonstrations are divided into three sessions. You can click on the presentation number in the left column to view a video presentation of the demonstration, when available.
During the conference, the Google Best Show & Tell Prize will be awarded.
Monday, August 26th, 2013 - 16:00-18:00, Salon Tête d'Or
|ST1-1||The Furhat Social Companion Talking Head, Samer Al Moubayed, Jonas Beskow, Gabriel Skantze|
|ST1-2||Ultraspeech-Player: Intuitive Visualization of Ultrasound Articulatory Data for Speech Therapy and Pronunciation Training, Thomas Hueber|
|ST1-3||Laughter Modulation: from Speech to Speech-Laugh, Jieun Oh, Ge Wang|
|ST1-4||ReFr: An Open-Source Reranker Framework, Daniel M. Bikel, Keith B. Hall|
|ST1-5||Embedding Speech Recognition to Control Lights, Alessandro Sosi, Fabio Brugnara, Luca Cristoforetti, Marco Matassoni, Mirco Ravanelli, Maurizio Omologo|
|ST1-7||The Edinburgh Speech Production Facility DoubleTalk Corpus, James M. Scobbie, Alice Turk, Christian Geng, Simon King, Robin Lickley, Korin Richmond|
|ST1-8||Lexee: Cloud-Based Platform for Building and Deploying Voice-Enabled Mobile Applications, Dmitry Sityaev, Jonathan Hotz, Vadim Snitkovsky|
|ST1-9||A Tool to Elicit and Collect Multicultural and Multimodal Laughter, Mariette Soury, Clément Gossart, Martine Adda, Laurence Devillers|
|ST1-10||Design of a Mobile App for Interspeech Conferences: Towards an Open Tool for the Spoken Language Community, Robert Schleicher, Tilo Westermann, Jinjin Li, Moritz Lawitschka, Benjamin Mateev, Ralf Reichmuth, Sebastian Möller|
|ST1-11||Visualizing Articulatory Data with VisArtico, Slim Ouni|
|ST1-12||Audition: the Most Important Sense for Humanoid Robots?, Rodolphe Gelin, G. Barbieri|
Tuesday, August 27th, 2013 - 16:30-18:30, Salon Tête d'Or
|ST2-1||The Speech Recognition Virtual Kitchen, Florian Metze, Eric Fosler-Lussier, Rebecca Bates|
|ST2-2||Multilingual Web Conferencing Using Speech-to-Speech Translation, John Chen, Shufei Wen, Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore|
|ST2-3||ROCme! Software for the Recording and Management of Speech Corpora, Emmanuel Ferragne, Sébastien Flavier, Christian Fressard|
|ST2-4||Voice Search in Mobile Applications with the Rootvole Framework, Felix Burkhardt|
|ST2-5||On-Line Audio Dilation for Human Interaction, John S. Novak III, Jason Archer, Valeriy Shafiro, Robert Kenyon, Jason Leigh|
|ST2-6||Phase-Aware Single-Channel Speech Enhancement, Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi|
|ST2-7||A Free Online Accent and Intonation Dictionary for Teachers and Learners of Japanese, Hiroko Hirano, Ibuki Nakamura, Nobuaki Minematsu, Masayuki Suzuki, Chieko Nakagawa, Noriko Nakamura, Yukinori Tagawa, Keikichi Hirose, Hiroya Hashimoto|
|ST2-8||Reactive Accent Interpolation Through an Interactive Map Application, Maria Astrinaki, Junichi Yamagishi, Simon King, Nicolas d'Alessandro, Thierry Dutoit|
|ST2-9||A Tool to Elicit and Collect Multicultural and Multimodal Laughter, Mariette Soury, Clément Gossart, Martine Adda, Laurence Devillers|
|ST2-10||Design of a Mobile App for Interspeech Conferences: Towards an Open Tool for the Spoken Language Community, Robert Schleicher, Tilo Westermann, Jinjin Li, Moritz Lawitschka, Benjamin Mateev, Ralf Reichmuth, Sebastian Möller|
|ST2-11||A Non-Experts User Interface for Obtaining Automatic Diagnostic Spelling Evaluations for Learners of the German Writing System, Kay Berkling|
Wednesday, August 28th, 2013 - 14:00-16:00, Salon Tête d'Or
|ST3-1||Presentation of the Simple4All Project, The Simple4All project consortium|
|ST3-2||On-Line Learning of Lexical Items and Grammatical Constructions via Speech, Gaze and Action-Based Human-Robot Interaction, Grégoire Pointeau, Maxime Petit, Xavier Hinaut, Guillaume Gibert, Peter Ford Dominey|
|ST3-3||Development of a Pronunciation Training System Based on Auditory-Visual Elements, Haruko Miyakoda|
|ST3-4||Real-time and Non-real-time Voice Conversion Systems with Web Interfaces, Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky|
|ST3-5||Application of the NAO Humanoid Robot in the Treatment of Bone Marrow Transplanted Children (Demo), E. Csala, G. Németh, Cs. Zainkó|
|ST3-6||Photo-Realistic Expressive Text to Talking Head Synthesis, Vincent Wan, Robert Anderson, Art Blokland, Norbert Braunschweiler, Langzhou Chen, BalaKrishna Kolluru, Javier Latorre, Ranniery Maia, Björn Stenger, Kayoko Yanagisawa, Yannis Stylianou, Masami Akamine, Mark J. F. Gales, Roberto Cipolla|
|ST3-7||Demonstration of LAPSyD: Lyon-Albuquerque Phonological Systems Database, Ian Maddieson, Sébastien Flavier, Egidio Marsico, François Pellegrino|
|ST3-8||SpeechMark Acoustic Landmark Tool: Application to Voice Pathology, Suzanne Boyce, Marisha Speights, Keiko Ishikawa, Joel MacAuslan|
|ST3-9||A Tool to Elicit and Collect Multicultural and Multimodal Laughter, Mariette Soury, Clément Gossart, Martine Adda, Laurence Devillers|
|ST3-10||Design of a Mobile App for Interspeech Conferences: Towards an Open Tool for the Spoken Language Community, Robert Schleicher, Tilo Westermann, Jinjin Li, Moritz Lawitschka, Benjamin Mateev, Ralf Reichmuth, Sebastian Möller|
|ST3-11||MODIS: an Audio Motif Discovery Software, Laurence Catanese, Nathan Souviraà-Labastie, Bingqing Qu, Sebastien Campion, Guillaume Gravier, Emmanuel Vincent, Frédéric Bimbot|
|ST3-12||The REAL Challenge – Call for Participation, Maxine Eskenazi|
Round Table (Panel Session)
Industry Roundtable: "Innovative Products and Services Based on Speech Technologies"
We are pleased to announce this special event on Tuesday, August 27th, 2013 from 13:00 to 14:30.
Please note that this roundtable largely overlaps with the lunch break, so interested participants may want to arrange a quick "walking" lunch in advance: neither food nor drinks are allowed in the Amphithéâtre.
Panel Discussion Participants
- Chairman: Roberto PIERACCINI, Chief Executive Officer, ICSI Berkeley, USA
Roberto Pieraccini is the CEO of the International Computer Science Institute in Berkeley, CA. Prior to that he was the CTO of SpeechCycle, a research manager at IBM T.J. Watson Research and SpeechWorks International, and a member of technical staff at Bell Labs and AT&T Shannon Laboratories. He started his career in the 1980s as a researcher at CSELT, the research laboratories of the Italian telephone company. His research interests range from speech recognition to spoken language understanding and dialog, multimodal interaction, and machine learning. He is a fellow of IEEE and ISCA, a member of the AVIOS board, and a member of the editorial board of several scientific and technology magazines. He is the author of “The Voice in the Machine”, a general-audience book published by MIT Press on the history of “computers that understand speech.”
- Michiel BACCHIANI, Google, USA
Michiel Bacchiani has been active in speech recognition research for about 20 years. He received the Ingenieur (MS) title from the Technische Universiteit Eindhoven in the Netherlands in 1994 and a Ph.D. from Boston University in 1999, both in Electrical Engineering. In the early 90s he worked as a research staff member at the Advanced Telecommunications Research (ATR) Laboratory in Japan, working on acoustic modeling for ASR and large vocabulary search. In 1999 he joined AT&T Labs -- Research in Florham Park, NJ. At AT&T he built the system that was entered into the DARPA TREC8 competition for spoken document retrieval. In addition, he built the recognition system for the ScanMail voicemail transcription system. His algorithmic work related to these efforts was mainly focused on speaker adaptation and environmental normalization. In 2004 he joined IBM Research in Yorktown Heights, NY, where he was responsible for the system that was entered into the TC-STAR competitive evaluations. He has been with Google since 2005. At Google he built up the first acoustic modeling infrastructure used in the speech group. He led the group that built the transcription system underlying the automatic captioning of YouTube videos and the Google Voice voicemail transcription. He currently manages the acoustic modeling group, which focuses on novel algorithms and large-scale training of the models supporting all Google speech recognition products, such as the recognition engine underlying Android speech applications.
- Jérôme R. BELLEGARDA, Apple Inc., USA
Jerome R. Bellegarda is Apple Distinguished Scientist in Human Language Technologies at Apple Inc, Cupertino, California. His general interests span voice-driven man-machine communications, multiple input/output modalities, and multimedia knowledge management. In these areas he has written over 150 publications, and holds more than 50 U.S. and foreign patents. He has served on many international scientific committees, review panels, and advisory boards. In particular, he has worked as Expert Advisor on speech technology for both the National Science Foundation and the European Commission, was Associate Editor for the IEEE Transactions on Audio, Speech and Language Processing, served on the IEEE Signal Processing Society Speech Technical Committee, and is currently an Editorial Board member for both Speech Communication and the ACM Transactions on Speech and Language Processing. He is a Fellow of the IEEE.
- Rodolphe GELIN, Aldebaran Robotics, France
Rodolphe Gelin holds an engineering degree from the École Nationale des Ponts et Chaussées (1988) and a Master of Science in Artificial Intelligence from the University of Paris VI (1988). He started his career at CEA (French Atomic Energy Commission), where he worked for 10 years on mobile robot control for industrial applications and on rehabilitation robotics. He then led various teams working on robotics, virtual reality and cognitics. From 2006 to 2008, he was in charge of business development for the Interactive Systems Program. He participated in the European Coordinated Action CARE, which supports the ETP EUROP on robotics, in charge of the robotics roadmap for the European Community. In 2009, he joined Aldebaran Robotics as head of collaborative projects. He leads the French project ROMEO, which aims to develop a human-sized humanoid robot. Since 2012, he has been Research Director at Aldebaran Robotics. He is the author of two books, "Robot, ami ou ennemi?" and "Comment la réalité peut-elle être virtuelle ?".
- Dilek HAKKANI-TÜR, Microsoft Research, USA
Dilek Hakkani-Tür is a senior researcher at Microsoft Research. Prior to joining Microsoft, she was a senior researcher at ICSI speech group (2006-2010) and she was a senior technical staff member in the Voice Enabled Services Research Department at AT&T Labs-Research in Florham Park, NJ (2001-2005). She received her BSc degree from Middle East Technical University, in 1994, and MSc and PhD degrees from Bilkent University, Department of Computer Engineering, in 1996 and 2000, respectively. Her PhD thesis is on statistical language modeling for agglutinative languages. She worked on machine translation during her visit to Carnegie Mellon University, Language Technologies Institute in 1997, and her visit to Johns Hopkins University, Computer Science Department, in 1998. In 1998 and 1999, she visited SRI International, Speech Technology and Research Labs, and worked on using lexical and prosodic information for information extraction from speech. In 2000, she worked in Natural Sciences and Engineering Faculty of Sabanci University, Turkey. Her research interests include natural language and speech processing, spoken dialog systems, and active and unsupervised learning for language processing. She has over 20 granted patents and co-authored more than 150 papers in natural language and speech processing. She is the recipient of three best paper awards for her work on active learning, from IEEE Signal Processing Society (with Giuseppe Riccardi), ISCA (with Gokhan Tur and Robert Schapire) and EURASIP (with Gokhan Tur and Robert Schapire).
- Rohit PRASAD, Amazon, USA
Rohit Prasad joined Amazon in April 2013 as the Director of Research for Speech and Language Technologies, where he is assembling and managing a team of speech and machine learning scientists. Before joining Amazon, Rohit was a Sr. Technical Director and the Deputy Manager for the Speech, Language, and Multimedia Business Unit at Raytheon BBN Technologies (BBN). At BBN, Rohit led a large team on multiple government- and commercially-sponsored R&D efforts spanning spoken language translation, psychological distress detection from informal speech and text, keyword spotting from speech, intelligent tutoring, and text (and document image) classification. Rohit has worked on several spoken language applications such as automated directory assistance, natural language call routing, and speech-to-speech translation. In particular, he led the development of BBN TransTalk, a mobile speech-to-speech translation system for enabling two-way communication across a language barrier. Rohit is a Senior Member of IEEE, named author on 100+ conference and journal publications, and inventor on multiple US patents.
- Alex WAIBEL, KIT, Germany; CMU, USA and IMMI, France; Jibbigo LLC
Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT), a joint center between eight leading research institutions worldwide. The Center develops multimodal and multilingual human communication technologies that attempt to improve human-human and human-machine communication. Prof. Waibel also co-directs IMMI, a joint venture between CNRS, KIT, and RWTH in Paris. Prof. Waibel's team developed and demonstrated the first speech translation systems in Europe and the USA in 1990/1991 (ICASSP'91) and the first simultaneous lecture translation system in 2005, and Jibbigo, the world’s first commercially available speech translator product on a phone in 2009 (www.jibbigo.com). Dr. Waibel was one of the founders and chairmen of C-STAR, the Consortium for Speech Translation Research in 1991. Since then he has directed and coordinated many research programs in the field in the US, Europe and Asia. He currently serves as director of EU-Bridge, a large scale multi-site European research initiative aimed to develop speech translation services in Europe and of several US programs aimed at improving language portability and performance. Dr. Waibel has received several awards for pioneering work on multilingual speech communication and translation technology. He has published extensively in the field, received several patents and built several successful companies. The latest of these ventures, Jibbigo, has built the world's first speech-translator on a smart phone, and deploys its technologies in humanitarian and disaster relief missions. Dr. Waibel received his BS, MS and PhD degrees at MIT and CMU, respectively.
Description and Goals
The aim of this roundtable is to give conference participants from industry an opportunity to share their views on recent advances in spoken language research, the remaining challenges, and the diversity of applications, and to foster exchange between academia and industrial practice.
The hype created during the past few years by applications such as Google voice search and Apple's Siri has produced an unprecedented interest in speech technology all around the world. The availability of these and similar applications to a vast population of users allows the companies that own them to keep collecting extremely large amounts of speech data, which are used to continuously improve performance.
However, several questions need to be addressed.
One is whether speech technology will reach human-like performance, and possibly go beyond it, simply on the basis of increasingly large amounts of data and new techniques such as deep learning and adaptive learning. Or are we still facing fundamental problems that will soon flatten out the accuracy improvements and prevent us from greatly exceeding current performance?
A main concern for academic research is the proprietary nature of this data and the impossibility for small groups and academic organizations to take advantage of it to advance theories and models of speech, and thus make a significant contribution to the field. Moreover, the collection of user data raises open questions of privacy, confidentiality, and security, of which we need to be fully aware and which should be addressed in a global manner.
In particular, the panel will try to address the following topics:
- Is speech technology going to be a solved problem soon?
- Is it just a matter of more and more data and faster and faster computers?
- What is the role of academic research in speech technology today?
- For which applications is speech technology good enough as it is?
- For which applications do we need more work?
- What are the new attractive fields of application and innovative products and services based on speech technology?
ISCA's 25th Anniversary
Research in spoken language processing has been very active and productive for many years. At the end of the 80s, concomitant initiatives in different parts of the world helped organize the international community and led to large-scale undertakings such as the creation of the European Speech Communication Association (ESCA) in 1988, the launch of the biennial Eurospeech conference series in Europe in 1989, and the launch of the biennial International Conference on Spoken Language Processing (ICSLP) in Asia in 1990.
These efforts offered invaluable opportunities for the speech research community to emerge as such, as opposed to the previous situation where scientists working on speech were spread apart in various communities, and were publishing in conferences unrelated to one another, such as IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) and the International Congress on Phonetic Sciences (ICPhS).
From 2000 onwards, Eurospeech and ICSLP merged into a single annual INTERSPEECH conference, under the umbrella of the International Speech Communication Association (ISCA), built on ESCA and on the Permanent Council for the organization of the ICSLPs (PC-ICSLP). Nowadays, INTERSPEECH conferences gather 1200 to 1400 participants each year.
The 2013 edition of INTERSPEECH takes place a quarter of a century after the founding events that launched the emergence of our scientific community. This offers an excellent opportunity to celebrate the liveliness of speech science and the achievements of the past 25 years.
For this purpose, we have asked three leading figures of our community, Joseph Mariani, Hiroya Fujisaki and Roger Moore, to address specific aspects of speech research and technology, so as to illustrate the strong dynamism and the multiple facets of our domain and to inspire us in building the steps for the next 25 years.
This session will thus be an opportunity to warmly acknowledge and thank those who contributed to making the whole enterprise successful.
for the conference Organizing Committee
Programme of the session
- F. Bimbot: Introduction to the session
- J. Mariani: Rediscovering 25 Years of Discoveries in Spoken Language Processing.
- H. Fujisaki: An Inter- and Cross-disciplinary Perspective of Spoken Language Processing.
- R. Moore: Progress and Prospects for Speech Technology: What Ordinary People Think.
- Reactions from the panel - Results of the Quiz - Grand Get Together
Computational Paralinguistic Challenge
After four consecutive challenges at INTERSPEECH, there still exists a multiplicity of highly relevant paralinguistic phenomena that have not yet been covered. In the previous instalments, we focused on single speakers. With a new task, the Conflict Sub-Challenge, we now broaden the scope to analysing discussions among multiple speakers. A further novelty is introduced by the Social Signals Sub-Challenge: for the first time, non-linguistic events (laughter and fillers) have to be classified and localised. In the Emotion Sub-Challenge we are literally "going back to the roots". However, by intention, we use acted material for the first time, in order to fuel the ongoing discussion on the differences between naturalistic and acted material and to highlight those differences. Finally, the Autism Sub-Challenge this year picks up on Autism Spectrum Condition in children's speech. Apart from intelligent and socially competent future agents and robots, the main applications lie in the medical domain and in surveillance.