|
Director |
|
 |
Carol Espy-Wilson
Director, Speech Communication Lab
email : espy (at) glue (dot) umd (dot) edu
Dr. Carol Espy-Wilson is a Professor in the Electrical and
Computer Engineering Department and the Institute for
Systems Research at the University of Maryland.
Dr. Espy-Wilson received a B.S. in Electrical Engineering
from Stanford University in 1979, and an M.S., E.E. and Ph.D.
in Electrical Engineering from the Massachusetts Institute
of Technology in 1981, 1984 and 1987, respectively. Prior to
joining the faculty at the University of Maryland, Dr.
Espy-Wilson was a faculty member at Boston University.
Dr. Espy-Wilson's research is in speech communication. She
combines knowledge of digital signal processing, speech
science, linguistics and acoustic phonetics to conduct
interdisciplinary research in speech recognition, speech
production, speaker recognition and speech enhancement.
Specific research projects include the development of (a) a
speech signal representation that contains only linguistic
information, (b) a speech signal representation that
highlights speaker characteristics, (c) a probabilistic
framework for an event-based speech recognition system, (d)
supervised and unsupervised acoustic models for speaker
recognition and (e) vocal tract models of complex speech
sounds.
Dr. Espy-Wilson has authored or coauthored numerous papers
in journals, conference proceedings and books. She is a
Fellow of the Acoustical Society of America (ASA) and a
Member of the Institute of Electrical and Electronics
Engineers (IEEE). Among the honors and awards she has
received for her research contributions are the Clare Boothe
Luce Professorship in 1990, the Independent Scientist Award
from the National Institutes of Health in 1998 and the Honda
Initiation Award in 2003. Professor Espy-Wilson has been
appointed a Fellow of the Radcliffe Institute for Advanced
Study for the academic year 2008–2009. |
|
Current Graduate Students |
|
Xinhui Zhou
Research Assistant
Email : zxinhui (at) glue (dot) umd (dot) edu
Xinhui is a graduate student in the Electrical and Computer
Engineering Department. His interests include incorporating
knowledge of speech production and the auditory system into
audio and speech signal processing. Xinhui's primary projects
involve acoustic modeling of the vocal tract for the American
English liquid sounds /r/ and /l/. He joined the Lab in
October 2003 and is currently pursuing his PhD. |
 |
|
 |
Daniel Garcia-Romero
Research Assistant
Email : dgromero (at) glue (dot) umd (dot) edu
Daniel has been a PhD student in the
ECE Department, University of Maryland at College Park, since
2006. He earned his BS and MS in Electrical Engineering, both
from Universidad Politecnica de Madrid, Spain, in 2000 and
2004, respectively.
His research interests are in the broad area of signal
processing, machine learning and information forensics. Most
of his contributions have been in the area of speech
biometrics. He has participated in four NIST speaker
recognition evaluations. Currently, he is working on speech
source identification and media authentication for speech
forensics. |
|
Vikramjit Mitra
Research Assistant
Email : vmitra (at) glue (dot) umd (dot) edu
Vikram is a PhD student at the
ECE Department, University of Maryland, College Park. He
received his MS in Electrical Engineering from the
University of Denver in 2004 and his BS from Jadavpur
University, India in 2000.
His research interests include speech recognition,
acoustic-to-articulatory speech inversion, language identification,
audio content analysis and information retrieval. Currently
he is working on ways to improve the robustness of automatic
speech recognition (ASR) systems against speech variability
and noise corruption.
He is addressing the problem of speech variability by using
a gestural model, where speech variations are accounted for
by gestural overlap in time and reduction in space. He is
also working on language detection in conversational speech
as well as in music. In this effort he has shown that
language detection can aid the process of audio content
analysis and can help to realize a systematic audio
description methodology. |
 |
|
 |
Srikanth Vishnubhotla
Research Assistant
Email : srikanth (at) glue (dot) umd (dot) edu
Srikanth is a Ph.D. student in
the Department of Electrical & Computer Engineering. He
obtained his MS in Electrical Engineering from the
University of Maryland in 2007, and his Bachelor's degree in
Electronics & Communications Engineering in 2004 from the
Jawaharlal Nehru Technological University, Hyderabad, India.
He joined the SCL in January 2005. His research interests
include Human and Machine Pattern Recognition, Speech
Enhancement and Separation, and Cognitive Systems. He is
also interested in Blind Source Separation, general Signal
Processing and Image Processing.|
His current research deals with the extraction of speaker
streams from a mixture of speech signals, and analysis of
the contribution of the envelope and fine structure of
speech signals to human perception. |
|
Vijay Mahadevan
Research Assistant
Email : vijaym (at) glue (dot) umd (dot) edu
Vijay is a graduate student in
the Electrical and Computer Engineering Department. He
earned his Bachelor's degree in Electrical and Electronics
Engineering from Birla Institute of Technology and Science,
Pilani (BITS Pilani), India. He joined SCL in January 2009.
His research interests include Signal Processing and
Information Theory. Currently he is working on the problem of
multi-pitch tracking. |
 |
|
 |
Ayanah George
Research Assistant
Email : ageorge (at) glue (dot) umd (dot) edu
Ayanah is an MS student in the
Department of Electrical & Computer Engineering. Her
research projects have included working on speech
enhancement during the MERIT program at the University of
Maryland, and on speech segregation during her graduate
studies. |
Alumni |
Post-Doctoral Researchers |
|
Om Deshmukh
(2006 - 2007; currently at IBM India Research Lab) |
|
Gongjun Li |
|
Zhaoyan Zhang
(2002 - 2004; currently Assistant Professor at the UCLA School of
Medicine) |
|
Suzanne Boyce
(1994 - 1995; currently Assistant Professor at the University of
Cincinnati) |
| |
|
Ph. D. Students |
|
Name |
Graduated |
Title of Thesis |
Current Employment |
|
Nabil Bitar |
Fall 1997 |
“Acoustic modeling of speech based on phonetic features” |
GTE |
|
Amit Juneja |
Dec 2004 |
“Probabilistic landmark detection based on acoustic-phonetic
information for automatic speech recognition” |
Think A Move |
|
Om Deshmukh |
Jul 2006 |
“Synergy of Acoustic-Phonetics and Auditory Modeling Towards
Robust Speech Recognition” |
IBM India Research |
|
Tarun Pruthi |
Jan 2007 |
“Analysis, Vocal-Tract Modeling, and Automatic Detection of
Vowel Nasalization” |
Think A Move, Beachwood, OH |
|
M.S. (Thesis) |
|
Name |
Graduated |
Title of Thesis |
Current Employment |
|
Venkatesh Chari * |
May 1992 |
"Extraction of Formant Frequencies by Adaptive Enhancement
of Fourier Spectra" |
Analog Devices |
|
Michelle Delaney * |
May 1998 |
"An Analysis of the Recognition Errors of a Phonetic Feature
Based Speech Recognizer" |
Speechworks |
|
Ariel Salomon * |
Dec 2000 |
"The Automatic Detection of Manner Landmarks using Simple
Temporal Measures" |
PhD program at MIT |
|
Thorvaldur Einarrson * |
Dec 2003 |
"Psychoacoustics based gain compensation for low listening
levels" |
|
|
Sandeep Manocha ** |
Jul 2006 |
"Robust Voice Mining of Telephone Conversations" |
Microsoft |
|
Srikanth Vishnubhotla** |
Jan 2007 |
“Irregular Phonation Detection and Speaker ID” |
PhD program at UMD |
| |
|
M.S. (Projects) |
|
Name |
Duration |
Title of Project |
|
Kenneth Grimes * |
May 1992 |
Formant estimation of vowels using Critical-band Filtering |
|
Jack McLaughlin * |
May 1992 |
Extraction of the glottal waveform using inverse filtering |
|
Tamer Onat * |
May 1992 |
Vowel recognition using neural networks and phonetic
features |
|
Neeraj Deshmukh * |
May 1995 |
A Strategy for Acoustic Modeling to Increase Efficiency of
HG |
|
Deborah Schwartz * |
May 1996 |
Signal Processing Algorithms for Electrolaryngeal Speech
Enhancement |
|
Carla Valera * |
May 1997 |
Common Features of Devoiced Semivowels |
|
Qian Zhang * |
May 1998 |
Recognition of Impoverished Speech |
|
Zach McCaffrey * |
May 1998 |
Replacement of Artificial Voice Excitation Signal with
Natural Excitation Signal using Cepstral Analysis |
|
Kun Ma * |
May 1999 |
Improvement of Alaryngeal Speech through the Automatic
Insertion of Prosodic Information |
|
Pelin Demirel * |
May 1999 |
Improvement of Alaryngeal Speech through the Automatic
Replacement of the Artificial Excitation Signal with a
Normal Excitation Signal |
|
Nandini Srinivasan * |
May 2000 |
Removal of Artificial Larynx Device Resonances through
Inverse Filtering |
|
Kun Xia * |
2000 |
Refinement of Formant Tracker for Automatic Speech
Recognition |
|
Eric Craft * |
2000 |
Automatic Classification of Baby Babble into Broad Classes |
|
Arindam Mandel * |
2000 |
Comparison of Knowledge-based Recognition with Human
Performance Using Impoverished Speech |
|
Bethany Broom * |
2000 |
Combining Different Order LPC Spectra to obtain Reliable
Pole Estimates for Automatic Formant Tracking |
|
Heather Cundiff * |
2000 |
Analysis of Acoustic and Articulatory Data for American
English /r/ |
|
Om Deshmukh * |
2001 |
A Direct Measure of Proportion of Periodic and Aperiodic
Energy in Speech Signals |
|
Amit Juneja * |
2001 |
Acoustic-Phonetic Approach to Speech Recognition Based on
Event Detection and Linear Discriminant Analysis |
|
Tarun Pruthi ** |
2003 |
Automatic Classification of Nasal Consonants |
| |
|
Undergraduate Students in Research Programs |
|
Name |
Duration |
Title of Research Project |
|
Shawn Williams |
Fall 1989 & Spr 1990 |
An Acoustic Study of the Feature Retroflex (BS Thesis) |
|
Charles Robinson |
Fall 1990 & Spr 1991 |
An Acoustic Study of the Influence of /r/ on different F3
trajectories (BS Thesis) |
|
Valerie Padilla |
Spring 1991 |
Detecting linguistic features for use in a speech
recognition system |
|
Vinay Chandra |
Fall 1991 & Spr 1992 |
Automatic Discrimination of Strident and Nonstrident
Fricatives
(Senior Honors Thesis) |
|
Stephanie Zierten |
Fall 1992 & Spr 1993 |
Automatic Detection of Place of Articulation in Stop Consonants
(Senior Honors Thesis) |
|
Armen Balien |
Fall 1992 & Spr 1993 |
Automatic Detection of Acoustic Properties that Separate Adjacent
Sounds with the Same Manner of Articulation
(Senior Honors Thesis) |
|
Kazuhito Niimi |
Spr 1994 |
Automatic classification of stop consonants |
|
Shong Yin |
Summer 2002 |
Speaker Recognition Implemented via GMM and Vector
Quantization |
|
Jason Strohmeir |
Summer 2002 |
Multilayer Perceptron Neural Network for Speech Recognition |
|
Jawahar Singh |
Summer & Fall 2003 |
A Graded Method for Determining the Proportion of Periodic/Aperiodic
Energy in Speech Signals |
|
Jalaal Deeb |
Summer 2003 |
Speaker Adaptation in Text-Independent Speaker Verification |
|
Paul Young |
Summer 2003 |
Creating Feature-Based Finite State Automata for Speech Recognition
First prize in the RITE (Research in Telecommunications
Engineering) Program at the University of Maryland |
|
Shuo Chen |
Summer 2004 |
Acoustic Parameters for Identification of Nasalized Vowels |
|
Thomas Plummer |
Summer 2004 |
The Investigation of Acoustical Features in Text-Independent
Speaker Verification |
|
Qin Zou |
Spring 2004 |
Compensation Algorithms to Minimize the Effect of Noise on
Acoustic Speech Parameters |
|
John Lin |
Fall 2004 & Spr 2005 |
Comparison of the acoustic properties of speech sound
produced in upright vs. supine position |
|
Avinash Yentrapati |
Summer 2005 |
Articulatory synthesis of sustained speech. |
|
Ayana George |
Summer 2005 |
Implementation of a Spectral Mean Subtraction Algorithm for
Speech Enhancement |
|
Sai Hei Yeung |
Summer 2005 |
MRI-based 3D Finite-element Analysis and Modeling of the
Vocal Tract for American English /r/ |
|
Ryan Aminzadeh |
Summer 2005 |
Unsupervised Speaker Segmentation of two-speaker
conversations |
|
Chris Turnes |
Spring 2006 |
The dependence of the MPO model on the exact structure of
the filterbank used in implementation |
|
Geetika Nagpal |
Fall 2005 |
The dependence of the MPO model on the exact structure of
the filter bank used in implementation |
|
Bilal Raja |
Summer 2006 |
Recognition of Nasalized and Non-Nasalized Vowels |
|
Kunle Ogunsuyi |
Summer 2006 |
Speaker Recognition and Voice Mining |
|
Timothy Burke |
Fall 2006 |
Replacing STFT filter bank in MPO processing with an
Auditory filter bank |
|
* Boston University ;
** University of Maryland |