Publications

2023

  • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J Shih, João Felipe Santos, Boris Ginsburg, and Bryan Catanzaro. Vani: very-lightweight accent-controllable TTS for native and non-native speakers with identity preservation. In 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–2. IEEE, 2023. [ bib ]
  • Rohan Badlani, Rafael Valle, Kevin J Shih, João Felipe Santos, Siddharth Gururani, and Bryan Catanzaro. Multilingual multiaccented multispeaker TTS with RADTTS. arXiv preprint arXiv:2301.10335, 2023. [ bib ]
  • Rafael Valle, João Felipe Santos, Kevin J Shih, Rohan Badlani, and Bryan Catanzaro. High-acoustic fidelity text to speech synthesis with fine-grained control of speech attributes. In 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. IEEE, 2023. [ bib ]

2022

  • Kevin J Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, and Bryan Catanzaro. Generative modeling for low dimensional speech attributes with neural spline flows. arXiv preprint arXiv:2203.01786, 2022. [ bib ]

2019

  • Benjamin Cauchi, Kai Siedenburg, João F. Santos, Tiago H. Falk, Simon Doclo, and Stefan Goetze. Non-intrusive speech quality prediction using modulation energies and LSTM-network. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(7):1151–1163, 2019. [ bib ]
  • Kyle Kastner, João Felipe Santos, Yoshua Bengio, and Aaron Courville. Representation mixing for TTS synthesis. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5906–5910. IEEE, 2019. [ bib ]
  • João Felipe Santos and Tiago H. Falk. Towards the development of a non-intrusive objective quality measure for DNN-enhanced speech. In 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), 1–6. IEEE, 2019. [ bib ]

2018

  • Anderson Avila, Zahid Akhtar Momin, João F. Santos, Douglas O'Shaughnessy, and Tiago H. Falk. Feature pooling of modulation spectrum features for improved speech emotion recognition in the wild. IEEE Transactions on Affective Computing, July 2018. doi:10.1109/TAFFC.2018.2858255. [ bib ]
  • Sebastian Braun, João Felipe Santos, Emanuel Habets, and Tiago H. Falk. Dual-channel modulation energy metric for direct-to-reverberation ratio estimation. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). April 2018. [ bib ]
  • Stylianos Ioannis Mimilakis, Konstantinos Drossos, João Felipe Santos, Gerald Schuller, Tuomas Virtanen, and Yoshua Bengio. Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2018. URL: http://arxiv.org/abs/1711.01437, arXiv:1711.01437. [ bib ]
  • João F. Santos and Tiago H. Falk. Speech dereverberation with context-aware recurrent neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2018. doi:10.1109/TASLP.2018.2821899. [ bib ]
  • Chiheb Trabelsi, Olexa Bilaniuk, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher J. Pal. Deep complex networks. In International Conference on Learning Representations (ICLR). April 2018. URL: http://arxiv.org/abs/1705.09792, arXiv:1705.09792. [ bib ]

2017

  • Mohammed Senoussaoui, João F. Santos, and Tiago H. Falk. Speech temporal dynamics fusion approaches for noise-robust reverberation time estimation. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). March 2017. [ bib ]
  • Jose Sotelo, Soroush Mehri, Kundan Kumar, João F. Santos, Kyle Kastner, Aaron Courville, and Yoshua Bengio. Char2Wav: End-to-end speech synthesis. In International Conference on Learning Representations (Workshop Track). April 2017. [ bib ]

2016

  • Benjamin Cauchi, João F. Santos, Kai Siedenburg, Tiago H. Falk, Patrick Naylor, Simon Doclo, and Stefan Goetze. Predicting the quality of processed speech by combining modulation based features and model trees. In ITG Conference on Speech Communication. 2016. (to appear). [ bib ]
  • João F. Santos, Rachel Bouserhal, Jérémie Voix, and Tiago H. Falk. Objective quality estimation of in-ear microphone speech. In 5th ISCA/DEGA Workshop on Perceptual Quality of Systems. September 2016. [ bib ]
  • João F. Santos and Tiago H. Falk. Blind room acoustics characterization using recurrent neural networks and modulation spectrum dynamics. In AES 60th International Conference. February 2016. [ bib ]
  • Bob L. Sturm, João F. Santos, Oded Ben-Tal, and Iryna Korshunova. Music transcription modelling and composition using deep learning. In 1st Conference on Computer Simulation of Musical Creativity. June 2016. [ bib ]

2015

  • Tiago H. Falk, Vijay Parsa, João F. Santos, Kathryn Arehart, Oldooz Hazrati, Rainer Huber, James Kates, and Susan Scollie. Objective quality and intelligibility prediction for users of assistive listening devices. IEEE Signal Processing Magazine, March 2015. doi:10.1109/MSP.2014.2358871. [ bib ]
  • João F. Santos. Using convolutional and recurrent layers in deep neural networks for spectral speech enhancement. In Deep Learning Summer School. August 2015. (abstract). [ bib ]
  • João F. Santos, Anderson Avila, Rachel Bouserhal, and Tiago H. Falk. Improving blind reverberation time estimation on a two-microphone portable device by using speech source distance information. In Speech in Noise Workshop. January 2015. (abstract). [ bib ]
  • Mohammed Senoussaoui, João F. Santos, and Tiago H. Falk. SRMR variants for improved blind room acoustics characterization. In ACE Challenge Workshop. October 2015. [ bib ]
  • Bob L. Sturm, João F. Santos, and Iryna Korshunova. Folk music style modelling by recurrent neural networks with long short term memory units. In International Society for Music Information Retrieval (ISMIR) conference. October 2015. (late-breaking demo). [ bib ]

2014

  • João F. Santos and Tiago H. Falk. Updating the SRMR metric for improved intelligibility prediction for cochlear implant users. IEEE/ACM Transactions on Audio, Speech, and Language Processing, December 2014. doi:10.1109/TASLP.2014.2363788. [ bib ]
  • João F. Santos, Vijay Parsa, Susan Scollie, and Tiago H. Falk. Evaluation of the ITU-T P.563 standard as an objective enhanced speech quality metric for hearing aid users. In International Hearing Aid Research Conference (IHCON). August 2014. (abstract). [ bib ]
  • João F. Santos, Mohammed Senoussaoui, and Tiago H. Falk. An updated objective intelligibility estimation metric for normal hearing listeners under noise and reverberation. In International Workshop on Acoustic Signal Enhancement (IWAENC), 55–59. September 2014. doi:10.1109/IWAENC.2014.6953337. [ bib ]
  • Mohammed Senoussaoui, Milton Sarria-Paja, João F. Santos, and Tiago H. Falk. Model fusion for multimodal depression classification and level detection. In 4th Intl. Audio/Visual Emotion Challenge and Workshop. November 2014. [ bib ]

2013

  • Nirit Brosh, João F. Santos, Tiago H. Falk, Lonnie Zwaigenbaum, Susan E. Bryson, Wendy Roberts, Isabel M. Smith, Peter Szatmari, and Jessica A. Brian. Acoustic measurement of prosodic information in toddlers with autism spectrum disorders. In IMFAR. 2013. (abstract). [ bib ]
  • Tiago H. Falk, Stefano Cosentino, João F. Santos, David Suelzle, and Vijay Parsa. Non-intrusive objective speech quality and intelligibility prediction for hearing instruments in complex listening environments. In ICASSP. 2013. [ bib ]
  • João F. Santos, Nirit Brosh, Tiago H. Falk, Lonnie Zwaigenbaum, Susan E. Bryson, Wendy Roberts, Isabel M. Smith, Peter Szatmari, and Jessica A. Brian. Very early detection of autism spectrum disorders based on acoustic analysis of pre-verbal vocalizations of 18-month old toddlers. In ICASSP. 2013. [ bib ]
  • João F. Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, and Tiago H. Falk. Objective speech intelligibility measurement for cochlear implant users in complex listening environments. Speech Communication, 55(7-8):815–824, September 2013. URL: http://www.sciencedirect.com/science/article/pii/S0167639313000435, doi:10.1016/j.specom.2013.04.001. [ bib ]
  • João F. Santos, Nils Peters, and Tiago H. Falk. Towards blind reverberation time estimation for non-speech signals. In 21st International Congress on Acoustics. Montréal, QC, Canada, 2013. [ bib ]

2012

  • João F. Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, and Tiago H. Falk. Performance comparison of intrusive objective speech intelligibility and quality metrics for cochlear implant users. In InterSpeech. 2012. [ bib ]

2011

2007