Publications

This page lists scientific publications related to Voiceful technologies, written by the Voctro Labs cofounders over more than a decade of academic research at the MTG, Universitat Pompeu Fabra. It includes the most relevant publications for each core technology available in Voiceful.

Voice Generation

Expressive voice generation based on several techniques, including deep learning and sample-based synthesis. This technology can generate a highly realistic artificial singing voice: it learns a model from existing recordings and uses it to generate new speech or singing content.


Blaauw, M., Bonada, J. (2017). A Neural Parametric Singing Synthesizer. In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association (Stockholm, Sweden, August 20-24, 2017).


Bonada, J., Serra, X., Amatriain, X., Loscos, A. (2011). Spectral Processing. In DAFX: Digital Audio Effects, Chapter 10, Udo Zölzer (Editor), John Wiley & Sons Publishers, pp. 393-446, ISBN: 978-0-470-66599-2, published in March 2011.


Bonada, J. (2008). Voice Processing and Synthesis by Performance Sampling and Spectral Models. PhD thesis, Universitat Pompeu Fabra, Department of Information and Communication Technologies, 2008.


Bonada, J., Serra, X. (2007). Synthesis of the Singing Voice by Performance Sampling and Spectral Models. IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 69-79, March 2007, ISSN 1053-5888. Impact factor (JCR): 4.914, Q1 T1, 1/246 in Engineering, Electrical & Electronic.


Janer, J., Bonada, J., & Blaauw, M. (2006). Performance-driven control for sample-based singing voice synthesis. Proceedings of the Digital Audio Effects Conference (DAFx), Montreal, Vol. 6, 2006.



Voice Transformation

Voice manipulation technology that changes the characteristics of speech or singing by altering timbre, intonation and other expressive features, in order to transform the perceived gender, age or personality of the speaker or singer.


Bonada, J. (2008). Wide band harmonic sinusoidal modeling. 11th International Conference on Digital Audio Effects DAFx-08, Espoo, Finland, 2008.

Mayor, O., Bonada, J., & Janer, J. (2011). Audio Transformation Technologies Applied to Video Games. 41st AES Conference: Audio for Games, 2011.


Villavicencio, F., & Bonada, J. (2010). Applying Voice Conversion To Concatenative Singing-Voice Synthesis. Interspeech. 2162-2165.


Villavicencio, F., Yamagishi, J., Bonada, J., & Espic, F. (2016). Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016. Interspeech. 1657-1661.


Mayor, O., Bonada, J., & Janer, J. (2009). KaleiVoiceCope: Voice Transformation from Interactive Installations to Video-Games. AES 35th International Conference: Audio for Games.


Mayor, O., Bonada, J., & Janer, J. (2010). KaleiVoiceKids: Interactive Real-Time Voice Transformation for Children. The 9th International Conference on Interaction Design and Children.


Monzo, C., Formiga, L., Adell, J., Mayor, O., Bonada, J., Janer, J., et al. (2009). Properly Using Speech Synthesis and Voice Transformation for Audiovisual Content Generation. (IBC, Ed.). International Broadcasting Conference (IBC2009).



Voice Description


Voice analysis technology that extracts acoustic and musical information from a voice recording to be later used for visualization, classification, monitoring or singing rating.


Janer, J. (2008). Singing-driven interfaces for Sound Synthesizers. PhD thesis, Universitat Pompeu Fabra, 2008.


Gómez, E., Bonada, J. (2013). Towards Computer-Assisted Flamenco Transcription: An Experimental Comparison of Automatic Transcription Algorithms As Applied to A Cappella Singing. Computer Music Journal. 37(2), 73-90, 2013.


Mayor, O., Bonada, J., & Loscos, A. (2009). Performance Analysis and Scoring of the Singing Voice. AES 35th International Conference: Audio for Games, 2009.


Mayor, O., Bonada, J., & Loscos, A. (2006). The Singing Tutor: Expression Categorization and Segmentation of the Singing Voice. AES 121st Convention, San Francisco, CA, USA, 2006 October 5–8.



Time-scaling and Pitch-shifting


High-quality time-scaling and pitch-shifting technology that goes beyond voice signals to process any audio content, such as music, field recordings or dialogue.


Bonada, J. (2000). Automatic Technique in Frequency Domain for Near-Lossless Time-Scale Modification of Audio. International Computer Music Conference. 396-399, 2000.


Janer, J., Bonada, J., & Jordà, S. (2006). Groovator - an implementation of real-time rhythm transformations. AES 121st Convention, San Francisco, CA, USA, 2006 October 5–8.