“I had the pleasure to work with Christopher at Defined.ai for nearly 3 years. Christopher is a dependable, solution-focused professional who works wonders and delivers high-quality results. We have worked together on several projects, and I saw Christopher's commitment and dedication to going above and beyond. As I needed to convert complex AI and ML technologies into concise and clear narratives for broader audiences, Christopher was always available and with a smile to walk me through the technical specificities in a clear and understandable way. On top of his world-class technical knowledge, Christopher has great communication skills and business vision, which makes him a high-value leader in any organization.”
About
Activity
-
After nearly six years with Volocopter, I have made the decision to say goodbye. This journey has been both exciting and educational, providing me…
Liked by Christopher Shulby
-
Closing our Texas retreat with some great BBQ and country music - after a week of discussing metrics, AI, past projects, future bets. #Fitbod
Liked by Christopher Shulby
-
One important release I seem to have missed: Voyage AI has announced their first state-of-the-art multilingual embedding model. It brings significant…
Liked by Christopher Shulby
Experience & Education
Licenses & Certifications
-
Portuguese - CELPE-BRAS - Oral and Writing Proficiency
Ministério da Educação
Issued · Credential ID 201401004596
-
5 Year Professional License - Multi Age (P-12) - German - Spanish
Ohio Board of Education
Volunteer Experience
-
President
Rotaract
- 2 years 1 month
Disaster and Humanitarian Relief
Founding member in 2005. Elected President for two terms starting in 2007. Grew Club membership from 6 to 79 active members. A decade later, it is one of the most active organizations at Ohio State.
Publications
-
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Proceedings of INTERSPEECH 2023
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
Other authors · See publication
-
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone
ICML 2022
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity with reasonable quality. This is important to allow synthesis for speakers with voice or recording characteristics very different from those seen during training.
Other authors · See publication
-
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Proceedings of INTERSPEECH 2021
In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model converges using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.
Other authors
-
The Pros and Cons of In-house Speech Recognition
DefinedCrowd
-
Theoretical Learning Guarantees Applied to Acoustic Modeling
Springer Journal of the Brazilian Computer Society
-
Acoustic Modeling Using a Shallow CNN-HTSVM Architecture
BRACIS 2017
A shallow CNN-HTSVM architecture built for training state-of-the-art ASR systems, inspired by deep-learning techniques yet powerful even for small datasets and low-resource environments.
-
Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts
STIL 2017
An evaluation of word embeddings that extends the analysis to better understand the state-of-the-art results presented in a previous paper at EACL.
-
Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks
STIL 2017
A nearly exhaustive evaluation of induction types and dimensions for Portuguese word embeddings on NLP tasks.
-
Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks
EACL 2017
A recurrent convolutional neural network with prosodic, part-of-speech, and word-embedding features is used to identify sentence boundaries in disfluent and impaired speech.
-
Automatic Rule-based Algorithms for Automatic Pronunciation of Portuguese Verbal Inflections
PROPOR 2014
A proof that regularity can be constructed from irregular patterns for Portuguese verbs given only their infinitive forms, serving as an enhancement for modern grapheme-to-phoneme converters.
-
A Method for the Extraction of Phonetically-Rich Triphone Sentences
ITS 2014
A method for building phonetically rich corpora representative of the target language from which they were drawn.
-
Automatic Disambiguation of Homographic Heterophone Pairs Containing Open and Closed Mid Vowels
STIL 2013
A method for correctly disambiguating the large majority of homographic-heterophone pairs in Portuguese, an issue plaguing current speech synthesis systems.
-
Prompts, Uptake, Modified Output, and Repair for L2 Learners with Foreign Language Classroom Anxiety
Horizons of Applied Linguistics
This paper investigates the effects which prompts and recasts have on learners with different levels of classroom anxiety.
-
The MerkMal Project: Automated Part of Speech Tagging System for Interactive Online Learning
DMSW
A presentation of the implementation of the MerkMal project.
Patents
-
A Method for Phoneme Recognition with Little Data
Filed: BR 10 2019 016386-0 A2
This patent of invention relates to the field of Artificial Intelligence (NLP). More specifically, it describes a way to do feature extraction with a deep-learning method, in this case a convolutional neural network, on small databases by creating architectures that maximize the cost/benefit trade-off between the largest filters and the smallest number of neurons, covering the largest representation possible with the smallest number of parameters and allowing the network to generalize better even with few examples. Together with a knowledge-driven classifier, it achieves nearly state-of-the-art phoneme recognition results with no pretraining or external weight initialization, and beats the best replication study of the state of the art with a 28% frame error rate.
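The trade-off the abstract describes, the widest filters paired with the fewest neurons so the parameter budget stays small, can be made concrete with a quick parameter count. This is a hypothetical illustration only: the channel and filter sizes below are invented for the sketch and are not taken from the patent.

```python
# Parameter count of a 1D convolutional layer:
# weights (in_channels * out_channels * kernel_size) plus one bias per output channel.
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    return in_channels * out_channels * kernel_size + out_channels

# Wide filters with few neurons (the direction the patent abstract favors)...
wide_few = conv1d_params(in_channels=40, out_channels=8, kernel_size=11)
# ...versus narrow filters with many neurons.
narrow_many = conv1d_params(in_channels=40, out_channels=64, kernel_size=3)

print(wide_few)     # 3528
print(narrow_many)  # 7744
```

Despite a much wider receptive field per layer (11 frames versus 3), the first configuration uses fewer than half the parameters, which is the kind of budget the abstract argues helps the network generalize from few examples.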
Courses
-
Statistical Learning Theory
-
Projects
Honors & Awards
-
Health Hackathon 2019
School of AI/Accenture
First place in Brazil out of 8 teams and third place globally across 23 countries.
-
Second Place Best Paper
STIL 2017
Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts
Languages
-
German
Native or bilingual proficiency
-
English
Native or bilingual proficiency
-
Portuguese
Full professional proficiency
-
Spanish
Full professional proficiency
-
Swedish
Limited working proficiency
-
Russian
Limited working proficiency
-
Gothic
Limited working proficiency
-
Latin
Limited working proficiency
-
Yiddish
Elementary proficiency
-
French
Elementary proficiency
Recommendations received
7 people have recommended Christopher
More activity by Christopher
-
You know you married the one when you get home and they've covered your kitchen in Post-its with positive feedback for UNPARSED!
Liked by Christopher Shulby