Christopher Shulby

Cuyahoga Falls, Ohio, United States

4K followers · 500+ connections

About

🚀 Passionate Machine Learning Leader | NLP Researcher | Multi-Cultural Team Builder…

Experience & Education

  • Kin AI

Licenses & Certifications

  • Intermediate Software Programmer in C/C++

    Samsung Electronics
  • Portuguese - CELPE-BRAS - Oral and Writing Proficiency

    Ministério da Educação

    Credential ID 201401004596
  • Spanish - ACTFL Oral Proficiency

    ACTFL
  • Spanish - ACTFL Writing Proficiency

    ACTFL
  • German - ACTFL Oral Proficiency

    ACTFL
  • German - ACTFL Writing Proficiency

    ACTFL
  • 5 Year Professional License - Multi Age (P-12) - German - Spanish

    Ohio Board of Education

Volunteer Experience

  • Rotary International

    Rotarian

    Rotary International

    Present · 15 years

    Disaster and Humanitarian Relief

  • Rotaract

    President

    Rotaract

    2 years 1 month

    Disaster and Humanitarian Relief

    Founding member in 2005. Elected President for two terms starting in 2007. Grew club membership from 6 to 79 active members. A decade later, it remains one of the most active organizations at Ohio State.

Publications

  • ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

    Proceedings of INTERSPEECH 2023

    We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.

  • YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

    ICML 2022

    YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.

  • SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

    Proceedings of INTERSPEECH 2021

    In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model converges using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.

  • The Pros and Cons of In-house Speech Recognition

    Definedcrowd

    Definedcrowd White Paper

  • Theoretical Learning Guarantees Applied to Acoustic Modeling

    Springer Journal of the Brazilian Computer Society

  • Acoustic Modeling Using a Shallow CNN-HTSVM Architecture

    BRACIS 2017

    A shallow CNN-HTSVM architecture for training state-of-the-art ASR systems, inspired by deep-learning techniques yet effective even for small datasets and low-resource environments.

  • Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

    STIL 2017

    An extended analysis of word embeddings to better understand the state-of-the-art results presented in a previous paper at EACL.

  • Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

    STIL 2017

    A nearly exhaustive evaluation of induction methods and dimensionalities for Portuguese word embeddings on NLP tasks.

  • Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks

    EACL 2017

    A recurrent convolutional neural network using prosodic features, part-of-speech tags, and word embeddings to identify sentence boundaries in disfluent and impaired speech.

  • Automatic Rule-based Algorithms for Automatic Pronunciation of Portuguese Verbal Inflections

    PROPOR 2014

    A proof that regularity can be constructed from irregular patterns for Portuguese verbs given only the infinitive form, serving as an enhancement for modern grapheme-to-phoneme converters.

  • A Method for the Extraction of Phonetically-Rich Triphone Sentences

    ITS 2014

    A method for building phonetically rich corpora representative of the target language from which they were drawn.

  • Automatic Disambiguation of Homographic Heterophone Pairs Containing Open and Closed Mid Vowels

    STIL 2013

    A method to correctly disambiguate the large majority of homographic-heterophone pairs in Portuguese, an issue plaguing current speech synthesis systems.

  • Prompts, Uptake, Modified Output, and Repair for L2 Learners with Foreign Language Classroom Anxiety

    Horizons of Applied Linguistics

    This paper investigates the effects which prompts and recasts have on learners with different levels of classroom anxiety.

  • The MerkMal Project: Automated Part of Speech Tagging System for Interactive Online Learning

    DMSW

    A presentation of the implementation of the MerkMal project.

Patents

  • A Method for Phoneme Recognition with Little Data

    Filed BR BR 10 2019 016386-0 A2

    This patent relates to the field of artificial intelligence (NLP). More specifically, it describes a feature-extraction approach using a deep-learning method, in this case a convolutional neural network, to extract features from small databases by designing architectures that maximize the cost/benefit trade-off between the largest filters and the smallest number of neurons, representing the largest feature space possible with the fewest parameters. This allows the network to generalize better even with few examples and, combined with a knowledge-driven classifier, achieves nearly state-of-the-art phoneme recognition results with no pretraining or external weight initialization. It also beats the best replication study of the state of the art, with a 28% frame error rate.
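
    The filter-size versus neuron-count trade-off described in the abstract can be illustrated with a back-of-the-envelope parameter count. The sketch below is illustrative only; the layer shapes are hypothetical and not taken from the patent:

    ```python
    def conv1d_params(n_filters: int, filter_width: int, in_channels: int) -> int:
        """Weights plus biases for a single 1-D convolutional layer."""
        return n_filters * filter_width * in_channels + n_filters

    # Hypothetical layers over 40-dimensional acoustic frames:
    # a few wide filters vs. many narrow filters.
    wide_few = conv1d_params(n_filters=16, filter_width=11, in_channels=40)
    many_narrow = conv1d_params(n_filters=128, filter_width=3, in_channels=40)
    print(wide_few, many_narrow)  # 7056 15488
    ```

    A few wide filters keep the total parameter count low (reducing overfitting on small datasets) while each filter still covers a broad temporal context of the input frames.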


Courses

  • Statistical Learning Theory

Honors & Awards

  • Health Hackathon 2019

    School of AI/Accenture

    First place in Brazil out of 8 teams and third place globally out of 23 countries.

  • Second Place Best Paper

    STIL 2017

    Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

Languages

  • German

    Native or bilingual proficiency

  • English

    Native or bilingual proficiency

  • Portuguese

    Full professional proficiency

  • Spanish

    Full professional proficiency

  • Swedish

    Limited working proficiency

  • Russian

    Limited working proficiency

  • Gothic

    Limited working proficiency

  • Latin

    Limited working proficiency

  • Yiddish

    Elementary proficiency

  • French

    Elementary proficiency

Recommendations received

7 people have recommended Christopher
