📚 New Blog: "Intro to NLP with TF-IDF Vectorization"

In this post, I take a deep dive into one of the OG natural language processing (NLP) techniques of the last 50 years: TF-IDF vectorization! If you're at all curious about the origins of search indexing, this is a great method to learn about. As of 2015, roughly 83% of text-based recommender systems used in digital libraries still relied on TF-IDF as their tool of choice (according to Wikipedia)!

🔍 Topics covered in this post:
- Calculating TF-IDF from Scratch
- One-Hot Encoding
- Cosine Similarity
- Search Term Relevance Ranking with TF-IDF

This was an extremely fun post to write, as it gave me a much stronger understanding of the fundamentals of search indexing and cosine similarity in vector space, both of which are fundamental to RAG applications. Check it out below, and let me know your thoughts and feedback in the comments!

https://lnkd.in/gj4ctJ9w

#ArtificialIntelligence #MachineLearning
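For a taste of what the post covers, here is a minimal sketch of TF-IDF and cosine similarity computed from scratch in pure Python. The toy corpus and the weighting choices (raw term frequency, unsmoothed idf) are my own illustrative assumptions, not taken from the blog post itself:

```python
import math
from collections import Counter

# Toy corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document.

    tf  = count of term in doc / total terms in doc
    idf = log(N / number of docs containing the term)
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many docs contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tf_idf(docs)
# Docs 0 and 1 share "the" and "cat", so their similarity is positive;
# docs 0 and 2 share no terms, so their similarity is zero.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Ranking search results then amounts to vectorizing the query the same way and sorting documents by cosine similarity to it.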
Zac Messinger’s Post
-
Advanced NLP with spaCy

spaCy is an open-source natural language processing (NLP) library designed for efficient and scalable processing of text. It is developed by Explosion AI and is written in Python. spaCy is built to be fast and production-ready, making it a popular choice for various NLP tasks, including tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. You can find all my advanced NLP spaCy notes in notebook format.

CHAPTERS

Chapter 1: Finding words, phrases, names and concepts
This chapter will introduce you to the basics of text processing with spaCy. You'll learn about the data structures, how to work with trained pipelines, and how to use them to predict linguistic features in your text.

Chapter 2: Large-scale data analysis with spaCy
In this chapter, you'll use your new skills to extract specific information from large volumes of text. You'll learn how to make the most of spaCy's data structures, and how to effectively combine statistical and rule-based approaches for text analysis.

Chapter 3: Processing Pipelines
This chapter will show you everything you need to know about spaCy's processing pipeline. You'll learn what goes on under the hood when you process a text, how to write your own components and add them to the pipeline, and how to use custom attributes to add your own metadata to the documents, spans and tokens.

Chapter 4: Training a neural network model
In this chapter, you'll learn how to update spaCy's statistical models to customize them for your use case, for example, to predict a new entity type in online comments. You'll train your own model from scratch, and understand the basics of how training works, along with tips and tricks that can make your custom NLP projects more successful.

Link: https://lnkd.in/dBfwefkn

#spaCy #NLP #naturallanguageprocessing
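The Chapter 1 basics can be sketched in a few lines. This is my own toy example (the sentence and variable names are illustrative, not from the notes), using a blank English pipeline so no trained model download is needed; a trained pipeline like en_core_web_sm adds the POS tags, entities, and parses the later chapters build on:

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
doc = nlp("spaCy is an open-source NLP library written in Python.")

# Tokenization: a Doc behaves like a sequence of Token objects.
print([token.text for token in doc])

# A Span is a slice of the Doc.
span = doc[0:2]
print(span.text)

# With a trained pipeline you also get statistical predictions, e.g.:
# nlp = spacy.load("en_core_web_sm")
# doc = nlp("Apple was founded in California.")
# print([(ent.text, ent.label_) for ent in doc.ents])
```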
-
Based on the Always Learning philosophy, I'm thrilled to share that I completed the course "Natural Language Processing (NLP)" from 🤗 Hugging Face: https://lnkd.in/dH25HHbW

This course is excellent and very focused, mainly covering:
💡 A deep dive into the Transformers library to manage models, tokenizers, and more.
💡 Encoder-only, decoder-only, and sequence-to-sequence models and their usage scenarios.
💡 Fine-tuning a pre-trained model (using the Trainer API) and speeding it up using the Accelerate library.
💡 Awesome examples of preparing a dataset and the training process for a wide range of common NLP tasks, such as token classification (NER and POS), summarization, translation, and question answering.

#datascience #LLMs #continuouslearning #huggingface
Introduction - Hugging Face NLP Course
huggingface.co
-
Harness the true power of Natural Language Processing (NLP) with our new starter kit! Experience the unmatched power and resilience of LlamaIndex and #Neo4j in action. Get ready to build high-performance NLP applications that can swiftly store and retrieve information from documents. Explore its key features using the link below and start your adventure today! #NLP https://bit.ly/4cii80E Unveil the power of NLP with LlamaIndex and Neo4j, creating reliable solutions that understand and process natural language efficiently.
Unleashing the Power of NLP with LlamaIndex and Neo4j
neo4j.com
-
🔓 Unlock the potential of natural language processing (NLP) with our new starter kit that seamlessly combines the power of the LlamaIndex library with the robustness of the Neo4j graph database. Build NLP applications that store and retrieve information from documents with lightning-fast performance! Take a look at the key features and get started today: https://bit.ly/4cii80E #NLP #GenAI #Graphdatabase #Neo4j
Unleashing the Power of NLP with LlamaIndex and Neo4j
neo4j.com
-
Transfer learning with pre-trained language models has become essential for NLP. In our latest article, we provide a comprehensive guide to ULMFiT, the pioneering technique that sparked this revolution. We explain how ULMFiT leverages pre-trained LSTMs and fine-tuning innovations to achieve state-of-the-art results on text classification, sentiment analysis, and more.

Key highlights:
- The advantages of pre-training large language models on unlabeled text
- A step-by-step breakdown of the ULMFiT transfer learning process
- Code examples for implementing ULMFiT from scratch
- ULMFiT model architecture and training techniques
- Applications and impact of ULMFiT on NLP

Whether you want to master transfer learning for NLP or just grasp the fundamentals, this article will give you a solid understanding of ULMFiT and how to apply it. Check this article out! https://lnkd.in/dEvMnxUi

#machinelearning #nlp #computervision #ai
Mastering ULMFiT: Harnessing Transfer Learning for Cutting-Edge NLP
medium.com
-
Deep Learning Intern @Phosphene.ai | MLOPS Enthusiast | System Design | AI enthusiast | GDSC Cloud Lead | Data Structures | GDSC AI/ML Lead | Japanese Language Instructor | Medium Blogger | Multilingual
Finally! 📯

Word2vec is not a singular algorithm; rather, it is a family of model architectures and optimizations that can be used to learn word embeddings from large datasets. Embeddings learned through word2vec have proven successful on a variety of downstream natural language processing tasks. 🎊

The third part of my NLP series is out on Medium ✨. I have given a deep and comprehensive explanation, along with an implementation of Word2Vec using the skip-gram algorithm from scratch 👨🎓.

For more information you can read this 👇
https://lnkd.in/gHGe_q5P

#ai #nlp #textpreprocessing #deeplearning #word2vec #blog #mediumarticle #medium
Text Preprocessing for NLP Part — 3
medium.com
-
Machine Learning Engineer @ Botit. Ex QA Intern @ Procter & Gamble | Data Scientist | I help companies use Big data, and machine learning to reach operational excellence
After 4 months of hard work and dedication, I am proud to announce that I have completed the NLP Specialization by DeepLearning.AI on Coursera. Under the guidance of talented instructors Younes Bensouda Mourri and Lukasz Kaiser, I delved deep into the world of Natural Language Processing (NLP), gaining both deep theoretical insights and broad practical experience with many NLP algorithms, models, and applications.

📚 Highlights of My Learning Journey:
1. Sentiment Analysis: Explored traditional techniques like logistic regression and Naive Bayes, and constructed deep neural networks from scratch for sentiment analysis.
2. Word Representations: Gained insights into vector space representations of words.
3. Machine Translation: Approached the challenge from different perspectives, from neural nets to attention-based transformer models.
4. Part-of-Speech Tagging: Implemented using Markov chains and the Viterbi algorithm.
5. N-gram Models: Developed probabilistic N-gram models and deep N-gram models using GRUs for auto-complete.
6. Auto-Correction: Implemented auto-correct using dynamic programming for minimum edit distance calculation.
7. Word Embeddings: Explored the core of semantic representation and trained a CBOW model from scratch.
8. Named Entity Recognition (NER) & Siamese Networks: Leveraged LSTMs for NER and created Siamese networks for detecting question duplicates.
9. Attention Models: Took a deep dive into multi-headed attention and text summarization using Trax.
10. BERT & T5 Architecture: Studied and fine-tuned BERT on the Hugging Face platform.
11. Reformer Model: Explored the Reformer model and locality-sensitive hashing for faster, more efficient attention calculation, enabling chatbot training on large datasets without the quadratic memory growth that plagues traditional transformer attention.

✨ The Ultimate Takeaway: Every piece of knowledge gained is invaluable, but the greatest achievement is proving to myself my commitment to continuous improvement. I proved that I can hold myself accountable to get things done without the external enforcement of deadlines and exams. I have gathered a little more proof that I am the person I aspire to be; I can wake up every day now, look myself in the mirror, and say I am just a bit closer to the person my younger self wanted to be.
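As an illustration of the auto-correct item above, here is a minimal sketch of minimum edit distance computed with dynamic programming. The cost convention (insert 1, delete 1, substitute 2) and the example words are my own illustrative assumptions, following a common textbook setup:

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum cost of editing `source` into `target`.

    D[i][j] holds the cost of transforming source[:i] into target[:j];
    each cell is the cheapest of a delete, an insert, or a substitution
    (free when the characters already match).
    """
    m, n = len(source), len(target)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost      # delete everything
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost      # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = 0 if source[i - 1] == target[j - 1] else sub_cost
            D[i][j] = min(
                D[i - 1][j] + del_cost,       # delete from source
                D[i][j - 1] + ins_cost,       # insert into source
                D[i - 1][j - 1] + replace,    # substitute (or match)
            )
    return D[m][n]

# Two substitutions at cost 2 each: p→s and l→t.
print(min_edit_distance("play", "stay"))  # → 4
```

An auto-correct system then suggests the dictionary words with the smallest edit distance to the misspelled input.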
-
📚 New Blog: "Intro to Word Embeddings with Word2Vec"

Over the past year and a half, as AI has spread like wildfire throughout our professional and personal lives, chances are you've heard the term "word embedding" before. But do you know what word embeddings are and why they are useful? Even more importantly, do you know how they work? If you answered "no" to any of those questions, then hopefully this post gets you up to speed.

In this post, I go under the hood to demystify one of the core building blocks of Natural Language Processing (NLP) by deconstructing Word2Vec, one of the original word embedding deep learning models, published 10 years ago!

🔍 Topics covered in this post:
- What is a Word Embedding?
- Understanding & Implementing Word2Vec
- Defining the Word2Vec Architecture
- Setting up a Training Function
- Training the Word2Vec Model & Validating Results

This was a satisfying Memorial Day post to write. I was surprised by how simple, yet elegant, the implementation of such a powerful NLP tool could be! Check it out below, and let me know your thoughts and feedback in the comments!

#ArtificialIntelligence #MachineLearning
https://lnkd.in/gBRNKd3a
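To give a flavor of the skip-gram formulation behind Word2Vec, here is a minimal sketch of how (center, context) training pairs are generated from a corpus. The toy corpus and window size are my own illustrative choices; a full implementation (like the one walked through in the post) then trains embeddings to predict each context word from its center word:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs for skip-gram training.

    Each word is paired with every neighbor within `window` positions;
    these pairs are the training examples a skip-gram model learns from.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

corpus = "the quick brown fox jumps".split()
print(skipgram_pairs(corpus, window=1))
```

Widening the window trades syntactic specificity for broader topical similarity, which is one of the knobs the original Word2Vec papers explore.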
Intro to Word Embeddings with Word2Vec
zacmessinger.com
-
Aspiring Data Scientist | Transforming Raw Data into Actionable Insights | Python | Machine Learning | Data Visualization
Just wrapped up an insightful journey through the evolution of Natural Language Processing (NLP) from the 60s to the present in my latest blog on Medium. 📖 From rule-based systems to the cutting-edge models we have today, the progress is truly remarkable! I'd like to express my gratitude to CampusX for the incredible inspiration and knowledge shared in their enlightening YouTube video on NLP that sparked this exploration. 🙏 Their dedication to making complex topics accessible is truly commendable. Check out my blog here: https://lnkd.in/dgRGWkcc #NLP #LanguageProcessing #TechnologyEvolution #Gratitude #DataScience #MachineLearning
The Journey of NLP from 60s to present
medium.com
-
Passionate about fulfilling the promise of Continuous Application Reliability. Placing human empathy at the center. Key contributor to three successful SaaS exits
3mo
Zac Messinger This is phenomenal 🙌🏽! Thanks for sharing.