Check out our efficient decoding with Grouped-Query Attention (GQA) and a low-precision KV cache for LLM inference! Read more on the PyTorch blog: https://hubs.la/Q02zSbSc0
PyTorch’s Post
A KV cache that grows too large and demands too much memory bandwidth is a major obstacle to increasing context lengths in LLM inference. 4-bit quantization can help, and this blog presents several optimizations to the attention kernel for using a 4-bit KV cache efficiently.
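The blog's kernels are not reproduced here, but the core idea of a 4-bit KV cache can be sketched as a toy in NumPy: quantize each group of KV values to 4-bit codes with a per-group scale and zero point, pack two codes per byte, and dequantize on the fly. All function names below are illustrative, not PyTorch's implementation.

```python
import numpy as np

def quantize_int4(kv, group_size=32):
    """Quantize a float KV tensor to 4-bit codes with per-group scale/zero-point."""
    flat = kv.reshape(-1, group_size)
    lo, hi = flat.min(axis=1, keepdims=True), flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                       # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)       # avoid divide-by-zero
    codes = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    packed = (codes[:, ::2] << 4) | codes[:, 1::2]  # two 4-bit codes per byte
    return packed, scale, lo

def dequantize_int4(packed, scale, zero, shape, group_size=32):
    """Unpack the nibbles and map codes back to floats."""
    codes = np.empty((packed.shape[0], group_size), dtype=np.uint8)
    codes[:, ::2] = packed >> 4
    codes[:, 1::2] = packed & 0x0F
    return (codes * scale + zero).reshape(shape)

kv = np.random.randn(4, 64).astype(np.float32)    # e.g. 4 tokens x 64-dim head
packed, scale, zero = quantize_int4(kv)
recovered = dequantize_int4(packed, scale, zero, kv.shape)
```

The packed cache uses one byte per two values (a 4x reduction versus fp16 before counting the small scale/zero-point overhead), and the rounding error per value is bounded by half a quantization step.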
Andrej Karpathy is at it again! A new two-hour tutorial just dropped on how to build the GPT Tokenizer. Tokenizers are a completely separate stage of the LLM pipeline: they have their own training sets and training algorithms (Byte Pair Encoding), and after training they implement two fundamental functions: encode(), from strings to tokens, and decode(), back from tokens to strings. Watch now: https://lnkd.in/dfMccgTJ
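The train/encode/decode split described above can be sketched in a few lines. This is a minimal byte-level BPE toy in the spirit of the tutorial, not Karpathy's code: training repeatedly merges the most frequent adjacent pair of token ids into a new id, and decode replays the merges to rebuild bytes.

```python
from collections import Counter

def merge(tokens, pair, new_id):
    """Replace every occurrence of `pair` in the token list with `new_id`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id); i += 2
        else:
            out.append(tokens[i]); i += 1
    return out

def train_bpe(text, num_merges):
    """Learn merge rules over byte-level tokens (the Byte Pair Encoding idea)."""
    tokens = list(text.encode("utf-8"))
    merges = {}                        # (left, right) -> new token id
    for i in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        merges[pair] = 256 + i         # ids 0..255 are raw bytes
        tokens = merge(tokens, pair, 256 + i)
    return merges

def encode(text, merges):
    tokens = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # apply merges in learned order
        tokens = merge(tokens, pair, new_id)
    return tokens

def decode(tokens, merges):
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[t] for t in tokens).decode("utf-8")

merges = train_bpe("aaabdaaabac", 3)
ids = encode("aaabdaaabac", merges)
```

Round-tripping `decode(encode(s))` recovers the original string, while the token sequence is shorter than the raw byte sequence — the whole point of the tokenizer stage.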
Cyber Threat Intelligence Researcher | Threat Research | Threat Hunting | Risk Analysis | Malware Analysis | Playbook Automation
🧠🧠 Updates on the Mitre T Code Procedure GPT. 🧠🧠 After some tweaking of the prompts, I've managed to get it to extract technical details. The end goal is to feed it multiple links and have it analyze each link separately and produce the TTPs at a procedural level (detailing how the T code was used). #cyberthreatintelligence #cti #mitreattack #threatintelligence
Exciting News! 🚀 Just had my pull request #679 merged into MLJARofficial's mljar-supervised library! Resolved Issue #669: added a _classes attribute for the k-Nearest Neighbors (KNN) algorithm.
🛠️ Changes Made:
- Modified the KNNFit class to include a _classes attribute, providing the unique classes based on the fitted model.
- Updated the KNeighborsAlgorithm and KNeighborsRegressorAlgorithm classes to inherit from KNNFit and ClassifierMixin and RegressorMixin, respectively.
- Adjusted the relevant unit tests to ensure proper functioning of the _classes attribute.
🧪 Testing: unit tests (test_knn.py) added/modified to check the functionality of the _classes attribute. All tests pass successfully. https://lnkd.in/dbarSsct
Grateful for the collaborative effort in the open-source community! 🙌 Let's keep pushing the boundaries of AI and machine learning together! 💡 #OpenSource #MachineLearning #GitHubContributions #MLJARofficial #DataScience #AICommunity 🚀
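To illustrate what such an attribute provides (this is a toy stand-in, not the mljar-supervised code): after fitting, the wrapper exposes the sorted unique labels the estimator saw, mirroring scikit-learn's `classes_` convention.

```python
import numpy as np

class KNNFit:
    """Toy stand-in for the wrapper: exposes the unique classes after fit."""

    def fit(self, X, y):
        # record the sorted unique labels seen during fitting
        self._fitted_classes = np.unique(y)
        return self

    @property
    def _classes(self):
        return getattr(self, "_fitted_classes", None)

model = KNNFit().fit(np.zeros((4, 2)), np.array(["cat", "dog", "cat", "bird"]))
print(model._classes)  # ['bird' 'cat' 'dog']
```

Downstream tooling (e.g. probability-column ordering) can then rely on the attribute without re-deriving the label set from the training data.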
Andrej Karpathy has recently shared a comprehensive two-hour tutorial covering the creation of the GPT Tokenizer. Tokenizers stand as an independent phase within the LLM pipeline, possessing distinct training sets and employing the Byte Pair Encoding training algorithm. Post-training, they execute two essential functions: encode(), facilitating the transformation from strings to tokens, and decode(), enabling the reversal from tokens to strings. Link in comments
Come see Thierry Jean and me talk about 1) why modular machine learning pipelines are better and 2) guardrails for LLM systems. On guardrails for LLM systems: ever had an automated chat response that made you raise an eyebrow? You are not alone in questioning the reliability of stochastic language models. What if there were guardian agents ensuring only compliant responses reach your users? Enter LLM guardrails. We'll dive into the critical role of quality-control agents within LLM systems. Discover how these guardrails evaluate LLM outputs against stringent criteria before exposing them to the user. We'll explore the possibilities of corrective action and escalation when outputs fail to meet standards, and how this not only enhances service quality but also mitigates reputational risk for the provider. Please refer to the MLOps community link for more details.
MLOps community #7 (in-person | en personne), Thu, May 2, 2024, 5:30 PM | Meetup
meetup.com
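The check-then-escalate pattern from the talk description above can be sketched in a few lines of plain Python (a minimal illustration; the check names and fallback text are made up for the example):

```python
import re

def guardrail(llm_output, checks, fallback="Escalating to a human agent."):
    """Run an LLM response through quality-control checks before it reaches
    the user; return a corrective fallback if any check fails."""
    failures = [name for name, check in checks.items() if not check(llm_output)]
    if failures:
        return fallback, failures   # corrective action / escalation path
    return llm_output, []

checks = {
    "no_pii": lambda s: not re.search(r"\b\d{3}-\d{2}-\d{4}\b", s),  # SSN-like
    "on_topic": lambda s: "refund" in s.lower(),
    "length_ok": lambda s: len(s) < 500,
}

ok, fails = guardrail("Your refund is on its way.", checks)
bad, fails2 = guardrail("SSN 123-45-6789 on file.", checks)
```

In a real system the checks themselves may be LLM calls ("judge" models), but the control flow — evaluate, then either pass through, correct, or escalate — stays the same.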
Your LLM application does not always need GPT-4o. It's better to use cost-effective, faster models (e.g. Mixtral 8x7B) for some queries. RouteLLM proposes efficient router models that dynamically select between a stronger and a weaker LLM during inference to balance cost and response quality. The paper proposes 4 routing techniques:
1. Similarity-weighted (SW) ranking - performs a "weighted Elo calculation" based on similarity
2. Matrix factorization - learns a scoring function for how well a model can answer a prompt
3. BERT classifier - a classifier that predicts which model can provide a better response
4. Causal LLM classifier - also a classifier
Here's a sample code using Matrix Factorization with 50% strong-model calls (GPT-4o)
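The referenced sample isn't included in this post, but the routing idea behind technique 1 (similarity-weighted ranking) can be shown with a self-contained toy — not RouteLLM's code, and with made-up reference data standing in for the preference data the real routers are trained on:

```python
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Reference prompts labeled "needs the strong model" (1) or not (0).
reference = [
    ("write a haiku about cats", 0),
    ("summarize this paragraph", 0),
    ("prove the halting problem is undecidable", 1),
    ("derive the gradient of softmax cross-entropy", 1),
]

def route(prompt, threshold=0.5):
    """Similarity-weighted routing: the score is a similarity-weighted
    average of the reference labels; above the threshold, pay for strong."""
    sims = [(cosine(bow(prompt), bow(ref)), label) for ref, label in reference]
    total = sum(s for s, _ in sims)
    score = sum(s * label for s, label in sims) / total if total else 0.0
    return "strong" if score >= threshold else "weak"

print(route("prove that sqrt(2) is irrational"))  # "strong"
print(route("write a short poem about dogs"))     # "weak"
```

The real routers replace bag-of-words cosine with learned embeddings, Elo-style weighting, or a trained classifier, and the threshold is tuned to hit a target fraction of strong-model calls (e.g. the 50% mentioned above).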
Final Project (MyCostLiving)... Edit User endpoint. Had many issues understanding the right way this code should work. Many lessons learned: why to use the User from authentication.getPrincipal() rather than reading it directly from the DB (security implications), why the User instance obtained from authentication.getPrincipal() is not in sync with the state of the User in the DB and how to address that, and many others... A lot of concepts behind just a few lines of code. Thanks to ChatGPT 😂 🤣. An amazing tool if you know what to ask for, even though sometimes the answers can be nonsense...
Embedded Systems × Machine Learning 🦾 || RAIN-INN LASU Lead 🤖 || Software Developer || Python || Electronics & Computer Engineer(in view)
Day 64/100: Still on the SMS project. Turns out I'm not done with preprocessing. Came back to the project and realized I still have to vectorize my inputs for tensorflow. Lots of errors🥲 #100DaysOfCode #100DaysOfML
Building bootstrapped businesses with AI & no-code. Co-founder at Smasher | Action plans for your ideas & prompts for your tools
For anyone struggling to get the GPT-3.5 Turbo API to consistently output valid JSON despite a ton of instructions: use JSON mode by adding "response_format": { "type": "json_object" } right after the model parameter in your request. Problem solved for good.
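For concreteness, here is what the request payload looks like with that field added (shown as a plain dict rather than a live API call; note the docs also require the word "JSON" to appear somewhere in your messages, or the request is rejected):

```python
import json

# Chat Completions request payload with JSON mode enabled.
payload = {
    "model": "gpt-3.5-turbo",
    "response_format": {"type": "json_object"},   # <-- JSON mode
    "messages": [
        {"role": "system",
         "content": "Reply in JSON with keys 'name' and 'born'."},
        {"role": "user",
         "content": "Extract: Ada Lovelace, born 1815."},
    ],
}

# With JSON mode on, the assistant message content parses cleanly:
reply = '{"name": "Ada Lovelace", "born": 1815}'  # illustrative response content
data = json.loads(reply)
```

The `reply` string above is a made-up example of what the model returns; the guarantee JSON mode gives you is that the content is syntactically valid JSON, not that it matches any particular schema.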