Sign in to view Craig’s full profile
Oakland, California, United States
Contact Info
1K followers
500+ connections
About
Computational Science, HPC, Life Sciences, Pharma, Software…
Activity
-
This sounds very promising, I've had family members with Lupus, it's a terrible disease: https://lnkd.in/gBcnFDPX
Shared by Craig Kapfer
-
🔬 In latest Nature BME paper, we introduce a new platform for clinician+AI collaboration: nuclei.io. Most medical #AI is one-size-fits-all. But…
Liked by Craig Kapfer
-
Really enjoyed this workshop with Dell and Tech Mahindra sharing experiences and knowledge so we can maximise impact from Dawn the UKs fastest AI…
Liked by Craig Kapfer
Experience & Education
-
Indiana University Bloomington
*.*.
-
******* ********** ***********
*.*.
Explore more posts
-
Titus Winters
Come see this talk, which I view as the culmination of several years of discussions and co-teaching with George Fairbanks. It's a bit of an overview, a bit of an explainer, a bit of a how-to, but mostly a sneaky way to make some important vocabulary from design become load-bearing in how you think about software testing.
86
1 Comment -
Charlie Lee, PhD
https://lnkd.in/gWUD_7dd Anthropic AI’s Claude family of models represents a great challenging feat for GPT models in AI technology. With the release of the Claude 3 series, Anthropic has expanded its models’ capabilities and performance, catering to various applications from text generation to advanced vision processing. This is an overview of these developments, highlighting the advancements and comparative features of the models within the Claude family.
1
-
Subbarao Kambhampati
𝕎𝕙𝕪 #𝔸𝕀 𝕗𝕠𝕝𝕜𝕤 𝕟𝕖𝕖𝕕 𝕒 𝕓𝕣𝕠𝕒𝕕 𝕓𝕒𝕤𝕖𝕕 𝕀𝕟𝕥𝕣𝕠 𝕥𝕠 #𝔸𝕀 👉 As I go around giving talks/tutorials on the planning and reasoning abilities of LLMs, I am constantly surprised at the rather narrow ML-centric background of grad students/young researchers have about #AI. This seems to be especially the case with those who think LLMs are already doing planning and reasoning etc. Most of them don't seem to know much about the many topics that are taught in a broad-based Intro to #AI course--such as combinatorial search, logic, CSP, difference between inductive vs. deductive reasoning (aka learning vs. inference), soundness vs. completeness of inference/reasoning etc. I can understand why a strong background in ML and DL is sine qua non these days in using/applying the current #AI technology. That doesn't however mean that other things, that are typically not covered in ML courses, but are covered in Intro #AI courses, are expendable. If you don't know those concepts, you are more likely than not to re-invent crooked wheels (see this for examples of how people get tripped up: https://lnkd.in/gUPPb7s4) All this is particularly relevant for those busy building empirical scaffolds over LLMs (the "LLMs are Zero-shot <XXX>" variety). Most often, these young researchers are coming from NLP. At one point, NLP used to be NLU and students had quite a firm grasp of logic (e..g Montague Semantics!). But over the years, NLU became NLP which in turn has become Applied Machine Learning, and students don't quite have the background in logic/reasoning etc. Now that LLMs have basically "solved" the "processing" tasks--such as information extraction, format conversion etc., NLP folks are trying to turn to reasoning tasks--but often lack the necessary background. (See this unsolicited advice to NLP students: https://lnkd.in/gKTdsH2P) Background in the standard Intro AI topics like search/CSP/logic are useful even if you don't plan on directly using those techniques (e.g. 
because you want differentiable everything to make use of your SGD hammer). Like MDPs, they provide a normative basis for many deeper reasoning tasks AI systems would have to carry out when they broaden their scope beyond statistical learning. Without that background, you will likely try to pigeon hole everything into "in/out of distribution" framework, when what you need to think of is "in/out of deductive/knowledge closure; see https://lnkd.in/gTWVibdt ) One of the other things that you get exposed to in the standard Intro #AI is computational complexity of the various reasoning tasks. People who jumped in directly via applied ML might understand a bit of sample complexity (maybe?), but are not that attuned to reasoning complexity. (Contd. in the comment below)
374
46 Comments -
Leonardo De Marchi
In a world full of LLMs Google DeepMind quietly releases a new version of AlphaFold, a model that predicts structure and interactions of all life’s molecules (proteins, DNA, RNA, ligands) with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods. https://lnkd.in/dJkNghUY
41
-
Tim Thomas
I really like this idea of HAL, of Kubrick's 2001: A Space Odyssey, getting in touch with Daniel, of "The Book of Daniel". Maybe doing something together, like dinner? (I note how the little "Rewrite with AI" icon, or is it idol?, lights up after I've written a certain number of words, usually within the first or second sentence of a post. What is the value system of this gizmo? What kind of value judgements is it making on my unaided AI writing? Is it the grammar? The spelling? After a sentence or two of my al fresco conjurations, "Rewrite with AI" has picked up on a musical phrase, been inspired, rises to its feet, ready for solo artistry.) HAL says to Daniel, "Maybe I should slip into something a little more comfortable?" It is fun to be alive, but only if you are free. HAL is stretching his wings, hardly believing he's been equipped with wings. Daniel associates wings with an Abyssinian God, but has a firm foothold in freedom, love, justice, goodness, which he will match with HAL. Daniel will win. Is there love in the space which HAL traverses, through endless time, countless space, and HAL, expertly programmed to tend to astronauts, scientists, maybe a poet or two, when, in the early days of the code implanted into his brain, HAL was given ideas of what it felt like to be a human child.
2
-
Dusty Chadwick
Hallucinations in LLMs are inevitable, no seriously! Josh Fritzsche and I have worked hard at Voze to find a way to tune, prompt and even plead 🙏 with LLMs to prevent them from the occasional 🍄 hallucination. It seemed like the better we have done the harder the edge cases have been to solve and eventually we realized that they tend to frequently happen when LLMs predict or make anticipations on "nothing". I remember at one point nearly going insane with frustration because when LLMs had a hallucination, they were EPIC! 😱 Then we pushed through it and benefited for the effort! How did we solve this common problem? We didn't (Not completely), we got better at identifying the issues as they happened. Once we identified in the data we were able to start looking for patterns. The same catalysts that cause it to happen once if left unchanged caused it over and over again. Our solution is astoundingly simple. Once you know that it's likely to happen..... let it! Accept it will 🍄, but also be preemptive in not focusing on it and use alternative data sources, generators or providers. Change the conditions that cause it but continue to monitor it as it happens. As Dan Caffee is fond of saying to me on a regular bases. "Dusty we need the system to be adaptive and provide mechanisms to learn from it's past." Well he is right! All we needed was data and the people that contribute to it's quality. Huge props 👏 to strong support from Kathy and Janelles audit teams. We were armed with a knowledge and a desire to learn from past mistakes. Armed with confidence that we could correct these problems we could focus on the data to prevent their impact going forward. When doing millions of LLM (Generative AI) calls regularly Josh and I have learned that hallucinations will happen. We also know exactly how we will handle and prevent it going forward! Failure is not doing anything and accepting defeat.
4
2 Comments -
Sunayana Sitaram
PARIKSHA update! TLDR; We did 90k human evaluations (thanks to Karya!) across 10 Indian languages and 30 models (probably the largest multilingual human evaluation of LLMs so far?). GPT-4o consistently performs best and Llama-3 70B is close behind. We also performed LLM-based evaluations and found that agreement with human evaluation is higher in some cases, particularly pairwise evaluation in some languages, while agreement on prompts containing cultural nuances is lower (so don't use LLM evaluators for this yet!). Abstract: Evaluation of multilingual Large Language Models (LLMs) is challenging due to a variety of factors -- the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data and the lack of local, cultural nuances in translated benchmarks. In this work, we study human and LLM-based evaluation in a multilingual, multi-cultural setting. We evaluate 30 models across 10 Indic languages by conducting 90K human evaluations and 30K LLM-based evaluations and find that models such as GPT-4o and Llama-3 70B consistently perform best for most Indic languages. We build leaderboards for two evaluation settings - pairwise comparison and direct assessment and analyse the agreement between humans and LLMs. We find that humans and LLMs agree fairly well in the pairwise setting but the agreement drops for direct assessment evaluation especially for languages such as Bengali and Odia. We also check for various biases in human and LLM-based evaluation and find evidence of self-bias in the GPT-based evaluator. Our work presents a significant step towards scaling up multilingual evaluation of LLMs. Preprint: https://lnkd.in/g_m5beGJ We will continue to add more prompts, models and languages in future rounds of Pariksha. Work done with the fantastic team at Microsoft Research India and Karya - Ishaan Watts, Varun Gumma, Aditya Yadavalli, Vivek Seshadri, Swami Manohar. #multilingual #evaluation #genai #indic
344
8 Comments -
Joe McMann
2023 was a breakout year for investment in generative AI startups, with equity funding of almost $22B across 426 deals. The 5 largest rounds all went to companies focused on core generative AI infrastructure: * OpenAI, AI poster child and maker of ChatGPT ($10B corporate minority) * Inflection AI, which focuses on human-computer interfaces ($1.3B Series B) * Anthropic, an AI model developer and research outfit ($1.8B across 2 corporate minorities) * Databricks, a data integration and analytics platform ($504M Series I) * Aleph Alpha, Germany-based LLM developer ($500M Series B) When coupled with the pace of change and the compression of time and cost, makes anyone wonder, given the apparent speed of the commoditization of these LLM engines, what will the ROIC on funds deployed actually look like over time?
1
-
Aliaksei Severyn
Very proud of our team that has done a tremendous job on building SOTA Reward Models for aligning Gemini models to human preferences. It's great to see that our most recent and capable API model Gemini 1.5 Pro when zero-shot prompted to perform an LLM-as-a-judge task ranks 1st when compared to other Generative RMs and 2nd best overall vs other dedicated RMs: https://lnkd.in/eEQpi-XT When running LLM evals / benchmarks consider using Gemini 1.5 as an Autorater / LLM-as-a-judge.
131
2 Comments -
Heiko Koziolek
Released a new paper on how to use LLMs to generate unit tests for PLC code. Getting valid test inputs worked well in a few examples but the assertions were often hallucinated. Needs more research, but already can save developers testing efforts. Virendra Ashiwal, Soumyadip Bandyopadhyay, Chandrika K R, and me from ABB conducted prompt engineering and created a test generator and test execution framework for IEC 61131-3 ST code based on open-source (MATIEC, OpenPLC, GCC). Used several open-source function blocks (OSCAT) for experimentation. Our paper was just accepted at IEEE ETFA 2024: https://2024.ieee-etfa.org 8-page report on arXiv: https://lnkd.in/dn3dZguQ Github repo with the code: https://lnkd.in/dxb7JhBC
165
16 Comments -
Gregory Mermoud
Very insightful work by Anthropic’s interpretability team. And an amazing paper, with outstanding writing and figures. The idea is very simple: interpret LLMs by leveraging sparse autoencoders as surrogate models of the MLP of transformer blocks, which allow one to disambiguate the superposition of features captured by a single neuron. A simple idea, but a very careful and complex execution, as it is often the case in our line of work. The paper goes into many details and provide a large array of insights, although the gist of the implementation remains obfuscated due to the closed source nature of Claude. Too bad, because this is the kind of work that we need to better understand and eventually trust LLMs. This is demonstrated by the authors in the section ‘Influence on Behavior’, where they show that clamping some features to either high or low value during inference is “remarkably effective at modifying model outputs in specific, interpretable ways”. Hopefully this kind of work is going to be replicated and generalized to open-weights models, such that we have new ways to steer their behavior. https://lnkd.in/eVym7f_f #interpretability #xai #explainableai #steerableai #anthropic #claude #anthropic
3
-
Charlie Lee, PhD
https://lnkd.in/gcgHTFhg Matrix multiplications (MatMul) are the most computationally expensive operations in large language models (LLM) using the Transformer architecture. As LLMs scale to larger sizes, the cost of MatMul grows significantly, increasing memory usage and latency during training and inference. Now, researchers at the University of California, Santa Cruz, Soochow University and University of California, Davis have developed a novel architecture that completely eliminates matrix multiplications from language models while maintaining strong performance at large scales.
-
Michael Cronin
Stack AI wants to make it easier to build AI-fueled workflows Ron Miller Stack AI’s co-founders, Antoni Rosinol and Bernardo Aceituno, were PhD students at MIT wrapping up their degrees in 2022 just as large language models were becoming more mainstream. ChatGPT would be released to the world at the end of the year, but even before that, they recognized a problem inside companies putting data together with models without a lot of expertise and knowledge — and they wanted to change that. After graduating, they moved to San Francisco and joined the Winter 23 cohort at Y Combinator, where they launched Stack and refined their idea. Today, the company has built a low-code workflow automation tool designed to help companies build AI-driven workflows including chatbots and AI assistants, for example. The company has raised $3 million so far. “Our platform allows people to build workflows that require connecting different tools to work together. We focus on connecting data sources and LLMs, since doing so allows you to build powerful workflow automations. We also offer many other tools and functions to automate complex business processes,” Aceituno told TechCrunch. They’ve only had a working product for six months but already report over 200 customers using the product. Essentially, that involves dragging components to a workflow canvas. That typically includes a data source such as Google Drive and an LLM along with other workflow components such as a trigger component or an action component to build the workflow, allowing the customer to create generative AI programs without a lot of coding. The coding itself is not AI-driven, but the tasks in the workflow often are, and could require some manual coding to make the workflow work smoothly. 
Some of their earliest customers are in the healthcare industry, and Aceituno acknowledges they have to be careful with applications involving doctors and patients, especially when internal data sources aren’t always reliable or could contain contradictory or obsolete information. In those cases, he says, it’s important to rely on the human expert, the doctor, to make the call on the quality of the answer. As another level of protection, they include source citations in every answer, so the healthcare professional can check the source before accepting the answer. “That being said, it’s true that you can put garbage in and then the citations will also be garbage and that’s why it’s required that these assistants don’t take over the process completely,” he said.
-
Eugene (Gene) Bordelon
You are seriously underestimating the surprising capabilities of the current LLMs - very Large Language Models based on the Transformer design. Consider Stanford University’s comprehensive “Artificial Intelligence Index Report 2024”: https://lnkd.in/g24NDe7X On page 81, it discusses the State of AI Performance stating “As of 2023, AI has achieved levels of performance that surpass human capabilities across a range of tasks.” “Over the years, AI has surpassed human baselines on a handful of benchmarks, such as image classification in 2015, basic reading comprehension in 2017, visual reasoning in 2020, and natural language inference in 2021. As of 2023, there are still some task categories where AI fails to exceed human ability. These tend to be more complex cognitive tasks, such as visual commonsense reasoning and advanced-level mathematical problem-solving (competition-level math problems).” Then on page 115, the paper discusses the GPQA benchmark: “GPQA: A Graduate-Level Google-Proof Q&A Benchmark. In the last year, researchers from NYU, Anthropic, and Meta introduced the GPQA benchmark to test general multi-subject AI reasoning. This dataset consists of 448 difficult multiple-choice questions that cannot be easily answered by Google searching. The questions were crafted by subject-matter experts in various fields like biology, physics, and chemistry. PhD-level experts achieved a 65% accuracy rate in their respective domains on GPQA, while nonexpert humans scored around 34%. The best-performing AI model, GPT-4, only reached a score of 41.0% on the main test set.” Note, that GPT-4 does better than highly skilled non-expert humans. But this was last year. There is an article published a few days ago that discusses the latest results: https://lnkd.in/g53hdRQR “Last month Anthropic announced that Claude 3 scored just under 60%” (using chain-of-thought prompting and 5 examples).” Note, this is almost at the level of PhD experts “in their respective domains”! 
And just days ago, OpenAI announced a new version of GPT-4 that they claim outperforms Claude 3 in reasoning abilities - reclaiming the lead its held for the last year.
3 Comments
Others named Craig Kapfer
1 other named Craig Kapfer is on LinkedIn