This document discusses the current limitations of machine learning and how to manage expectations. It covers three key areas: 1) Current state-of-the-art limitations, such as the inability to build generalized models for both images and text, or conversational agents that pass the Turing test. 2) The mismatch between what product teams expect ML to be able to do and its actual capabilities, for example generating catchy titles. 3) Technical difficulties in maintaining ML systems over time, such as concept drift, training-serving skew, and unexpected data distributions that cause false positives and require additional data and retraining. It recommends checking data science competitions to understand current ML capabilities and manage expectations.
This document discusses a self-learning computer vision AI that aims to make deep learning solutions more accessible. It outlines how current expert consulting is limited and expensive. The AI would use techniques like transfer learning, hyperparameter optimization, and Bayesian optimization to "learn how to learn" models without human experts. This could expand access to computer vision applications while also researching technologies that may make expert knowledge obsolete over time.
While Artificial Intelligence (AI) and Machine Learning (ML) are 2019 buzzwords, there is an ever-growing need for software engineers to understand how they work. This talk explains AI/ML basics in the context of Go development. It is a hands-on demo using the go-learn library to explain classic ML techniques.
Making basic, good-looking plots in Python is tough. Matplotlib gives you great control, but at the expense of being very detailed. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data viz packages. ggplot is a port of the popular R package ggplot2 into Python. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. An example may be found here: http://blog.yhathq.com/posts/ggplot-for-python.html Greg will show you how to use ggplot to analyze data from the MLB's open data source, pitchf/x. He will take you through the basics of ggplot and show how easy it is to create histograms, plot smoothed curves, and customize colors & shapes. http://www.meetup.com/PyData-Boston/events/184382092/
Making basic, good-looking plots in Python is tough. Matplotlib gives you great control, but at the expense of being very detailed. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data viz packages. ggplot is a port of the popular R package ggplot2. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. So say good-bye to matplotlib, and hello to ggplot as your everyday Python plotting library!
Quick presentation on an easy way to get your app done. Or at least the tools you can use to get from point a to point b as fast as possible.
Reviewing progress in the machine learning certification journey. Special Addition - Short tech talk on How to Network by Qingyue (Annie) Wang. Content review on AI and ML on Google Cloud by Margaret Maynard-Reid. A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran. A discussion on sample questions to aid certification exam preparation. An interactive Q&A session to clarify doubts and questions. Previewing next steps and topics, including course completions and material reviews.
My presentation about how to get started with competitive data mining, given at the meeting of the data mining research group of the Department of Computer Science and Engineering, University of Moratuwa.
This document outlines an agenda for a Big Data workshop, including an introduction to Big Data concepts and tools. The workshop will discuss why Big Data is important, what it is and isn't, and fears around working with large datasets. It will provide examples of Big Data in organizations and products. The workshop aims to be practical, focusing on real-world use cases and selecting the right technologies. References are included for further reading on topics like Hadoop, PostgreSQL, Cassandra, Storm and analyzing log data. Attendees will have an opportunity to discuss Big Data projects and challenges.
This document provides an overview of investing in AI-driven startups. It outlines Dr. Roy Lowrance's background working with machine learning systems and startups. It then lists 100 AI startups that have raised over $11.7 billion total. The agenda covers an overview of AI, machine learning and big data, the life cycle of AI projects, and sustainable competitive advantages for AI-based startups.
This document describes courses offered by Product School to help product managers gain skills in areas like product management, coding, data analytics, digital marketing, UX design, and product leadership. It also provides an overview of a talk on applying machine learning given by a Lyft senior product manager. The talk explains what machine learning is, the different types of machine learning problems, and how product managers can identify opportunities, define problems, and guide machine learning solutions and teams. Examples are provided around replacing cash bail and automating food delivery order disputes.
In this presentation I list and try to answer some useful questions about machine learning, and large-scale machine learning in particular. I talk about things like what we can and cannot do with ML, do I need a cluster for large-scale ML, what are common problems with ML systems and future directions.
On the first day of the How-to-AI Series, you will learn some fundamentals of the AI and Machine Learning field.
Slides Chris Butler recently used in his discussion w/ mentees of The Product Mentor. Synopsis: In this talk, Vikas will share his thoughts on what Product Strategy is and how Product Managers can develop it. He will also share some concepts in Strategy and how Product Managers can apply them to make their products more successful. The Product Mentor is a program designed to pair Product Mentors and Mentees from around the World, across all industries, from start-up to enterprise, guided by the fundamental goals…Better Decisions. Better Products. Better Product People. Throughout the program, each mentor leads a conversation in an area of their expertise that is live streamed and available to both mentees and the broader product community. http://TheProductMentor.com
Taylor Howard, Director of Data Analytics and Collaboration, machine learning presentation from TBTF PoweredUp Technology Festival 2017.
The document provides an overview of machine learning and artificial intelligence concepts. It discusses: 1. The machine learning pipeline, including data collection, preprocessing, model training and validation, and deployment. Common machine learning algorithms like decision trees, neural networks, and clustering are also introduced. 2. How artificial intelligence has been adopted across different business domains to automate tasks, gain insights from data, and improve customer experiences. Some challenges to AI adoption are also outlined. 3. The impact of AI on society and the workplace. While AI is predicted to help humans solve problems, some people remain wary of technologies like home health diagnostics or AI-powered education. Responsible development of explainable AI is important.
When it comes to studying, Machines and Students have one thing in common: Examinations. To perform well on their final evaluations, humans need to take classes, read books, and solve practice quizzes. Similarly, machines need artificial intelligence to memorize data, infer feature correlations, and pass validation standards in order to solve almost any problem. In this quick introductory session, we'll walk through these analogies to learn the core concepts behind Machine Learning, and why it works so well!
The document summarizes an advanced agile testing workshop hosted by Lisa Crispin. The workshop aims to be collaborative and help attendees solve testing problems through experiments and discussions on topics like impact mapping, testing quadrants, skills development, tool selection, technical debt, and test automation. Attendees will identify their biggest testing challenges, prioritize them, and brainstorm experiments to address high priority problems through techniques like impact mapping and story mapping. The workshop provides resources and examples to facilitate these discussions.
Based on Gartner's research, 85% of AI projects fail. In this talk, we show the most common mistakes made by the managers, developers, and data scientists while building AI products. We go through ten case studies of products that failed and analyze the reasons for each failure. We also present how to avoid such mistakes and deliver a successful AI product by introducing a few lifecycle changes.
This document discusses moving from traditional business intelligence (BI) tools to adopting machine learning. It begins with an overview of common BI workflows and their limitations. It then provides introductions to machine learning, deep learning, and artificial intelligence. The machine learning pipeline is explained along with examples of adopting machine learning in products. Challenges of adopting machine learning are discussed as well as cost optimization strategies. Real world use cases are presented and open source options are mentioned.
This document discusses best practices for setting up development and test sets for machine learning models. It recommends that the dev and test sets: 1) Should reflect the actual data distribution you want your model to perform well on, rather than just being a random split of your training data. 2) Should come from the same data distribution. Having mismatched dev and test sets makes progress harder to measure. 3) The dev set should be large enough, typically thousands to tens of thousands of examples, to detect small performance differences as models are improved. The test set size depends on desired confidence in overall performance.
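To illustrate why the dev set needs thousands of examples to detect small improvements, here is a minimal sketch. It assumes a binomial approximation for the standard error of a measured accuracy; the function name and the 90% accuracy figure are hypothetical choices for illustration, not taken from the document.

```python
import math


def accuracy_standard_error(accuracy, n_examples):
    """Binomial standard error of an accuracy measured on n_examples.

    Assumption: each example is an independent correct/incorrect trial.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)


# Hypothetical model with 90% accuracy, evaluated on dev sets of different sizes.
for n in (100, 1_000, 10_000):
    se = accuracy_standard_error(0.90, n)
    print(f"dev set size {n:>6}: standard error about {se:.4f}")
```

Under this approximation, the noise in the measured accuracy shrinks with the square root of the dev set size, so a 0.1% improvement is lost in the noise on a 100-example dev set.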
Abstract: There has been tremendous progress in artificial intelligence recently. There's no doubt one day it will also power Datadog products and you'll have to deal with it in your pipelines. What is it going to change? In this talk, I'll explain what makes ML fundamentally different from software engineering, and present a few of the operational challenges of setting up a machine learning system in the real world. Most importantly, I’ll propose practical steps to prepare for the transition that do not require you to have a machine learning model running yet. This talk was given at a Ladies of Code Meetup in Paris, in May 2023. Recording: https://www.youtube.com/watch?v=S9l8GO4wtdY Meetup: https://www.meetup.com/fr-FR/ladies-of-code-paris/events/293711765/
District Data Labs Workshop. Current workshop: August 23, 2014. Previous workshops: April 5, 2014. Data products are usually software applications that derive their value from data by leveraging the data science pipeline and generate data through their operation. They aren’t apps with data, nor are they one-time analyses that produce insights - they are operational and interactive. The rise of these types of applications has directly contributed to the rise of the data scientist and the idea that data scientists are professionals “who are better at statistics than any software engineer and better at software engineering than any statistician.” These applications have been largely built with Python. Python is flexible enough to develop extremely quickly on many different types of servers and has a rich tradition in web applications. Python contributes to every stage of the data science pipeline including real-time ingestion and the production of APIs, and it is powerful enough to perform machine learning computations. In this class we’ll produce a data product with Python, leveraging every stage of the data science pipeline to produce a book recommender.
Learn more about enterprise frameworks and why your technology business and you need to be thinking about your software application architecture at scale.
This document summarizes a presentation about data science at OLX. It discusses OLX's moderation and recommender systems. For moderation, it describes OLX's machine learning models that automatically moderate listings for issues like duplicates, spam, and illegal/NSFW content. Moderators review flagged content. For recommendations, it discusses collaborative filtering and item embeddings to suggest relevant listings to users. It also outlines OLX's team structure, goal setting process, and expectations for data scientists, which include a focus on modeling, evaluation and some production work.
Whylogs is an open source tool for data monitoring that automatically creates statistical summaries called profiles of datasets. It helps with data monitoring by generating these profiles which can be compared over time to detect changes visually or programmatically. This allows issues like schema changes or bugs in data pipelines to be identified. The profiles have properties like being descriptive, lightweight and mergeable, which enables monitoring across distributed systems by allowing profile data to be logically merged. Whylogs thus provides a step towards observability of data systems.
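The "mergeable" property of profiles can be sketched as follows. This is a standalone illustration and does not use the whylogs API; the `ColumnProfile` class and its fields are hypothetical, chosen only to show that summaries built on separate shards can be combined into the summary of the whole dataset.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical lightweight summary of one numeric column (not the whylogs API)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        """Update the summary with one observed value."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Combine two summaries without revisiting the raw data."""
        return ColumnProfile(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def profile(values):
    """Build a profile from an iterable of numbers."""
    p = ColumnProfile()
    for v in values:
        p.track(v)
    return p


# Two workers profile their own shards; the merged profile matches
# the profile of the full dataset.
shard_a = profile([1.0, 2.0, 3.0])
shard_b = profile([10.0, 20.0])
merged = shard_a.merge(shard_b)
whole = profile([1.0, 2.0, 3.0, 10.0, 20.0])
```

Because only counts, sums, and extremes are kept, each profile stays small regardless of dataset size, which is what makes merging across distributed systems cheap in this sketch.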
The document outlines the plan and syllabus for a Data Engineering Zoomcamp hosted by DataTalks.Club. It introduces the four instructors for the course - Ankush Khanna, Sejal Vaidya, Victoria Perez Mola, and Alexey Grigorev. The 10-week course will cover topics like data ingestion, data warehousing with BigQuery, analytics engineering with dbt, batch processing with Spark, streaming with Kafka, and a culminating 3-week student project. Pre-requisites include experience with Python, SQL, and the command line. Course materials will be pre-recorded videos and there will be weekly live office hours for support. Students can earn a certificate and compete on a
This document discusses Zalando's use of AI to improve size and fit recommendations for customers. It outlines several challenges including varying size conventions, limited fit data for new items, and sparse customer purchase histories. It then describes Zalando's approaches to address these, including algorithms that use item images to predict sizes for new items lacking data (SizeNet) and models that learn from customers' past purchases and feedback to provide personalized size recommendations. The goal is to help customers find the right fit on their first purchase to reduce returns and improve the shopping experience.
Computer vision techniques like facial recognition and image captioning can help automate metadata generation for media companies. Facial recognition can identify people in photos to assist editors and improve searchability, while image captioning can propose captions. A case study of applying these techniques to photos from English Premier League football games achieved 99% accuracy for facial recognition and precision of 78.7% for image captioning. Combining the two allows generating customized captions that include names identified through facial recognition. Challenges remain when the automatic caption does not match details in the image.
ML Zoomcamp 10 - Kubernetes
This document discusses several paradoxes that can arise in data science. It begins by discussing modelling and simulations that can be used when data is unavailable. It then outlines Simpson's Paradox, where a trend seen in groups disappears or reverses when the groups are combined. Next, it discusses the accuracy paradox, where a metric stops being useful once it becomes the target. It also discusses the learnability-Godel paradox related to the limitations of mathematics according to Godel's incompleteness theorems. Finally, it discusses the law of unintended consequences as it relates to data science.
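Simpson's Paradox can be made concrete with a small numeric sketch. The counts below are illustrative numbers chosen for this example, not data from the document: option A has the higher success rate within each subgroup, yet option B has the higher rate once the subgroups are pooled.

```python
from fractions import Fraction

# (successes, trials) per option and subgroup; illustrative counts only.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(groups):
    """Success rate after combining all subgroups of one option."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return Fraction(successes, trials)


# A is better inside every subgroup...
a_wins_each_group = all(
    rate(*data["A"][g]) > rate(*data["B"][g]) for g in ("small", "large")
)
# ...but B is better when the subgroups are combined.
b_wins_pooled = pooled_rate(data["B"]) > pooled_rate(data["A"])

print(a_wins_each_group, b_wins_pooled)
```

The reversal arises because the two options are applied to the subgroups in very different proportions, so the pooled rates are weighted averages with different weights.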
ML Zoomcamp 8 - Neural networks and deep learning
An algorithm is considered fair if its results and performance are independent of sensitive variables like gender, ethnicity, etc. Fairness can be introduced at different stages of model development, such as in data collection, preparation, and model selection. Techniques for identifying and mitigating bias include causal reasoning, explainability, fairness metrics, and counterfactuals. Counterfactual fairness evaluates predictions across different protected attribute values while holding other variables constant. Explainability helps ensure models make decisions for the right reasons. Overall fairness aims to achieve equal outcomes or opportunities across groups.
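The counterfactual check described above (vary the protected attribute, hold everything else constant, compare predictions) can be sketched as follows. Both scoring functions, the applicant record, and the attribute values are hypothetical and exist only to illustrate the check; a full causal treatment would also account for variables that depend on the protected attribute.

```python
def base_score(applicant):
    """Hypothetical score built only from non-protected features."""
    return 0.5 * applicant["income"] / 100_000 + 0.3 * applicant["years_employed"] / 10


def biased_score(applicant):
    """Hypothetical model that (wrongly) penalizes one protected group."""
    penalty = 0.1 if applicant["gender"] == "female" else 0.0
    return base_score(applicant) - penalty


def blind_score(applicant):
    """Hypothetical model that ignores the protected attribute."""
    return base_score(applicant)


def counterfactual_gap(model, applicant, attribute, values):
    """Largest change in prediction when only `attribute` is varied."""
    predictions = [model({**applicant, attribute: v}) for v in values]
    return max(predictions) - min(predictions)


applicant = {"income": 60_000, "years_employed": 5, "gender": "female"}
gap_biased = counterfactual_gap(biased_score, applicant, "gender", ["female", "male"])
gap_blind = counterfactual_gap(blind_score, applicant, "gender", ["female", "male"])

print(f"biased model gap: {gap_biased:.2f}, blind model gap: {gap_blind:.2f}")
```

A nonzero gap flags that the prediction for this individual depends on the protected attribute alone.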
This document discusses MLOps at OLX, including: - The main areas of data science work at OLX like search, recommendations, fraud detection, and content moderation. - How OLX uses teams structured by both feature areas and roles to collaborate on projects. - A maturity model for MLOps with levels from no MLOps to fully automated processes. - How OLX has improved from siloed work to cross-functional teams and adding more automation to model creation, release, and application integration over time.
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 5 - Model deployment
Olga Petrova gives an introduction to transformers for natural language processing (NLP). She begins with an overview of representing words using tokenization, word embeddings, and one-hot encodings. Recurrent neural networks (RNNs) are discussed as they are important for modeling sequential data like text, but they struggle with long-term dependencies. Attention mechanisms were developed to address this by allowing the model to focus on relevant parts of the input. Transformers use self-attention and have achieved state-of-the-art results in many NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) provides contextualized word embeddings trained on large corpora.
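As a rough illustration of self-attention, the sketch below computes scaled dot-product attention in plain Python. It is a simplification under stated assumptions: queries, keys, and values are all the raw token vectors (real transformers apply learned projections first), and the toy two-dimensional "embeddings" are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Assumption: no learned projection matrices, for brevity.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
outputs, attn = self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of weights sums to one, and every token can attend directly to every other token, which is how this mechanism sidesteps the long-range dependency problem of RNNs.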
ML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp Week #2 Office Hours
This document discusses the use of machine learning in online marketplaces. It outlines how machine learning is used for recommendations, search, trust and safety, seller experience, and pricing/monetization. Specific applications mentioned include collaborative and content-based recommendation systems, ranking models for search, automated content moderation, image quality assessment, dynamic pricing, and promoting listings. The document provides examples of algorithms like counting, collaborative filtering, learning to rank, and neural networks that power these machine learning applications in online marketplaces.
This document contains summaries from multiple sessions of a machine learning zoomcamp. It introduces machine learning concepts like supervised learning, the CRISP-DM process, model selection, linear algebra, and the Python libraries NumPy and Pandas. It also discusses setting up an environment for machine learning and provides example data and models for tasks like email spam detection and car price prediction.
The project "Social Media Platform in Object-Oriented Modeling" aims to design and model a robust and scalable social media platform using object-oriented modeling principles. In the age of digital communication, social media platforms have become indispensable for connecting people, sharing content, and fostering online communities. However, their complex nature requires meticulous planning and organization. This project addresses the challenge of creating a feature-rich and user-friendly social media platform by applying key object-oriented modeling concepts. It entails the identification and definition of essential objects such as "User," "Post," "Comment," and "Notification," each encapsulating specific attributes and behaviors. Relationships between these objects, such as friendships, content interactions, and notifications, are meticulously established. The project emphasizes encapsulation to maintain data integrity, inheritance for shared behaviors among objects, and polymorphism for flexible content handling. Use case diagrams depict user interactions, while sequence diagrams showcase the flow of interactions during critical scenarios. Class diagrams provide an overarching view of the system's architecture, including classes, attributes, and methods. By undertaking this project, we aim to create a modular, maintainable, and user-centric social media platform that adheres to best practices in object-oriented modeling. Such a platform will offer users a seamless and secure online social experience while facilitating future enhancements and adaptability to changing user needs.
Pre-trained Large Language Models (LLMs) have achieved remarkable success in several domains. However, code-oriented LLMs are computationally heavy, with complexity that grows quadratically with the length of the input code sequence. To simplify the input program of an LLM, the state-of-the-art approach filters the input code tokens based on the attention scores given by the LLM. The decision to simplify the input program should not rely on the attention patterns of an LLM, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input program belongs, the outcome may differ when the model is trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of input code tokens. We conducted an empirical study on LLMs including CodeBERT, CodeT5, and GPT-4 for two main tasks: code search and summarization. We report that 1) the reduction ratio of code has a linear-like relation with the saving ratio on training time, 2) the impact of categorized tokens on code simplification can vary significantly, 3) the impact of categorized tokens on code simplification is task-specific but model-agnostic, and 4) the above findings hold for the prompt engineering and interactive in-context learning paradigms, and this study can reduce the cost of invoking GPT-4 by 24% per API query. Importantly, SlimCode simplifies the input code with its greedy strategy and can run up to 133 times faster than the state-of-the-art technique with a significant improvement. This paper calls for a new direction on code-based, model-agnostic code simplification solutions to further empower LLMs.
"Le potenzialità del Digital Twin per il settore Water"
SP-23: Handbook on Concrete Mixes required at the time of designing
OCS Training Institute is pleased to co-operate with a Global provider of Rig Inspection/Audits, Commissioning, Compliance & Acceptance as well as Engineering for Offshore Drilling Rigs, to deliver Drilling Rig Inspection Workshops (RIW), which teach the inspection & maintenance procedures required to ensure equipment integrity. Candidates learn to implement the relevant standards & understand industry requirements so that they can verify the condition of a rig’s equipment & improve safety, thus reducing the number of accidents and protecting the asset.
Advances in Detect and Avoid for Unmanned Aircraft Systems and Advanced Air Mobility
SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training 2024 July 09