The developer world is changing as we create and generate new data patterns and handling processes within our applications. Additionally, with the massive interest in machine learning and advanced analytics, how can we as developers build intelligence directly into our applications so that it integrates with the data and data paths we are creating? The answer is Azure Databricks. By attending this session you will be able to confidently develop smarter, more intelligent applications and solutions that can be continuously built upon and that scale with the growing demands of a modern application estate.
Spark is fast becoming a critical part of customer solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early-adopter customers, helping them build their solutions on Azure Databricks. In this session, we begin by reviewing typical workload patterns and integration with other Azure services such as Azure Storage, Azure Data Lake, IoT / Event Hubs, SQL DW, and Power BI. Most importantly, we will share real-world tips and learnings that you can take and apply in your data engineering and data science workloads.
This presentation covers some of the major data science and AI announcements from the May 2020 Microsoft Build conference. Included in this talk are 1) Azure Synapse Link, 2) Responsible AI, 3) Project Bonsai & Project Moab, and 4) AI Models at Scale (deep learning with billions of parameters).
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
These are the slides for my talk "An intro to Azure Data Lake" at Azure Lowlands 2019. The session was held on Friday January 25th from 14:20 - 15:05 in room Santander.
This document contains contact information for Marcos Freccia, a SQL Server DBA and Data Platform MVP at Zalando SE. It also lists some common challenges for BI professionals such as managing data in the cloud, ease of use and adoption, keeping data current, integration with existing environments, and managing BI systems. Finally, it provides an overview of Power BI including its key benefits, data sources, visualization capabilities, and integration with other Microsoft products.
Azure Data Lake Intro (SQLBits 2016 ADL/USQL Pre-Conference): the data lake concept, Azure Data Lake, HDInsight, and Azure Data Lake Storage and Analytics.
Have you ever wondered, as a database developer, how to extend your database projects with machine learning technologies? How can you reuse your existing knowledge, and what do you still need to learn? In this session, Sascha Dittmann presents various learning paths that let database developers dive into the world of data science. For his hands-on examples he uses a variety of tools, such as SQL Server ML Services, Azure Databricks, and the Azure ML Services, to combine familiar knowledge with the new.
The presentation discusses how to migrate expensive open source big data workloads to Azure and leverage the latest compute and storage innovations within Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
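To make the shared metadata idea concrete, here is a minimal sketch, assuming a Synapse workspace with a Spark pool and a serverless SQL pool; the storage path, database, and table names are illustrative and not taken from the talk.

```python
# Run inside a Synapse Spark pool notebook, where `spark` is predefined.
# Illustrative names: replace the storage path and database/table names with your own.
spark.sql("CREATE DATABASE IF NOT EXISTS salesdb")

df = spark.read.parquet("abfss://data@mydatalake.dfs.core.windows.net/sales/")

# Persisting as a Spark table makes it visible through Synapse's shared metadata model.
df.write.mode("overwrite").saveAsTable("salesdb.sales_curated")

# The same table can then be queried from the serverless SQL pool with T-SQL, e.g.:
#   SELECT TOP 10 * FROM salesdb.dbo.sales_curated;
```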
This document summarizes a presentation given by Alberto Diaz Martin on Azure Databricks for data scientists. The presentation covered how Databricks can be used for infrastructure management, data exploration and visualization at scale, reducing time to value through model iterations and integrating various ML tools. It also discussed challenges for data scientists and how Databricks addresses them through features like notebooks, frameworks, and optimized infrastructure for deep learning. Demo sections showed EDA, ML pipelines, model export, and deep learning modeling capabilities in Databricks.
May's RDX Insights Series Presentation focuses on Microsoft's BI products. We begin with an overview of Power BI, SSIS, SSAS and SSRS and how the products integrate with each other. The webinar continues with a detailed discussion on how to use Power BI to capture, model, transform, analyze and visualize key business metrics. We’ll finish with a Power BI demo highlighting some of its most beneficial and interesting features.
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing, modeling and serving data on Azure. Finally, it discusses architectures like the lambda architecture and common data models.
Here are the slides for my talk "An intro to Azure Data Lake" at Techorama NL 2018. The session was held on Tuesday October 2nd from 15:00 - 16:00 in room 7.
Data orchestration is the lifeblood of any successful data analytics solution. Take a deep dive into Azure Data Factory's data movement and transformation activities, particularly its integration with Azure's Big Data PaaS offerings such as HDInsight, SQL Data Warehouse, Data Lake, and AzureML. Participants will learn how to design, build and manage big data orchestration pipelines using Azure Data Factory and how it stacks up against similar big data orchestration tools such as Apache Oozie. Video of presentation: https://channel9.msdn.com/Events/Ignite/Australia-2017/DA332
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data, transforming it, and storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
Talk Description: The Modern Data Warehouse architecture is a response to the emergence of Big Data, Machine Learning and Advanced Analytics. DevOps is a key aspect of successfully operationalising a multi-source Modern Data Warehouse. While there are many examples of how to build CI/CD pipelines for traditional applications, applying these concepts to Big Data Analytical Pipelines is a relatively new and emerging area. In this demo heavy session, we will see how to apply DevOps principles to an end-to-end Data Pipeline built on the Microsoft Azure Data Platform with technologies such as Data Factory, Databricks, Data Lake Gen2, Azure Synapse, and AzureDevOps. Resources: https://aka.ms/mdw-dataops
Presentation by James Baker and myself on running cost-effective big data workloads with Azure Synapse and Azure Data Lake Storage (ADLS) at Microsoft Ignite 2020. Covers the modern data warehouse architecture supported by Azure Synapse, its integration benefits with ADLS, and features that reduce cost, such as Query Acceleration, integration of Spark and SQL processing with shared metadata, and .NET for Apache Spark support.
Data Con LA 2020. Description: Data warehouses are not enough. Data lakes are the backbone of a modern data environment, and they are best built by leveraging the unique services of the cloud provider to reduce operational complexity. This session will explain why everyone's talking about data lakes, break down the best services in Azure to build a data lake, and walk through code for querying and loading with Azure Databricks and Event Hubs for Kafka. Attendees will leave the session with a firm grasp of why we build data lakes and how Azure Databricks fits in for ETL and querying. Speaker: Dustin Vannoy, Principal Data Engineer, Dustin Vannoy Consulting.
Apache Spark is the next big data processing tool for data scientists. As seen in a recent StackOverflow analysis, it's the hottest big data technology on their site! In this talk, I'll use the PySpark interface to leverage the speed and performance of Apache Spark. I'll focus on the end-to-end workflow for getting data into a distributed platform, and leverage Spark to process the data for advanced analytics. I'll discuss the popular Spark APIs used for data preparation, SQL analysis, and ML algorithms. I'll explain the performance differences between Scala and Python, and how Spark has bridged the gap in performance. I'll focus on PySpark as the interface to the platform, and walk through a demo to showcase the APIs. Talk overview: Spark's architecture and what's out now versus what's coming in Spark 2.0; the most common Spark APIs; common misconceptions and proper techniques for using Spark; demo: walk through ETL of the Reddit dataset, SparkSQL analytics and visualizations of the dataset using Matplotlib, and sentiment analysis on Reddit comments.
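As a rough illustration of the kind of PySpark workflow described above, here is a minimal sketch; the input path and field names are assumptions, not the talk's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("reddit-etl").getOrCreate()

# Load raw JSON comments (path is illustrative).
comments = spark.read.json("/data/reddit/comments/*.json")

# Basic cleanup: keep only the fields we need and drop deleted comments.
clean = (comments
         .select("subreddit", "author", "body", "score")
         .where(F.col("body") != "[deleted]"))

# Register as a temp view so the same data can be explored with Spark SQL.
clean.createOrReplaceTempView("comments")

top_subreddits = spark.sql("""
    SELECT subreddit, COUNT(*) AS n_comments, AVG(score) AS avg_score
    FROM comments
    GROUP BY subreddit
    ORDER BY n_comments DESC
    LIMIT 10
""")
top_subreddits.show()
```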
A session showcasing Machine Learning Server (machine learning algorithms in R and Python), as well as working with JSON data in SQL Server and connecting to data residing in HDFS, Hadoop, or Spark via PolyBase in SQL Server, so that this data can be used for analysis and prediction with models written in R or Python.
Presented at the MLConf in Seattle, this presentation offers a quick introduction to Apache Spark, followed by an overview of two novel features for data science.
This document discusses developing analytics applications using machine learning on Azure Databricks and Apache Spark. It begins with an introduction to Richard Garris and the agenda. It then covers the data science lifecycle including data ingestion, understanding, modeling, and integrating models into applications. Finally, it demonstrates end-to-end examples of predicting power output, scoring leads, and predicting ratings from reviews.
In this session we will see, with the usual hands-on demo approach, how to use the R language to perform value-added analyses. We will experience first-hand the parallelization performance of the algorithms, a fundamental aspect in helping researchers reach their goals. Lorenzo Casucci, Data Platform Solution Architect at Microsoft, will join us for this session.
The document summarizes the Databricks analytics platform, which provides a unified environment powered by Apache Spark. It integrates with Azure services and provides features like interactive collaboration, native security integration, and one-click setup. It also discusses capabilities like schema management, data consistency, elastic scalability, and automatic upgrades.
Spark + AI Summit 2020 had over 35,000 attendees from 125 countries. The majority of participants were data engineers and data scientists. Apache Spark is now widely used with Python and SQL. Spark 3.0 includes improvements like adaptive query execution that accelerate queries by 2-18x. Delta Engine is a new high performance query engine for data lakes built on Spark 3.0.
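For context on what turning on that query acceleration looks like in practice, here is a minimal sketch of enabling adaptive query execution through its Spark SQL configuration flag; the application name is illustrative.

```python
from pyspark.sql import SparkSession

# Adaptive query execution (AQE) in Spark 3.0 is controlled by a Spark SQL
# configuration flag; here it is enabled explicitly for illustration.
spark = (SparkSession.builder
         .appName("aqe-demo")
         .config("spark.sql.adaptive.enabled", "true")
         .getOrCreate())

# Confirm the setting on the active session.
print(spark.conf.get("spark.sql.adaptive.enabled"))
```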
This document summarizes machine learning pipelines in Apache Spark using MLlib. It introduces Spark DataFrames for structured data manipulation and Apache Spark MLlib for building machine learning workflows. An example text classification pipeline is presented to demonstrate loading data, feature extraction, training a logistic regression model, and evaluating performance. Parameter tuning is discussed as an important part of the machine learning process.
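A minimal sketch of such a text classification pipeline, using a toy in-memory dataset rather than the document's data:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-pipeline-demo").getOrCreate()

# Training data: (id, text, label) rows -- a toy stand-in for a real corpus.
training = spark.createDataFrame([
    (0, "spark is great for big data", 1.0),
    (1, "the weather is nice today", 0.0),
    (2, "mllib makes pipelines simple", 1.0),
    (3, "lunch was very good", 0.0),
], ["id", "text", "label"])

# Pipeline stages: tokenize text, hash tokens into feature vectors, fit a model.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.01)

pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])
model = pipeline.fit(training)

# Score new documents; the fitted pipeline applies every stage in order.
test = spark.createDataFrame([(4, "spark pipelines are useful")], ["id", "text"])
model.transform(test).select("text", "prediction").show()
```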
This document discusses Azure Machine Learning services for data scientists. It provides an overview of Azure Machine Learning Studio for building and deploying machine learning models with over 100 modules. Numbers show hundreds of thousands of deployed models serving billions of requests. It also discusses Azure Batch AI for scalable machine learning training without managing infrastructure, and Azure Databricks for Apache Spark as a managed service on Azure. The document outlines the machine learning development lifecycle supported in Azure and tools for experimentation, model management, and operationalization of models.
This document discusses PowerBI and R. It provides an overview of Microsoft R products including Microsoft R Open, Microsoft R Server, and SQL Server R Services. It explains how SQL Server R Services integrates R with SQL Server for scalable in-database analytics. Examples of using R with PowerBI, SQL Server, and Azure are provided. The document also compares the capabilities of Microsoft R Open, Microsoft R Server, and open source R and discusses using R for advanced analytics, predictive modeling, and big data at scale.
The document discusses the role and responsibilities of a data architect. It provides information on the high demand and salaries for data architects, which can be over $200,000 at companies like Microsoft. The summary also outlines some of the key technical skills required for the role, including strong data modeling abilities, knowledge of databases, ETL tools, analytics dashboards, and programming languages like SQL, Python and R. Business skills like communication and presenting complex concepts are also important.
This document provides an overview of Spark: Data Science as a Service by Sridhar Alla and Kiran Muglurmath of Comcast. It discusses Comcast's data science challenges due to massive data size and lack of scalable architecture. It introduces Roadrunner, Comcast's solution built on Spark, which provides a centralized processing system with SQL and machine learning capabilities to enable data ingestion, quality checks, feature engineering, modeling and workflow management. Roadrunner is accessed through REST APIs and helps multiple teams work with the same large datasets. Examples of transformations, joins, aggregations and anomaly detection algorithms demonstrated in Roadrunner are also included.
This document describes Hopsworks, an end-to-end data platform for analytics and machine learning built by KTH and RISE SICS. It provides data ingestion, preparation, experimentation, model training, and deployment capabilities. The platform is built on Apache technologies like Apache Beam, Spark, Flink, Kafka, and uses Kubernetes for orchestration. It also includes a feature store for ML features. The document then discusses Apache Flink and its use for stream processing applications. It provides examples of using Flink's APIs like SQL, CEP, and machine learning. Finally, it introduces the concept of continuous deep analytics and the Arcon framework for unified analytics across streams, tensors, graphs and more through an intermediate
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
This introductory workshop is aimed at data analysts and data engineers new to Apache Spark and shows them how to analyze big data with Spark SQL and DataFrames. In these partly instructor-led, partly self-paced labs, we will cover Spark concepts and you’ll do labs for Spark SQL and DataFrames in Databricks Community Edition. Toward the end, you’ll get a glimpse into the newly minted Databricks Developer Certification for Apache Spark: what to expect and how to prepare for it. * Apache Spark Basics & Architecture * Spark SQL * DataFrames * Brief Overview of Databricks Certified Developer for Apache Spark
Spark overview, cluster architecture, elements, Spark stack, and Spark Streaming. Meetup details of my presentation: http://www.meetup.com/lspe-in/events/212250542/ and http://www.meetup.com/devops-bangalore/events/222155834/
This document discusses Spark ML pipelines for machine learning workflows. It begins with an introduction to Spark MLlib and the various algorithms it supports. It then discusses how ML workflows can be complex, involving multiple data sources, feature transformations, and models. Spark ML pipelines allow specifying the entire workflow as a single pipeline object. This simplifies debugging, re-running on new data, and parameter tuning. The document provides an example text classification pipeline and demonstrates how data is transformed through each step via DataFrames. It concludes by discussing upcoming improvements to Spark ML pipelines.
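To illustrate the parameter-tuning benefit of treating the whole workflow as one pipeline object, here is a hedged sketch that cross-validates over a small parameter grid; it assumes a `pipeline`, `hashing_tf`, `lr`, and `training` DataFrame like those in the text classification sketch above.

```python
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# A grid over two hyperparameters; `pipeline`, `hashing_tf`, `lr`, and `training`
# are assumed to come from a text-classification pipeline like the earlier sketch.
param_grid = (ParamGridBuilder()
              .addGrid(hashing_tf.numFeatures, [1000, 10000])
              .addGrid(lr.regParam, [0.1, 0.01])
              .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=param_grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)

# Because the whole workflow is a single pipeline object, tuning re-runs every
# stage consistently for each parameter combination.
cv_model = cv.fit(training)
```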
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI. This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
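As a rough, non-authoritative sketch of what submitting an automated ML experiment with the Azure ML Python SDK can look like (the dataset name and label column are assumptions, and the exact AutoMLConfig parameters vary by SDK version):

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

# Connect to an existing Azure ML workspace (reads config.json downloaded from the portal).
ws = Workspace.from_config()

# A registered tabular dataset; the name and label column are illustrative.
train_data = Dataset.get_by_name(ws, name="credit-default-train")

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_data,
    label_column_name="defaulted",
    primary_metric="AUC_weighted",
)

# Automated ML tries feature engineering, algorithms and hyperparameters for us
# and returns the best run and fitted model.
experiment = Experiment(ws, "automl-demo")
run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()
```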
Dustin Vannoy is a field data engineer at Databricks and co-founder of Data Engineering San Diego. He specializes in Azure, AWS, Spark, Kafka, Python, data lakes, cloud analytics, and streaming. The document provides an overview of various Azure data and analytics services including Azure SQL DB, Cosmos DB, Blob Storage, Data Lake Storage Gen 2, Databricks, Synapse Analytics, Data Factory, Event Hubs, Stream Analytics, and Machine Learning. It also includes a reference architecture and recommends Microsoft Learn paths and community resources for learning.
The document provides 100 different ways that Yammer can be used within an organization. These include using Yammer to ask questions, share information and updates, coordinate meetings and events, get feedback, welcome new employees, plan trainings, and celebrate accomplishments. The broad range of suggestions show how Yammer can facilitate internal communication and collaboration across departments.
This document provides suggestions for 10 core groups that could be created on a company's Yammer network: CEO Connection, Heritage, Diversity & Inclusion, Emerging Technologies, New Hires, Innovation, Social Groups, Department/Region/Offices, Safety Moments, and Parent Community. For each group, a sample description and potential uses are outlined to provide ideas for how the group could be utilized.
1) The document discusses securing IoT devices and infrastructure through X.509 certificate-based identity and attestation, TLS-based encryption, and secure provisioning and management. 2) It describes securing the cloud infrastructure with Azure Security Center, Azure Active Directory, Key Vault, and policy-based access controls. 3) The document promotes building security into devices and infrastructure from the start through standards-based and custom secure hardware modules.
Visual Studio and Xamarin enable developers to create native Android and iOS apps with world-class tools in a fast, familiar, and flexible way. Join this tour of how you can use your existing C# and .NET skills to create fully native apps on every platform.
Learn how to take advantage of APIs, platform capabilities and intelligence from Microsoft Graph to make your app more performant, more resilient and more reliable.
Build interactive emails for Outlook with Actionable Messages using Adaptive Cards. In this session, you will learn how to code a simple and great looking Actionable Message end-to-end.
As organizations deploy additional security controls to combat today’s evolving threats, integration challenges often limit the return of investment. The new security API in the Microsoft Graph makes it easier for enterprise developers and ISVs to unlock insights from these solutions by unifying and standardizing alerts for easier integration and correlation, bringing together contextual data to inform investigations, and enabling automation for greater SecOps efficiency. We will walk through real world examples of applications that leverage the security API to help customers realize the full value of their security investments.
The document describes a simple workflow that calls an activity function called "SayHello" and passes the parameter "Amsterdam". The activity function returns the string "Hello Amsterdam!". The orchestrator function schedules the activity, waits for it to complete, collects the output, and returns it.
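The document's sample is not reproduced here, but a minimal sketch of the same pattern with Python Durable Functions looks roughly like this (in a real Functions app the orchestrator and activity live in separate function folders, each with its own function.json):

```python
import azure.durable_functions as df

# Orchestrator: schedules the "SayHello" activity with the input "Amsterdam",
# waits for it to complete, collects the output, and returns it.
def orchestrator_function(context: df.DurableOrchestrationContext):
    greeting = yield context.call_activity("SayHello", "Amsterdam")
    return greeting

main = df.Orchestrator.create(orchestrator_function)


# Activity function ("SayHello"): receives the city name and returns the greeting.
def say_hello(name: str) -> str:
    return f"Hello {name}!"
```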
The document describes the process of automatically scaling Azure Container Instances for a game server. It shows how ACIAutoScaler can monitor container usage and dynamically add or remove instances as needed to handle fluctuations in active sessions. When sessions drop below a threshold, ACISetState marks an instance for deletion. Once sessions stop on that instance, ACIGC deletes it to maintain optimal resource usage.
This document discusses NoSQL databases and Azure Cosmos DB. It notes that Cosmos DB supports key-value, column, document and graph data models. It guarantees high availability and throughput while offering customizable pricing based on throughput. Cosmos DB uses the Atom-Record-Sequence data model and provides SQL and table APIs to access and query data. The document provides an example of how 12 relational tables could be collapsed into 3 document collections in Cosmos DB.
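To make the "collapse tables into documents" idea concrete, here is an illustrative sketch (the field names, account endpoint, and container are assumptions, not taken from the document) of a single order document that embeds what a relational design would split across customer, address, order header, and order line tables, inserted with the azure-cosmos Python SDK:

```python
from azure.cosmos import CosmosClient

# One denormalized order document embedding data from several would-be relational tables.
order_doc = {
    "id": "order-1001",
    "customerId": "cust-42",
    "customer": {
        "name": "Contoso Ltd.",
        "shippingAddress": {"city": "Amsterdam", "country": "NL"},
    },
    "lines": [
        {"sku": "WIDGET-1", "qty": 3, "unitPrice": 9.99},
        {"sku": "GADGET-7", "qty": 1, "unitPrice": 24.50},
    ],
    "status": "shipped",
}

# Hypothetical endpoint and key; the container is assumed to be partitioned on /customerId.
client = CosmosClient("https://my-account.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("store").get_container_client("orders")
container.create_item(body=order_doc)
```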
This document provides information about building streaming applications. It refers the reader to a website, aka.ms/build-streaming, that explains how to configure input and output bindings as well as triggers to develop streaming applications. The Twitter handle @codemillmatt is mentioned, suggesting this person may provide additional help or resources on the topic.
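The linked page is not reproduced here, but as a rough sketch of the trigger-plus-binding style it describes, here is an assumed Event Hub-triggered Azure Function in Python that writes each event to an output binding (the trigger and binding themselves would be declared in the function's function.json; all names are illustrative):

```python
import json
import azure.functions as func

# Event Hub trigger with a Cosmos DB output binding; both are configured in
# function.json, so this code only handles the event payload itself.
def main(event: func.EventHubEvent, outputDocument: func.Out[func.Document]):
    payload = json.loads(event.get_body().decode("utf-8"))
    # Forward the incoming reading to the bound output document.
    outputDocument.set(func.Document.from_dict(payload))
```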