2018 Women in Analytics Conference https://www.womeninanalytics.org/
Over the last year I’ve become obsessed with learning how to be a better "cloud computing evangelist to data scientists" - specifically to the R community. I’ve learned that this isn’t often an easy undertaking. Most people (data scientists or not) are skeptical of changing the tools and workflows they’ve come to rely on when those systems seem to be working. Resistance to change increases even further with barriers to quick adoption, such as having to teach yourself a completely new technology or framework. I’d like to give a talk about how working in the cloud changes data science and how exploring these tools can lead to a world of new possibilities at the intersection of DevOps and data analytics.
Topics to discuss:
- Working through functionality/engineering challenges with R in a cloud environment
- Opportunities to customize and craft your ideal version of R/RStudio
- Making and embracing a decision on what is “real" about your analysis or daily work (Chapter 6 in R for Data Science)
- Running multiple R instances in the cloud (and why you would want to)
- Becoming an R/Data Science collaboration wizard: building APIs with Plumber in the cloud
In this talk, we’ll describe NoSQL (“not-only SQL”) and document-oriented databases and the value they provide for data science companies like Uptake. We will walk through the unique challenges such datastores pose for data science workflows. To make these challenges and lessons learned concrete, we’ll explore data science workflows through a discussion of the development efforts that led to “uptasticsearch”, an R package released by the Uptake Data Science team to reduce friction in interacting with a document store called Elasticsearch. The talk will conclude with a discussion of recent developments in NoSQL technologies and implications for data scientists.
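The friction described here is easy to see in miniature: documents in a store like Elasticsearch are nested JSON, while most data science workflows expect flat, tabular data. A minimal sketch of that flattening step (in Python rather than R, with hypothetical field names, not the actual uptasticsearch implementation):

```python
# Flatten nested JSON "hits" from a document store into tabular rows.
# Document shape and field names are invented for illustration.

def flatten(doc, prefix=""):
    """Recursively flatten a nested dict into dot-separated columns."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

hits = [
    {"machine": {"id": "m-1", "site": "A"}, "temp_c": 71.4},
    {"machine": {"id": "m-2", "site": "B"}, "temp_c": 64.9},
]

rows = [flatten(h) for h in hits]
# Each row now has flat columns: machine.id, machine.site, temp_c
```

Handling this reshaping (plus pagination and query construction) once, inside a package, is exactly the kind of friction reduction the talk describes.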
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to rise while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
What gets measured, gets managed; but what gets governed, generates real value. That's one major reason why data governance has risen to a top priority for most organizations. Another reason is the rapid onboarding of big data, which often comes from beyond the traditional firewall. And then there are the authorities: issues like privacy, security and fiduciary responsibility are combining to make data governance a must-have. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why governance should be viewed as a positive change agent for the modern enterprise. He'll be briefed by Ron Huizenga of IDERA, who will discuss a practical, model-based approach to enterprise data governance, with a focus on Master Data Management.
Here is an overview of the Bridged framework that CodeData uses to deliver data-driven solutions to our customers. The Bridged framework covers all aspects of such solutions: strategy, leadership, process, technology, education and operations.
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle: from preparing data and discovering features to training and managing models in production.
Modern applications are data driven and data rich. The infrastructure your backends run on is a critical aspect of your environment and requires unique monitoring tools and techniques. In this webinar, learn what DataOps is and how critical good DataOps is to the integrity of your application. Intelligent APM for your data is critical to the success of modern applications. In this webinar you will learn:
- The power of APM tailored for data operations
- The importance of visibility into your data infrastructure
- How AIOps makes DataOps actionable
The document discusses moving healthcare data architecture to the cloud. It describes a large health system that implemented an enterprise data warehouse (EDW) on the cloud to provide cost savings and flexibility. This consolidated multiple clinical repositories and reduced infrastructure costs. It also describes an academic health center that integrated patient records across its organizations using a cloud-based EDW. This improved analytics and reduced operating costs by 50% while improving patient care. Both organizations benefited from the scalability, cost savings and innovation the cloud enabled for their clinical analytics and research.
Industry thought leaders Gaurav Dhillon and David Linthicum discuss the future of cloud integration and data management in the API economy. Topics from this webinar and the accompanying slides include: key considerations of today's CIOs, approaching the reality of the multi-cloud world and new solutions for managing cloud and on-premise data. To learn more, visit: http://www.snaplogic.com/.
The document introduces data engineering and provides an overview of the topic. It discusses (1) what data engineering is, how it has evolved with big data, and the required skills, (2) the roles of data engineers, data scientists, and data analysts in working with big data, and (3) the structure and schedule of an upcoming meetup on data engineering that will use an agile approach over monthly sprints.
This document provides an overview of big data concepts and technologies for managers. It discusses problems with relational databases for large, unstructured data and introduces NoSQL databases and Hadoop as solutions. It also summarizes common big data applications, frameworks like MapReduce, Spark, and Flink, and different NoSQL database categories including key-value, column-family, document, and graph stores.
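The four NoSQL categories the overview names can be contrasted by modeling the same record in each style. A sketch using plain Python structures (illustrative only, not tied to any particular database product):

```python
# The same customer record in the four NoSQL styles: key-value,
# document, column-family, and graph. All names are invented examples.

# Key-value: an opaque value looked up by a single key.
kv_store = {"customer:42": '{"name": "Ada", "city": "Columbus"}'}

# Document: a nested, queryable structure.
doc_store = {
    "customers": [
        {"_id": 42, "name": "Ada", "address": {"city": "Columbus"}}
    ]
}

# Column-family: rows whose columns are grouped into families.
column_family = {
    "customers": {
        42: {"profile": {"name": "Ada"}, "location": {"city": "Columbus"}}
    }
}

# Graph: nodes connected by typed edges.
nodes = {42: {"label": "Customer", "name": "Ada"},
         7: {"label": "City", "name": "Columbus"}}
edges = [(42, "LIVES_IN", 7)]
```

The choice among these shapes drives which queries are cheap: key lookups, nested-field filters, wide-row scans, or relationship traversals respectively.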
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others. But where are the data science and data engineering patterns? Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
This document discusses big data analysis and Hadoop. It begins by describing different stages of data analysis and roles of various personnel. It then discusses challenges of analyzing big data using traditional tools and how Hadoop addresses these challenges through its distributed architecture and MapReduce programming model. Several case studies are presented where companies have used Hadoop to perform large-scale data analysis. Key components of Hadoop like MapReduce, Pig, Hive and Mahout are also introduced.
Modern data processing environments resemble factory lines, transforming raw data into valuable data products. The lean principles that have successfully transformed manufacturing are equally applicable to data processing, and are well aligned with the new trend known as DataOps. In this presentation, we will explain how lean and DataOps principles can be implemented as technical data processing solutions and processes in order to eliminate waste and improve data innovation speed. We will go through how to eliminate the following types of waste in data processing systems:
* Cognitive waste - unclear source of truth, dependency sprawl, duplication, ambiguity.
* Operational waste - overhead for deployment, upgrades, and incident recovery.
* Delivery waste - friction and delay in development, testing, and deployment.
* Product waste - misalignment with business value, detachment from use cases, push-driven development, vanity quality assurance.
We will primarily focus on technical solutions, but some of the waste mentioned requires organisational refactoring to eliminate.
Keynote "#DBInsights" on 7 April. My views on DBAs' fears, doubts and opportunities in the age of DevOps, Cloud, Big Data, Open Source, bi-modal IT, pizza teams, you name it.
Boston Data Mining Meetup introduction slides from Big Data Infrastructure workshop - A hands-on introduction
The document discusses how open source software has disrupted traditional software development models. It describes how companies have had to adapt from closed, proprietary development to more open and community-based development. Specifically, companies have had to learn to release more control, work within open source communities, adopt agile development practices, and deal with the many unknowns of integrating diverse open source projects. The document suggests open source will continue pushing technological innovation and forcing the industry to evolve.
Our pitch at the Data-Driven NYC meetup on September 17th (http://datadrivennyc.com). Speaking about data scientists' pains and how Dataiku Data Science Studio can help them become more than Data Cleaners and Data Leak Fixers!
Lean Analytics is a set of rules to make data science more streamlined and productive. It touches on many aspects of what a data scientist should be and how a data science project should be defined in order to succeed. During this presentation Richard will cover where data science projects go wrong, how you should think about data science projects, what constitutes success in data science, and how you can measure progress. This session will be loaded with terms, stories and descriptions of project successes and failures. If you're wondering whether you're getting value out of data science, how to get more value out of it, or even whether you need it at all, then this talk is for you!
What you will take away from this session:
- Learn how to make your data science projects successful
- Evaluate how to track progress and report on the efficacy of data science solutions
- Understand the roles of engineers and data scientists
- Understand your options for processes and software
Machine learning applications are typically stitched together from hopes and dreams, shell scripts, cron jobs, home-grown schedulers, snippets of configuration clipped from multiple blog posts, thousands of hard-coded business rules, a.k.a. "our SQL corpus," and a few lines of training and testing code. Organizing all the moving parts into something maintainable and supportive of ongoing development is a challenge most teams have on their TODO list, roadmap, or tech debt pile. Getting ahead of the day-to-day demands and settling into a sane architecture often seems like an unattainable goal. The past several years have seen an explosion of tool-building in the data engineering and analytics area, including in Apache projects spanning the areas of search and information retrieval, job orchestration, file and stream formats, and machine learning libraries. In this talk we will cover our product and development teams' choices of architecture and tools, from data ingestion and storage, through transformations and processing, to presentation of results and publishing to web services, reports, and applications.
The Briefing Room with Dr. Robin Bloor and WhereScape Live Webcast on September 30, 2014 Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=bfff40f7c9645fc398770ea11152b148 The fueling of information systems will always require some effort, but a confluence of innovations is fundamentally changing how quickly and accurately it can be done. Gone are long cycle times for development. Today, organizations can embrace a more rapid and collaborative approach for building analytical applications and data warehouses. The key is to have business experts working hand-in-hand with data professionals as the solutions take shape, thus expediting the speed to valuable insights. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the changing nature of information design. He’ll be briefed by WhereScape President Mark Budzinski, who will discuss his company’s data warehouse automation solutions and how they enable collaborative development. He will share use cases that illustrate how, by aligning business and IT, organizations can enable faster and more agile data warehouse development. Visit InsideAnalysis.com for more information.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
The Fourth International Workshop on RESTful Design, WS-REST 2013 REST in Brazil - Industry Keynote On learning REST, and its impact on the design of massive applications in Brazil
The Briefing Room with Dr. Robin Bloor and Cirro Live Webcast on February 11, 2014 Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=0ec1fa381886313cc06d841015c65898 As information ecosystems continue to expand, businesses are searching for ways to combine traditional analytics with a new source of insight: Big Data. But with data flooding in from all kinds of sources, fast access and performance at scale can easily become an issue. One effective approach for solving this challenge is data federation, a method that involves taking the analytical processing to the data, allowing streamlined access to multiple data sources without the expensive ETL overhead or building of semantic layers. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains how the prevalence of distributed data calls for a new approach to Big Data. He will be briefed by Mark Theissen of Cirro, who will tout his company’s Data Hub, a data federation solution that provides a single point of access to all enterprise data assets without excessive data movements, preprocessing or staging. He will discuss how data federation differs from virtualization and ETL approaches, and demonstrate how a Cirro deployment solves the analytics challenge of integrating data silos across the data center – and the cloud – using the BI tools you already have on your desktop for real-time distributed analytics. Visit InsideAnalysis.com for more information.
At my first visit to SciPy in Latin America, I was able to review the history of PyData, SciPy, and NumFOCUS, and to discuss how to grow their communities and cooperate in the future. I also introduced OpenTeams as a way for open-source contributors to grow their reputation and build businesses.
In this talk, WeCloudData introduces the Hadoop/Spark ecosystem and how businesses use big data tools and platforms. For more detail about WeCloudData's big data for data scientist course please visit: https://weclouddata.com/data-science/
This document provides an overview of Drupal, including its history, community, features, and Drupal 8 updates. Some key points:
- Drupal is an open source content management system with a large, active community that has grown significantly since 2005.
- Drupal 8 is a major update that focuses on modernizing the codebase, improving performance, and adding new features like a responsive image module and improved multilingual support. Over 1,600 people contributed to Drupal 8.
- The Drupal community extends beyond code contributions: there are many ways for individuals and organizations of all skill levels to get involved through documentation, support, events, and more. Contributing back helps both Drupal and contributors.
Sentiment analysis uses natural language processing to classify opinions in text as positive, negative, or neutral. Analyzing Twitter data through sentiment analysis can provide insight into public opinion on various topics. The presentation described how sentiment analysis of Twitter data on road traffic could work, using Azure Cognitive Services and Logic Apps for processing without code. A demo then showed these Azure services in action for sentiment analysis.
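The classification step itself can be illustrated with a tiny lexicon-based scorer. This is a deliberate simplification of what a managed service like Azure's performs, and the word lists are invented for the example:

```python
# Minimal lexicon-based sentiment classifier: count positive versus
# negative words and label the text. Tiny lexicons, illustrative only.

POSITIVE = {"clear", "fast", "smooth", "good"}
NEGATIVE = {"jam", "blocked", "slow", "accident"}

def sentiment(text):
    """Label text positive/negative/neutral by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("traffic is slow and blocked near the bridge"))  # negative
print(sentiment("roads clear and fast this morning"))            # positive
```

A hosted service replaces the hand-built lexicon with a trained model, which is why the no-code Logic Apps pipeline in the demo only has to pass tweets to the API and route the returned labels.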