The document describes how to build a data science team and its supporting systems. It discusses establishing data collection and management systems, developing metrics and dashboards to analyze business data, creating predictive models using machine learning algorithms, and providing data science services like information retrieval to internal customers. The goal is to move from static, uncollected data to a fully realized big data platform and data science team that supports business analytics and decision making.
The current uptrend in computational power has led to a more mature ecosystem for image processing and video analytics. Using deep neural networks for image recognition and object detection, we can achieve better-than-human accuracy. Industrial sectors led by retail and finance want to take advantage of these developments for real-time analysis of video content in fraud detection, surveillance, and many other applications. Two key challenges arise in real-world implementations of a video analytics solution: 1) Most video analytics use cases are effective only when response times are in milliseconds; operating at very low latency creates a need for software and hardware acceleration. 2) Such solutions need widespread deployment and are expected to have low TCO. To address these two challenges we propose a video analytics solution leveraging Spark Structured Streaming plus a DL framework (such as Intel's Analytics Zoo and TensorFlow) built on a heterogeneous CPU + FPGA hardware platform. The proposed solution delivers >3x acceleration of the video analytics pipeline compared to a CPU-only implementation, requires zero code changes on the application side, and achieves more than a 2x reduction in TCO. Our video analytics pipeline comprises ingestion of the video stream, H.264 decode to image frames, image transformation, and image inferencing with a deep neural network. The FPGA-based solution offloads the entire pipeline computation to the FPGA, while the CPU-only solution implements the pipeline using OpenCV + Spark Structured Streaming + Intel's Analytics Zoo DL library. Key takeaways: 1. Optimizing the performance of a Spark Streaming + DL pipeline. 2. Accelerating a video analytics pipeline with an FPGA to deliver high throughput at low latency and reduced TCO. 3. Performance data benchmarking the CPU-only and CPU + FPGA solutions.
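The pipeline stages named above (ingest, decode, transform, inference) can be sketched in simplified form as plain Python functions. This is an illustrative data-flow outline only, not the talk's actual implementation: the stage bodies, field names, and the `run_pipeline` driver are all invented stand-ins for what would really be Spark Structured Streaming plus a DL framework.

```python
# Simplified sketch of the video analytics pipeline described above:
# ingest -> H.264 decode -> image transform -> DNN inference.
# Each stage is a plain function so the data flow is easy to follow.

def decode_stream(raw_chunks):
    """Stand-in for H.264 decode: turn each raw chunk into a 'frame'."""
    for chunk in raw_chunks:
        yield {"frame_id": chunk["id"], "pixels": chunk["payload"]}

def transform(frame):
    """Stand-in for image transformation (resize, normalize, ...)."""
    frame["pixels"] = [p / 255.0 for p in frame["pixels"]]
    return frame

def infer(frame):
    """Stand-in for DNN inference: classify by mean pixel intensity."""
    mean = sum(frame["pixels"]) / len(frame["pixels"])
    return {"frame_id": frame["frame_id"],
            "label": "bright" if mean > 0.5 else "dark"}

def run_pipeline(raw_chunks):
    """Run every frame through the full decode -> transform -> infer chain."""
    return [infer(transform(f)) for f in decode_stream(raw_chunks)]
```

In the FPGA variant described in the abstract, this whole chain is offloaded as one unit, which is why zero application-side code change is possible: the stage boundaries stay the same while the execution target moves.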
1) NVIDIA-Iguazio Accelerated Solutions for Deep Learning and Machine Learning (30 mins). About the speaker: Dr. Gabriel Noaje, Senior Solutions Architect, NVIDIA http://bit.ly/GabrielNoaje 2) GPUs in Data Science Pipelines (30 mins) - GPU as a Service for enterprise AI - A short demo on the usage of GPUs for model training and model inferencing within a data science workflow. About the speaker: Anant Gandhi, Solutions Engineer, Iguazio Singapore. https://www.linkedin.com/in/anant-gandhi-b5447614/
This document discusses democratizing AI using Apache Spark. It summarizes that while AI is advancing rapidly, it has not been fully democratized due to challenges with data management, developing productive teams, and establishing production-ready applications. Databricks aims to close these gaps with its just-in-time data platform that provides integrated workspaces, automated Spark management, and supports deep learning use cases across industries.
This document discusses the Internet of Things (IoT) and big data. It outlines key topics including why big data is important, reference architectures for IoT, relevant technologies, and use cases. Example use cases include applying real-time analytics to traffic data to predict congestion and to sensor data from a football game to analyze player statistics and activity in real time. The document also discusses technologies involved in complex event processing and architectures for IoT systems that integrate data collection and analytics.
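The traffic-congestion use case above is a classic complex-event-processing pattern: evaluate a condition over a sliding window of sensor readings and emit an event when it fires. A toy sketch (the window size, speed threshold, and alert shape are invented for illustration, not taken from the document):

```python
# Toy CEP-style sketch of the real-time traffic use case: flag
# congestion whenever the average speed over a sliding window of
# sensor readings drops below a threshold.
from collections import deque

def congestion_alerts(speeds, window=3, threshold=30.0):
    """Return (reading_index, windowed_avg) pairs for each low-speed window."""
    buf = deque(maxlen=window)
    alerts = []
    for i, speed in enumerate(speeds):
        buf.append(speed)
        if len(buf) == window:            # only evaluate full windows
            avg = sum(buf) / window
            if avg < threshold:
                alerts.append((i, avg))
    return alerts
```

A real deployment would run this logic continuously inside a stream processor rather than over a finished list, but the windowing and thresholding are the same.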
At H2O.ai we see a world where all software will incorporate AI, and we're focused on bringing AI to business through software. H2O.ai is the maker behind H2O, the leading open source machine and deep learning platform for smarter applications and data products. H2O operationalizes data science by developing and deploying algorithms and models for R, Python and the Sparkling Water API for Spark. In this webinar, you will learn about the scalable H2O core platform and the distributed algorithms it supports. H2O integrates seamlessly with the R and Python environments. We will show you how to leverage the power of H2O algorithms in R, Python and the H2O Flow interface. Come with an open mind and some high-level knowledge of machine learning, and you will take away a stream of knowledge for your next ML/DL project. Amy Wang is a math hacker at H2O, as well as the Sales Engineering Lead. She graduated from Hunter College in NYC with a Masters in Applied Mathematics and Statistics with a heavy concentration on numerical analysis and financial mathematics. Her interest in applicable math eventually led her to big data and finding the appropriate mediums for data analysis. Desmond is a Senior Director of Marketing at H2O.ai. In his 15+ years of career in Enterprise Software, Desmond worked in Distributed Systems, Storage, Virtualization, MPP databases, Streaming Analytics Platforms, and most recently Machine Learning. He obtained his Master's degree in Computer Science from Stanford University and MBA degree from UC Berkeley, Haas School of Business.
Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. Twitter designed and deployed a new streaming system called Heron. Heron has been in production for nearly two years and is widely used by several teams for diverse use cases. This talk looks at Twitter's operating experiences and challenges of running Heron at scale and the approaches taken to solve those challenges.
The document summarizes a meetup on data streaming and machine learning with Google Cloud Platform. The meetup consisted of two presentations: 1. The first presentation discussed using Apache Beam and Google Cloud Dataflow to parallelize machine learning training for hyperparameter optimization. It showed how Dataflow reduced training time from 12 hours to under 30 minutes. 2. The second presentation demonstrated building a streaming Twitter sentiment analysis pipeline with Dataflow. It covered streaming patterns, batch vs streaming considerations, and a demo that ingested tweets from PubSub, analyzed sentiment with NLP, and loaded results to BigQuery.
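The hyperparameter-optimization speedup described in the first presentation comes from a simple observation: each parameter combination trains independently, so the runs can fan out across workers. A minimal sketch of that fan-out pattern, using a local thread pool in place of Dataflow (the parameter grid and the fake scoring function are invented for illustration):

```python
# Sketch of parallelized hyperparameter search: every parameter
# combination is scored independently, so the work fans out across
# workers (Cloud Dataflow in the talk; a local thread pool here).
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_score(params):
    """Stand-in for one full training run; returns (params, score)."""
    lr, depth = params
    # Fake objective: best at lr=0.1, depth=5. A real run would train
    # a model and return its validation metric.
    score = 1.0 - abs(lr - 0.1) - 0.01 * abs(depth - 5)
    return params, score

def grid_search(learning_rates, depths):
    """Score every (lr, depth) combination in parallel; return the best."""
    grid = list(product(learning_rates, depths))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(train_and_score, grid))
    return max(results, key=lambda r: r[1])
```

The 12-hours-to-30-minutes improvement quoted above is exactly this structure at scale: wall-clock time shrinks roughly with the number of workers because the runs share no state.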
Predicting failure in power networks, detecting fraudulent activities in payment card transactions, and identifying the next logical product for the right customer at the right time all require machine learning over massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics and a robust big data architecture that's up to the task. Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.
These slides show how to approach a multi-class classification problem using H2O. The data used is an aggregated log from multiple systems that constantly report information about their status, connections and traffic. In large organizations these log datasets can be huge and hard to attribute due to the number of sources, legacy systems, etc. In our example we create a response label for each source, then use H2O to classify the source of the data. Author Bio: Ashrith Barthur is a Security Scientist at H2O currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud and global money movement. He has a PhD from Purdue University in the field of information security, specialized in anomalous behaviour in the DNS protocol. Don't forget to download H2O! http://www.h2o.ai/download/
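The task above (label each log record with the system that produced it) is a standard multi-class setup. A minimal nearest-centroid sketch shows the shape of the problem; H2O's actual algorithms (GBM, random forest, deep learning) are far richer, and the feature vectors and class labels here are invented:

```python
# Minimal multi-class classifier sketch for the "which system produced
# this log record?" problem: nearest-centroid on numeric features.

def fit_centroids(rows, labels):
    """Compute the mean feature vector (centroid) for each class."""
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*group)]
            for label, group in by_label.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

With real log data the features would come from parsed status, connection, and traffic fields, and the "created response" mentioned in the abstract plays the role of `labels` here.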
"Democratizing Big Data", Ami Gal, CEO & Co-Founder of SQream Technologies Watch more from Data Natives Tel Aviv 2016 here: http://bit.ly/2hw1MY0 Visit the conference website to learn more: http://telaviv.datanatives.io/ Follow Data Natives: https://www.facebook.com/DataNatives https://twitter.com/DataNativesConf Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS About the Author: Ami Gal is the Co-Founder and CEO at SQream Technologies, where he is building a very fast SQL big-data database that crunches everything from a few terabytes to petabytes at high performance. He is a hands-on entrepreneur, a mentor at Seedcamp, and a SmartCamp mentor at IBM.
Big data visualization frameworks and applications at Kitware Marcus Hanwell, Technical Leader at Kitware, Inc. March 27th 2014 Kitware develops permissively licensed open source frameworks and applications for scientific data applications and related areas. Some of the frameworks developed by our High Performance Computing and Visualization group address current challenges in big data visualization and analysis in a number of application domains including geospatial visualization, social media, finance, chemistry, biology (phylogenetics), and climate. The frameworks used to develop solutions in these areas will be described, along with the applications and the nature of the underlying data. These solutions focus on shared frameworks providing data storage, indexing, retrieval, client-server delivery models, server-side serial and parallel data reduction, analysis, and diagnostics. Additionally, they provide mechanisms that enable server-side or client-side rendering based on the capabilities and configuration of the system. Big Data Visualization Meetup - South Bay http://www.meetup.com/Big-Data-Visualisation-South-Bay/
Learn about the Challenge of Big Data and how Hadoop in the Cloud, a flexible infrastructure for Big Data, is changing everything!
Talk I gave at Strata + Hadoop World in Barcelona on November 21, 2014. In this talk I discuss the experience we gained with real-time analysis of high-volume event data streams.
This document provides an overview of Microsoft's Cognitive Toolkit (CNTK), a deep learning framework. It discusses key aspects of CNTK including its components, how to get started, and its capabilities. CNTK provides tools for common deep learning tasks like computer vision, natural language processing, and time series prediction. It also supports distributed training on multiple GPUs and machines and has APIs in Python, C++, and C#.
Big Data Visualization Kwan-Liu Ma Professor of Computer Science and Chair of the Graduate Group in Computer Science (GGCS) at the University of California-Davis January 22nd 2014 We are entering a data-rich era. Advanced computing, imaging, and sensing technologies enable scientists to study natural and physical phenomena at unprecedented precision, resulting in an explosive growth of data. The size of the collected information about the Web and mobile device users is expected to be even greater. To make sense of and maximize utilization of such vast amounts of data for knowledge discovery and decision making, we need a new set of tools beyond conventional data mining and statistical analysis. One such tool is visualization. I will present visualizations designed for gleaning insight from massive data and guiding complex data analysis tasks. I will show case studies using data from cyber/homeland security, large-scale scientific simulations, medicine, and sociological studies. Big Data Visualization Meetup - South Bay http://www.meetup.com/Big-Data-Visualisation-South-Bay/
Big trends in big data, new Apache Hadoop enhancements, new SQL tools, and real-time big data computation systems
As advanced sensor technologies are becoming widely deployed in the energy industry, the availability of higher-frequency data results in both analytical benefits and computational costs. To an energy forecaster or data scientist, some of these benefits might include enhanced predictive performance from forecasting models as well as improved pattern recognition in energy consumption across building types, economic sectors, and geographies. To a utility or electricity service provider, these benefits might include significantly deeper insights into their diverse customer base. However, these advantages can come with a high computational price tag. With Spark 2.0, User-Defined Functions can be applied across grouped SparkDataFrames in the SparkR API to solve the multivariate optimization and model selection problems typically required for fitting site-level models. This recently added feature of Spark 2.0 on Databricks has allowed DNV GL to efficiently fit predictive models that relate weather, electricity, water, and gas consumption across virtually any number of buildings.
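The "fit one model per site" pattern described above (SparkR's grouped-UDF feature, `gapply`, applies a user function to each group of a SparkDataFrame) can be sketched locally: group meter records by building, then fit a simple model per group. The field names, the toy least-squares model, and the sample schema are illustrative, not DNV GL's actual models:

```python
# Sketch of per-site model fitting: group records by building and fit a
# simple least-squares line of consumption vs. temperature per group.
# SparkR's gapply distributes exactly this per-group step at scale.
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def fit_per_building(records):
    """records: dicts with 'building', 'temp', 'kwh' keys -> model per building."""
    groups = defaultdict(list)
    for r in records:
        groups[r["building"]].append((r["temp"], r["kwh"]))
    return {bldg: fit_line(*zip(*obs)) for bldg, obs in groups.items()}
```

Because each building's fit is independent, the grouped-UDF approach parallelizes across virtually any number of buildings, which is the scaling claim in the abstract.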
DEVIEW DAY1. JPA and modern Java data persistence technologies
REEF is a meta-framework for big data analytics that eases development atop resource managers like YARN and Mesos. It provides a reusable control plane for coordinating data processing tasks and an adaptation layer for different resource managers. REEF decouples applications from cluster resources and handles common control plane functions like fault tolerance and configuration management. The framework is implemented in Java and C# and supports local, YARN, Mesos, and HDInsight execution environments. Future work includes graduating REEF from the Apache Incubator and using it to build new data processing frameworks and systems.
DEVIEW DAY1. An offline customer-analytics solution based on video recognition, and deep learning
This document summarizes lessons learned from developing the Realm Android library. It discusses challenges such as setting up an Android library project, API design, testing, distribution methods, and issues like annotation processing, bytecode weaving, and native code support. Key points covered are how to start a library project, the importance of testing libraries extensively, and distribution options like Bintray.
This document summarizes a presentation about Packetbeat and monitoring distributed systems. It discusses how Packetbeat passively captures network packets, decodes protocols, and matches requests and responses to create JSON objects. It then sends this data to Elasticsearch for analysis. Aggregations like histograms, percentiles, and moving averages are used to analyze latency, identify slow methods, and detect anomalies in metrics over time. Other Beats like Topbeat, Filebeat, and Metricbeat are also briefly introduced.
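The aggregations mentioned above are easy to sketch on raw latency samples. In production, Elasticsearch computes these server-side over the JSON documents Packetbeat ships; the linear-interpolation percentile and trailing moving average below are simplified illustrations of what those aggregations do:

```python
# Sketch of two latency aggregations from the talk: a percentile and a
# trailing moving average over request-response durations (in ms).

def percentile(values, pct):
    """Percentile via linear interpolation between sorted samples."""
    vs = sorted(values)
    k = (len(vs) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(vs) - 1)
    return vs[lo] + (vs[hi] - vs[lo]) * (k - lo)

def moving_average(values, window):
    """Trailing moving average; early entries use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

High percentiles (p95, p99) surface the slow tail that an average hides, and comparing each point against its moving average is one simple way to flag the metric anomalies the presentation describes.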
DEVIEW 2015 DAY1. How do browsers make use of vsync?
DEVIEW2015 DAY1. Surviving in the web browser jail
DRC-HUBO is Rainbow Robotics' humanoid robot that competed in the DRC Finals. It uses a modular, lightweight exoskeletal design with effective cooling and power systems. PODO-RT is the real-time framework that controls DRC-HUBO. It uses a distributed architecture with independent processes communicating over shared memory for high-speed control. DRC-HUBO demonstrated a variety of autonomous tasks at the DRC Finals, including driving, opening doors, using tools, and traversing rough terrain.
MIT researchers have developed highly efficient quadruped robots like the Cheetah that can run at speeds up to 6 m/s. The Cheetah uses a proprioceptive actuation system with high-torque-density motors to achieve high force-control bandwidth above 120 Hz. Its parallelized control system with multicore CPUs and FPGAs allows control frequencies up to 4 kHz. Design principles for efficient legged locomotion include energy regeneration, low transmission impedance, and low leg inertia. The researchers are continuing their work with robots like Cheetah 2 and Hermes.
DEVIEW 2015 DAY1. How Naver Effect Toon was made - Kim Hyo, Lee Hyun-cheol
DEVIEW2015 DAY1. Open source in the data center: the Open Compute Project (OCP)
DEVIEW DAY1. Efficient storage access methods in mobile apps
Delivered at PSG College of Technology, Mar 24, 2018 Github - https://github.com/raghu-icecraft/tech-talks/tree/master/Tableau/Mar_18 Basics of BI and data visualization. Tableau features and integration with R. Discussed Tableau Public and Tableau Desktop. Additions compared to the ICCTAC 2018 session: more emphasis on data science; added slides on the BI and data science Gartner Magic Quadrants of 2018; a slide dedicated to foremost principles of data visualization, with a note on Edward Tufte and the Gestalt laws. The audience was MSc Data Science students along with other teaching staff. The workshop took place at PSG College of Technology, Coimbatore (Department of MCA).
These are the slides from my talk at Data Day Texas 2016 (#ddtx16). The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.