At the StampedeCon 2015 Big Data Conference: The starting point for this project was a MapReduce application that processed log files produced by the support portal. The application was running on Hadoop using the Ruby Wukong framework. At the start of the project it was underperforming and scaling poorly, which made the case for redesigning it in Spark with Scala and Java.
An initial review of the Ruby code revealed that it used disk I/O excessively to communicate between MapReduce jobs: each job was implemented as a separate script, passing large data volumes through the filesystem. Spark manages intermediate data between jobs far more efficiently – not only does it keep that data in memory whenever possible, it often eliminates the need for intermediate data altogether. However, that alone did not bring much improvement, since there were additional bottlenecks at the data aggregation stages.
The application involved a global data ordering step followed by several localized aggregation steps. The initial global sort required a significant, inefficient data shuffle. Spark allowed us to partition the data and convert the single global sort into many local sorts, each running on a single node and exchanging no data with other nodes. As a result, several data processing steps began to fit into node memory, which brought about a tenfold performance improvement.
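The idea can be sketched in plain Python (the log records below are hypothetical, not from the original application); in Spark itself this pattern corresponds to `repartitionAndSortWithinPartitions`:

```python
from collections import defaultdict

def partition_and_sort(records, num_partitions, key):
    """Hash-partition records by key, then sort each partition locally.

    Because all records with the same key land in the same partition,
    each partition can be sorted independently on one node with no
    cross-partition data exchange (unlike a global sort).
    """
    partitions = defaultdict(list)
    for rec in records:
        partitions[hash(key(rec)) % num_partitions].append(rec)
    return {p: sorted(recs, key=key) for p, recs in partitions.items()}

# Hypothetical (user, event) log records
logs = [("user3", 7), ("user1", 2), ("user2", 5), ("user1", 1)]
by_user = partition_and_sort(logs, num_partitions=2, key=lambda r: r[0])
```

Each partition is sorted on its own; downstream per-key aggregations can then run entirely within a partition.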
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model for batch and streaming workflows. Cloud Dataflow handles resource management and optimization to efficiently execute data processing jobs on Google Cloud Platform.
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow-up to last year's Apache Deep Learning 101, given at Dataworks Summit and ApacheCon.
As part of my talk I will walk through using Apache MXNet pre-built models, MXNet's new Model Server with Apache NiFi, executing MXNet with Apache NiFi, and running Apache MXNet on edge nodes utilizing Python and Apache MiniFi.
This talk is geared towards Data Engineers interested in the basics of Deep Learning with open source Apache tools in a Big Data environment. I will walk through source code examples available on GitHub and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
My talk at DataWorks Summit Sydney was listed in the top 7: https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
I have also spoken at and run Future of Data Princeton, and spoken at Oracle Code NYC.
https://www.slideshare.net/oom65/hadoop-security-architecture?next_slideshow=1
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Apache Spark is a fast, general-purpose, and easy-to-use cluster computing system for large-scale data processing. It provides APIs in Scala, Java, Python, and R. Spark is versatile and can run on YARN/HDFS, standalone, or Mesos. It leverages in-memory computing to be faster than Hadoop MapReduce. Resilient Distributed Datasets (RDDs) are Spark's abstraction for distributed data. RDDs support transformations like map and filter, which are lazily evaluated, and actions like count and collect, which trigger computation. Caching RDDs in memory improves performance of subsequent jobs on the same data.
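Spark's lazy-evaluation model can be illustrated with plain Python generators – an analogy, not Spark's actual API: transformations only build a description of the work, and an action forces the computation.

```python
def lazy_map(f, data):
    # "Transformation": returns a generator; nothing is computed yet
    return (f(x) for x in data)

def lazy_filter(pred, data):
    # Also lazy: just wraps the upstream generator
    return (x for x in data if pred(x))

# Chain transformations, like rdd.map(...).filter(...) in Spark.
# No element has been processed at this point.
nums = range(10)
pipeline = lazy_filter(lambda x: x % 2 == 0, lazy_map(lambda x: x * x, nums))

# An "action" (list here, like collect in Spark) triggers the computation.
result = list(pipeline)
```

Running the pipeline yields the even squares of 0–9. In Spark, the same deferral is what lets the scheduler fuse transformations and avoid materializing intermediate data.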
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.
In this talk you will learn about the YARN service framework, its new containerization capabilities, and how it lays the foundation for a hybrid, uniform architecture for compute and storage across on-prem and multi-cloud environments. This will include examples highlighting how easy it is to bring applications to the YARN service framework, as well as how to containerize applications.
Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non-Big Data workloads - wait, that's everything
The document discusses Paytm Labs' transition from batch data ingestion to real-time data ingestion using Apache Kafka and Confluent. It outlines their current batch-driven pipeline and some of its limitations. Their new approach, called DFAI (Direct-From-App-Ingest), will have applications directly write data to Kafka using provided SDKs. This data will then be streamed and aggregated in real-time using their Fabrica framework to generate views for different use cases. The benefits of real-time ingestion include having fresher data available and a more flexible schema.
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries, so there are lessons here for all of us. Spark plus a streaming architecture can solve these problems very nicely. I will present a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
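As a flavor of this kind of algorithm, here is a minimal rolling z-score detector in Python – an illustrative sketch, not the specific algorithms from the talk:

```python
from collections import deque
from math import sqrt

def detect_anomalies(stream, window=20, threshold=3.0):
    """Flag events whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` events.

    A deliberately simple detector: it adapts to the local level of the
    signal, which is the key property for streaming event data.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(stream):
        if len(history) == window:
            mean = sum(history) / window
            var = sum((x - mean) ** 2 for x in history) / window
            std = sqrt(var)
            if std > 0 and abs(value - mean) > threshold * std:
                anomalies.append((i, value))
        history.append(value)
    return anomalies

# Synthetic signal: steady around 10 with one spike at index 30
events = [10.0 + 0.1 * (i % 5) for i in range(60)]
events[30] = 50.0
flagged = detect_anomalies(events)
```

In a streaming deployment the same logic would run per key (e.g. per cell tower) inside a Spark Streaming stage, with the window state kept per partition.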
Boost Performance with Scala – Learn From Those Who’ve Done It!
Scalding is a Scala DSL for Cascading. Run on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require demanding SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that covers several layers, ranging from the infrastructure to the clients, through the platform and the applications. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution’s advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Developers are increasingly building dynamic, interactive, real-time applications on fast streaming data to extract maximum value from data in the moment. Doing so requires a data pipeline, the ability to make transactional decisions against state, and an export capability that pushes data at high speed to long-term Hadoop analytics stores like Hortonworks Data Platform (HDP). This lets data arrive in your analytics store sooner, and allows those analytics to be leveraged with radically lower latency.
But successfully writing fast data applications that manage, process, and export streams of data generated from mobile, smart devices, sensors and social interactions is a big challenge.
Join Hortonworks and VoltDB, an in-memory scale-out relational database that simplifies fast data application development, to learn how you can ingest large volumes of fast-moving, streaming data and process it in real time. We will also cover how developing fast data applications is simplified, faster - and delivers more value when built on a fast in-memory, scale-out SQL database.
The document discusses a presentation on OpenStack Sahara given at a conference in Rome. It begins with introducing the three speakers and their backgrounds. It then provides an agenda for the presentation which includes an introduction to big data, an overview of OpenStack components, and a demonstration of Sahara in action. The presentation discusses what big data is, provides a brief history of MapReduce and Hadoop, and explains how OpenStack is well-suited to host big data platforms through its various components and architecture. It concludes by introducing OpenStack Sahara as a way to simplify deploying and managing Hadoop clusters on OpenStack.
NoSQL Application Development with JSON and MapR-DB
NoSQL databases are being used everywhere by startups and Global 2000 companies alike for data environments that require cost-effective scaling. These environments also typically need to represent data in a more flexible way than is practical with relational databases.
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs, and it is integrated with Hadoop and Apache Spark. ND4J is an Open Source, distributed, GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but the overall cluster configuration can present some unexpected issues that compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present best practices to prevent them. The use cases presented will refer to DL4J and ND4J on different Spark deployment modes (standalone, YARN, Kubernetes). The reference programming language for the code examples will be Scala, but no prior Scala knowledge is required to follow the presented topics.
20150314 sahara intro and the future plan for open stack meetup
Sahara is an OpenStack service that allows users to easily provision and manage Hadoop clusters in OpenStack. It currently supports plugins for Hortonworks, Cloudera, and MapR distributions of Hadoop. The Cloudera plugin integrates Cloudera Manager to provision CDH services. Sahara aims to provide analytics as a service and allow data processing directly in OpenStack clusters using technologies like HDFS, Swift, and Hadoop frameworks. Performance overhead compared to bare metal Hadoop clusters is a current limitation being addressed.
The document discusses Cisco's Hadoop as a service offering on their Intercloud platform. Some key points:
- Cisco provides managed Hadoop, including Cloudera's distribution, on optimized instances with local storage and object storage. This offers a scalable, reliable, and secure environment for Hadoop workloads.
- Use cases discussed include predictive maintenance using IoT data and analyzing customer journeys across multiple channels.
- A pilot test showed Cisco's platform could process over 100 million records from production data across various Hadoop jobs.
- Cisco also discusses their data virtualization product CiscoDV, which can integrate data across on-premises, cloud sources on Cisco and AWS.
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop that was created by eBay and later open sourced as an Apache Incubator project. It provides security for Hadoop systems by instantly identifying access to sensitive data, recognizing attacks/malicious activity, and blocking access in real time through complex policy definitions and stream processing. Eagle was designed to handle the huge volume of metrics and logs generated by large-scale Hadoop deployments through its distributed architecture and use of technologies like Apache Storm and Kafka.
This document provides an overview of exploiting insecure IoT firmware. It begins with an introduction to IoT protocols like CoAP, MQTT, XMPP, and AMQP. It then discusses the OWASP top 10 security risks for IoT, focusing on insecure software/firmware. Common debugging interfaces for firmware like UART, JTAG, SPI, and I2C are explained. Operating systems and compilers used for IoT development are listed. Finally, the document outlines a methodology for exploiting insecure firmware, including getting the firmware, performing reconnaissance, unpacking, localizing points of interest, and then decompiling, compiling, tweaking, fuzzing, or pentesting the firmware. Tools mentioned include binwalk, firmwalk
NUMALLIANCE has successfully integrated new companies over the past 10 years to expand its capabilities in cold forming solutions for wire and tube. It recently merged with its competitor SILFAX, bringing increased tube expertise. While the companies previously competed in automotive and aerospace, they take different approaches that are now complementary within NUMALLIANCE. For example, SILFAX focuses on longer tube production while NUMALLIANCE develops solutions for other parts. The key to a successful merger is identifying the right competitors to combine capabilities while allowing team members to enhance operations rather than causing redundancy.
This document discusses various tools for data visualization, including D3.js, WebGL, the ELK stack, R, Processing, Open Refine, and 3D printing. It provides examples of visualizations created with each tool and suggests when each tool may be best to use. D3.js is described as a low-level library that provides full control but requires more work, while tools like the ELK stack allow for quickly visualizing system and business data. R is presented as useful for exploring and analyzing large datasets, and Open Refine is recommended for cleaning and preparing CSV files for export.
This document provides an overview of heterogeneous persistence and different database management systems (DBMS). It discusses why a single DBMS is often not sufficient and describes different types of DBMS including relational databases, key-value stores, and columnar databases. For each type, it outlines good and bad use cases, examples, considerations, and pros and cons. The document aims to help readers understand the different flavors of DBMS and how to choose the right ones for their specific data and access needs.
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
A report issued March 25, 2013 by the U.S. Geological Survey titled "Landscape Consequences of Natural Gas Extraction in Allegheny and Susquehanna Counties, Pennsylvania, 2004–2010." The report, using a series of maps and data, purports to show that drilling has led to "carving up" wildlife habitats in some forests.
Bsides Delhi Security Automation for Red and Blue Teams
Suraj Pratap discusses security automation for red and blue teams. He outlines how he automates the server and application lifecycles using open source tools to address challenges around human capacity, tool selection, time, and cost when managing 600+ servers and 10+ applications across cloud infrastructures. Some areas he has automated include infrastructure security using Ansible and CloudFormation, security auditing using Scout2 and Prowler, offensive security tests using OpenVAS and Jenkins, vulnerability management with Dradis and Vulnreport.io, and security information and event monitoring with Alienvault and ELK.
Demystifying Security Analytics: Data, Methods, Use Cases
Many vendors sell “security analytics” tools, and some organizations have built their own security analytics toolsets and capabilities using Big Data technologies and approaches. How do you find the right approach for your organization and benefit from this analytics boom? How do you start your security analytics project, and how do you mature its capabilities?
(Source: RSA USA 2016-San Francisco)
This session will share large-scale architectures from the author's experiences with companies like Cisco, Symantec, and EMC, comparing and contrasting the architectures across: infrastructure architecture scaling, e-commerce integrations, migration approaches from legacy into AEM, and digital marketing cloud integrations such as personalization, analytics, and DMP.
The document outlines a pilot leadership program called the Accelerated Leadership Class (ALC) at Peterson Air Force Base. The 7-session program will provide interactive leadership training to 12 junior airmen using experiential learning activities and models. Sessions will focus on developing leadership skills, emotional intelligence, giving and receiving feedback, and completing a group leadership project to benefit the base. The tentative schedule explores different timing options over 6 days, 3 sessions per week for 2 weeks, or twice monthly for 3 months.
Opensource approach to design and deployment of Microservices based VNF
Microservices are gaining increased adoption in the Telco NFV world. It is key to understand the design and deployment methodologies involved in developing a Microservice-based VNF. This talk provides an open source practitioner's approach to building and deploying a Microservice-based VNF, and includes the following:
- Design patterns and workflow models
- Design models for VNF placement, capacity management, scale-in/out and resiliency
- Deployment considerations, including handling of scale and fault-tolerant VNFs using well-known open source tools
About the presenter: Prem Sankar works on Ericsson's Opensource Ecosystem team and is part of the OpenDaylight and OPNFV teams at Ericsson. Prem evangelizes SDN and cloud, and has given many sessions and conducted workshops on SDN and ODL. Prem is PTL of the ODL COE project and is currently driving the Kubernetes and ODL integration in the OpenDaylight community. He is a frequent speaker at open source summits and has presented at OpenDaylight, OPNFV and Open Networking summits.
If you have heard about web scale, have a requirement to survive at web scale, or would just like to prepare your application to handle an X effect, this topic is for you.
During the presentation you will learn the aspects and caveats of performance testing, and the nuances of performance testing Java-based web applications.
As a practical part, you will get a brief overview of existing tools and a guide to using Gatling to generate load against your application.
Gatling is an open source performance testing tool written in Scala that provides a comprehensive DSL for load scenario specification.
Format Wars: from VHS and Beta to Avro and Parquet
The document discusses different data storage formats such as text, Avro, Parquet, and their suitability for writing and reading data. It provides examples of how to choose a format based on factors like query needs, data types, and whether schemas need to evolve. The document also demonstrates how Avro can handle schema evolution by adding or changing fields while still reading existing data.
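The resolution rule behind Avro's schema evolution can be sketched in plain Python – this mimics the reader-schema-with-defaults idea, not Avro's actual API:

```python
def read_with_schema(record, reader_schema):
    """Resolve a stored record against a reader schema, Avro-style:
    fields missing from the stored record take the reader's default.

    A plain-Python sketch of the resolution rule, not Avro itself.
    """
    return {
        name: record.get(name, default)
        for name, default in reader_schema.items()
    }

# A record written before the schema gained a "country" field
old_record = {"id": 1, "name": "alice"}

# Evolved reader schema: the new field carries a default, so old
# data remains readable without rewriting it
reader_schema = {"id": None, "name": None, "country": "unknown"}

resolved = read_with_schema(old_record, reader_schema)
```

In real Avro, the writer's schema is stored with the data and the reader's schema supplies defaults for added fields; this is what makes adding a field a backward-compatible change.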
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
This document discusses using an open source Lambda architecture with Kafka, Hadoop, Samza, and Druid to handle event data streams. It describes the problem of interactively exploring large volumes of time series data. It outlines how Druid was developed as a fast query layer for Hadoop to enable low-latency queries over aggregated data. The architecture ingests raw data streams in real-time via Kafka and Samza, aggregates the data in Druid, and enables reprocessing via Hadoop for reliability.
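The serving-layer merge at the heart of the Lambda architecture can be sketched in a few lines of Python (illustrative names and numbers, not Druid's query API):

```python
def query(key, batch_view, realtime_view):
    """Serving-layer merge: combine the precomputed batch view with
    the incremental real-time view covering events not yet reprocessed.

    An illustrative sketch of the Lambda pattern, not Druid itself.
    """
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

# Batch layer: aggregates recomputed from the full dataset in Hadoop
batch_view = {"page:/home": 1200, "page:/docs": 300}

# Speed layer: counts streamed in (e.g. via Kafka/Samza) since the
# last batch run
realtime_view = {"page:/home": 37, "page:/pricing": 5}

total = query("page:/home", batch_view, realtime_view)
```

The batch layer periodically reprocesses everything for reliability, then the corresponding entries in the real-time view are discarded; queries always see the sum of both.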
There is increased interest in using Kubernetes, the open-source container orchestration system for modern, stateful Big Data analytics workloads. The promised land is a unified platform that can handle cloud native stateless and stateful Big Data applications. However, stateful, multi-service Big Data cluster orchestration brings unique challenges. This session will delve into the technical gaps and considerations for Big Data on Kubernetes.
Containers offer significant value to businesses; including increased developer agility, and the ability to move applications between on-premises servers, cloud instances, and across data centers. Organizations have embarked on this journey to containerization with an emphasis on stateless workloads. Stateless applications are usually microservices or containerized applications that don’t “store” data. Web services (such as front end UIs and simple, content-centric experiences) are often great candidates as stateless applications since HTTP is stateless by nature. There is no dependency on the local container storage for the stateless workload.
Stateful applications, on the other hand, are services that require backing storage, and keeping state is critical to running the service. Hadoop, Spark and, to a lesser extent, NoSQL platforms such as Cassandra, MongoDB, Postgres, and MySQL are great examples. They require some form of persistent storage that will survive service restarts...
Speakers
Anant Chintamaneni, VP Products, BlueData
Nanda Vijaydev, Director Solutions, BlueData
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model for batch and streaming workflows. Cloud Dataflow handles resource management and optimization to efficiently execute data processing jobs on Google Cloud Platform.
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to last years Apache Deep Learning 101 that was done at Dataworks Summit and ApacheCon.
As part of my talk I will walk through using Apache NXNet Pre-Built Models, MXNet's New Model Server with Apache NiFi, executing MXNet with Apache NiFi and running Apache MXNet on edge nodes utilizing Python and Apache MiniFi.
This talk is geared towards Data Engineers interested in the basics of Deep Learning with open source Apache tools in a Big Data environment. I will walk through source code examples available in github and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
My talk at Data Works Summit Sydney was listed in top 7 -> https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
Also have speak at and run Future of Data Princeton and at Oracle Code NYC.
https://www.slideshare.net/oom65/hadoop-security-architecture?next_slideshow=1
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Apache Spark is a fast, general-purpose, and easy-to-use cluster computing system for large-scale data processing. It provides APIs in Scala, Java, Python, and R. Spark is versatile and can run on YARN/HDFS, standalone, or Mesos. It leverages in-memory computing to be faster than Hadoop MapReduce. Resilient Distributed Datasets (RDDs) are Spark's abstraction for distributed data. RDDs support transformations like map and filter, which are lazily evaluated, and actions like count and collect, which trigger computation. Caching RDDs in memory improves performance of subsequent jobs on the same data.
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.
In this talk you will learn about YARN service framework, its new containerization capabilities and how it lays the foundation for a hybrid and uniform architecture for compute and storage across on-prem and multi-cloud environments. This will include examples highlighting how easy it is to bring applications to the YARN service framework as well as how to containerize applications.
Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non Big Data workloads - wait that's everything
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
The document discusses Paytm Labs' transition from batch data ingestion to real-time data ingestion using Apache Kafka and Confluent. It outlines their current batch-driven pipeline and some of its limitations. Their new approach, called DFAI (Direct-From-App-Ingest), will have applications directly write data to Kafka using provided SDKs. This data will then be streamed and aggregated in real-time using their Fabrica framework to generate views for different use cases. The benefits of real-time ingestion include having fresher data available and a more flexible schema.
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present both a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
Scalding is a scala DSL for Cascading. Run on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require high level of SLAs (service-level agreement) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that cover several layers ranging from the infrastructure to the clients going through the platform and the applications. In this talk, we will present the different architecture blueprints for disaster recovery as well as their corresponding SLA objectives. Then, we will focus on the stretch cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution’s advantages, drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
Developers increasingly are building dynamic, interactive real-time applications on fast streaming data to extract maximum value from data in the moment. To do so requires a data pipeline, the ability to make transactional decisions against state, and an export functionality that pushes data at high speeds to long-term Hadoop analytics stores like Hortonworks Data Platform (HDP). This enables data to arrive in your analytic store sooner, and allows these analytics to be leveraged with radically lower latency.
But successfully writing fast data applications that manage, process, and export streams of data generated from mobile, smart devices, sensors and social interactions is a big challenge.
Join Hortonworks and VoltDB, an in-memory scale-out relational database that simplifies fast data application development, to learn how you can ingest large volumes of fast-moving, streaming data and process it in real time. We will also cover how developing fast data applications is simplified, faster - and delivers more value when built on a fast in-memory, scale-out SQL database.
The document discusses a presentation on OpenStack Sahara given at a conference in Rome. It begins with introducing the three speakers and their backgrounds. It then provides an agenda for the presentation which includes an introduction to big data, an overview of OpenStack components, and a demonstration of Sahara in action. The presentation discusses what big data is, provides a brief history of MapReduce and Hadoop, and explains how OpenStack is well-suited to host big data platforms through its various components and architecture. It concludes by introducing OpenStack Sahara as a way to simplify deploying and managing Hadoop clusters on OpenStack.
NoSQL Application Development with JSON and MapR-DBMapR Technologies
NoSQL databases are being used everywhere by startups and Global 2000 companies alike for data environments that require cost-effective scaling. These environments also typically need to represent data in a more flexible way than is practical with relational databases.
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...DataWorks Summit
DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs. It is integrated with Hadoop and Apache Spark. ND4J is a Open Source, distributed and GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but the overall cluster configuration can present some unespected issues that can compromise performances and nullify the benefits of well written code and good model design. In this talk I will walk through some of those problems and will present some best practices to prevent them. The presented use cases will refer to DL4J and ND4J on different Spark deployment modes (standalone, YARN, Kubernetes). The reference programming language for any code example would be Scala, but no preliminary Scala knowledge is mandatory in order to better understanding the presented topics.
20150314 Sahara intro and the future plan for OpenStack meetup (Wei Ting Chen)
Sahara is an OpenStack service that allows users to easily provision and manage Hadoop clusters in OpenStack. It currently supports plugins for Hortonworks, Cloudera, and MapR distributions of Hadoop. The Cloudera plugin integrates Cloudera Manager to provision CDH services. Sahara aims to provide analytics as a service and allow data processing directly in OpenStack clusters using technologies like HDFS, Swift, and Hadoop frameworks. Performance overhead compared to bare metal Hadoop clusters is a current limitation being addressed.
The document discusses Cisco's Hadoop as a service offering on their Intercloud platform. Some key points:
- Cisco provides managed Hadoop, including Cloudera's distribution, on optimized instances with local storage and object storage. This offers a scalable, reliable, and secure environment for Hadoop workloads.
- Use cases discussed include predictive maintenance using IoT data and analyzing customer journeys across multiple channels.
- A pilot test showed Cisco's platform could process over 100 million records from production data across various Hadoop jobs.
- Cisco also discusses their data virtualization product CiscoDV, which can integrate data across on-premises systems and cloud sources on Cisco and AWS.
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop that was created by eBay and later open sourced as an Apache Incubator project. It provides security for Hadoop systems by instantly identifying access to sensitive data, recognizing attacks/malicious activity, and blocking access in real time through complex policy definitions and stream processing. Eagle was designed to handle the huge volume of metrics and logs generated by large-scale Hadoop deployments through its distributed architecture and use of technologies like Apache Storm and Kafka.
This document provides an overview of exploiting insecure IoT firmware. It begins with an introduction to IoT protocols like CoAP, MQTT, XMPP, and AMQP. It then discusses the OWASP top 10 security risks for IoT, focusing on insecure software/firmware. Common debugging interfaces for firmware like UART, JTAG, SPI, and I2C are explained. Operating systems and compilers used for IoT development are listed. Finally, the document outlines a methodology for exploiting insecure firmware, including getting the firmware, performing reconnaissance, unpacking, localizing points of interest, and then decompiling, compiling, tweaking, fuzzing, or pentesting the firmware. Tools mentioned include binwalk, firmwalk
NUMALLIANCE has successfully integrated new companies over the past 10 years to expand its capabilities in cold forming solutions for wire and tube. It recently merged with its competitor SILFAX, bringing increased tube expertise. While the companies previously competed in automotive and aerospace, they take different approaches that are now complementary within NUMALLIANCE. For example, SILFAX focuses on longer tube production while NUMALLIANCE develops solutions for other parts. The key to a successful merger is identifying the right competitors to combine capabilities while allowing team members to enhance operations rather than causing redundancy.
This document discusses various tools for data visualization, including D3.js, WebGL, the ELK stack, R, Processing, Open Refine, and 3D printing. It provides examples of visualizations created with each tool and suggests when each tool may be best to use. D3.js is described as a low-level library that provides full control but requires more work, while tools like the ELK stack allow for quickly visualizing system and business data. R is presented as useful for exploring and analyzing large datasets, and Open Refine is recommended for cleaning and preparing CSV files for export.
This document provides an overview of heterogeneous persistence and different database management systems (DBMS). It discusses why a single DBMS is often not sufficient and describes different types of DBMS including relational databases, key-value stores, and columnar databases. For each type, it outlines good and bad use cases, examples, considerations, and pros and cons. The document aims to help readers understand the different flavors of DBMS and how to choose the right ones for their specific data and access needs.
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats (Marcellus Drilling News)
A report issued March 25, 2013 by the U.S. Geological Survey titled "Landscape Consequences of Natural Gas Extraction in Allegheny and Susquehanna Counties, Pennsylvania, 2004–2010." The report, using a series of maps and data, purports to show that drilling has led to "carving up" wildlife habitats in some forests.
Bsides Delhi: Security Automation for Red and Blue Teams (Suraj Pratap)
Suraj Pratap discusses security automation for red and blue teams. He outlines how he automates the server and application lifecycles using open source tools to address challenges around human capacity, tool selection, time, and cost when managing 600+ servers and 10+ applications across cloud infrastructures. Some areas he has automated include infrastructure security using Ansible and CloudFormation, security auditing using Scout2 and Prowler, offensive security tests using OpenVAS and Jenkins, vulnerability management with Dradis and Vulnreport.io, and security information and event monitoring with Alienvault and ELK.
Demystifying Security Analytics: Data, Methods, Use Cases (Priyanka Aash)
Many vendors sell “security analytics” tools, and some organizations have built their own security analytics toolsets and capabilities using Big Data technologies and approaches. How do you find the right approach for your organization and benefit from this analytics boom? How do you start your security analytics project, and how do you mature its capabilities?
(Source: RSA USA 2016-San Francisco)
This session will share large-scale architectures from the author's experiences with companies like Cisco, Symantec, and EMC, comparing and contrasting the architectures across: infrastructure architecture scaling, ecommerce integrations, migration approaches from legacy systems into AEM, and digital marketing cloud integrations such as personalization, analytics, and DMP.
The document outlines a pilot leadership program called the Accelerated Leadership Class (ALC) at Peterson Air Force Base. The 7-session program will provide interactive leadership training to 12 junior airmen using experiential learning activities and models. Sessions will focus on developing leadership skills, emotional intelligence, giving and receiving feedback, and completing a group leadership project to benefit the base. The tentative schedule explores different timing options over 6 days, 3 sessions per week for 2 weeks, or twice monthly for 3 months.
Opensource approach to design and deployment of Microservices based VNFMichelle Holley
Microservices are gaining increased adoption in the Telco NFV world, and it is key to understand the design and deployment methodologies involved in developing a microservice-based VNF. This talk provides an opensource practitioner's approach to building and deploying a microservice-based VNF, including: design patterns and workflow models; design models for VNF placement, capacity management, scale-in/out and resiliency; and deployment considerations, including handling of scale and fault-tolerant VNFs using well-known opensource tools.
About the presenter: Prem Sankar works on the Ericsson Opensource Ecosystem team and is part of the Opendaylight and OPNFV teams at Ericsson. Prem evangelizes SDN and cloud and has given many sessions and conducted workshops around SDN and ODL. Prem is PTL of the ODL COE project and is currently driving the Kubernetes and ODL integration in the Opendaylight community. Prem is a frequent speaker at opensource summits and has presented at Opendaylight, OPNFV and Open Networking summits.
If you have heard about web-scale, have a requirement to survive at web-scale, or simply want to prepare your application to handle a sudden traffic spike (the "X effect"), this topic is for you.
During the presentation you will learn the aspects and caveats of performance testing, and the nuances of performance testing Java-based web applications.
As a practical part, you will get a brief overview of existing tools and a guide to using Gatling to generate load against your application.
Gatling is an open source load testing tool written in Scala that provides a comprehensive DSL for specifying load scenarios.
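Gatling's DSL declares an injection profile rather than a hand-written loop of requests. As a rough plain-Python sketch (not Gatling itself), the start-time schedule implied by a profile like Gatling's "rampUsers(n) during(t)" can be computed as:

```python
def ramp_users(total_users, duration_s):
    """Start times (in seconds) for `total_users` virtual users arriving at
    a constant rate over `duration_s` seconds -- one reading of what a ramp
    injection profile specifies."""
    interval = duration_s / total_users
    return [round(i * interval, 3) for i in range(total_users)]

# 100 users ramped over 30 seconds: one new user every 0.3 s.
schedule = ramp_users(total_users=100, duration_s=30)
```

A load tool then starts each virtual user at its scheduled offset and runs the scenario (requests, pauses, assertions) on its behalf.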
Big Data Europe: Simplifying Development and Deployment of Big Data Applications (BigData_Europe)
Presentation at the MSD IT Global Innovation Center in Prague, Czech Republic. Covers the technical outcomes of the Horizon 2020 BigDataEurope project and provides an example of integrating a component into the BDI platform.
Jilles has experience using Docker at Inbot to improve the separation between development and operations work. Some key points:
- Docker helps address the problem of standardized software packaging and runtime configuration, separating provisioning responsibilities for developers and operators.
- At Inbot, Docker was adopted in 2014 and helped eliminate Puppet and move infrastructure to AWS. It simplified software dependencies and improved deployment speed.
- Dockerfiles provide a clear documentation of what is needed to run software, replacing complex configuration scripts and reducing operator workload.
WhiteHedge provides DevOps as a service. We offer devops consultation, implementation and training services. You can contact us at devops@whitehedge.com
Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS. Learn how you can leverage AWS services like Amazon RDS, AWS CloudFormation, Auto Scaling, Amazon S3, Amazon Glacier, and Amazon Elastic MapReduce to perform highly performant, reliable, real-time big data analytics while saving time, effort, and money. Gain insight from two years of real-time analytics successes and failures so you don't have to go down this path on your own.
My incident response talk from Techfair 2016 in Jersey. The talk explores how incident response can comply with the requirements set out in the Jersey Financial Services Commission's "Dear CEO" letter on cyber security.
How Docker EE is Finnish Railway’s Ticket to App Modernization (Docker, Inc.)
VR Group-Finnish Railways is responsible for 118 million passenger rides and moving 41 million tons of cargo a year and is seeing overall growth in rail transit throughout Finland. A priority for the organization is to provide improved customer services, including an improved seat reservation system and bringing modern experiences like next generation mobile apps to their passengers. These improvements require looking at their application portfolio and deciding to either:
Revise: Transform legacy applications to more cost efficient solutions
Redesign: Redesign and rewrite mainframe-based solutions to microservices
In this session, Markus Niskanen, Integration Manager at VR Group, and Oscar Renalias, Sr. Technology Architect at Accenture, will discuss how they leveraged Docker EE and the public cloud as the common platform for these different application modernization projects. They will cover how they are leveraging Docker and the cloud to renew and optimize their application portfolio for greater ROI, leading to organization-wide adoption of DevOps principles and cultural change in an industry that is over 150 years old.
SocCnx11 - All you need to know about Orient Me (panagenda)
Orient Me is the first Connections service built on the new Connections Pink stack. Nico will talk about the installation, integration and administration of Orient Me, and will also provide useful insights into the backend tools it uses. Walk away knowing how to successfully run Orient Me in your own Connections environment!
Cisco at VMworld 2015 - Cisco UCS as the Foundation for Software-Defined Data... (ldangelo0772)
IT is in the midst of a dramatic shift to the mobile-cloud era, one in which IT services can be consumed on-demand across the enterprise and in hybrid and public clouds. Tjerk Bijlsma will share the latest Cisco Unified Computing System (Cisco UCS) innovations that can help you shape your Software-Defined Data Center, radically simplifying IT while delivering services at the speed of today's business.
During this session you will learn about:
Cisco's comprehensive architectural approach to enabling the next wave of IT convergence, including VMware vSAN and comprehensive vRealize integration as part of the SDDC.
Innovations in Cisco Data Center portfolio including Cisco UCS and Nexus integrations with VMware solutions.
Solutions for virtualized environments for Converged and Hyper Converged systems including FlexPod, VersaStack, Vblock, vSAN, Simplivity, StorMagic and more.
Is the company more agile? It's all thanks to the Data Center (SMAU)
The document discusses Cisco's unified computing system (UCS) and how it provides a flexible, integrated intelligent infrastructure for data centers. It highlights key benefits of UCS such as simplified management, reduced costs, ability to evolve with business needs, and optimized performance for virtualization. UCS combines computing, networking, management, and storage access into a single integrated architecture to reduce complexity and improve agility.
Joe Onisick, Principal Engineer, Cisco discusses building the right network and understanding different overlay approaches at Cisco Connect Toronto 2015.
Cisco Connect Halifax 2018: Cisco DNA - Deeper Dive (Cisco Canada)
This document provides a summary of a session on Cisco's Digital Network Architecture. The session discusses how Cisco's latest advances in programmable ASIC hardware and software-defined technologies are driving innovations in their Catalyst 9000 switches and solutions like Encrypted Traffic Analytics and Software-Defined Access. It outlines how the session will provide insight into Cisco's ASIC design process and the capabilities of their latest switching silicon. The session aims to show Cisco's evolution from application-specific integrated circuits to advanced graphical user interfaces that enable customers to more quickly innovate and reduce costs with solutions enabled by their Digital Network Architecture.
Application Centric Infrastructure (ACI), the policy-driven data centre (Cisco Canada)
Mike Herbet, Principal Engineer, Cisco, Dave Cole, Consulting Systems Engineer, Cisco, Sean Comrie, Technical Solutions Architect, Cisco focused on the application centric infrastructure (ACI) at Cisco Connect Toronto.
Presentation: data center transformation, Cisco’s virtualization and cloud jo... (xKinAnx)
The document discusses Cisco's journey towards virtualization and cloud computing. It describes Cisco's global data center strategy, which includes building a new world-class data center in Allen, Texas and developing a cloud strategy and services called CITEIS. CITEIS provides infrastructure as a service using Cisco UCS, Nexus switches, and other Cisco technologies to enable automated, self-service provisioning and elastic infrastructure capacity.
Cisco Virtualized Multi-tenant Data Center solution (VMDC) is an architectural approach to IT which delivers a Cloud Ready Infrastructure. The architecture encompasses multiple systems and functions defining a standard framework for an IT organization. Standardization allows the organization to achieve operational efficiencies, reduce risk and achieve cost reductions while offering a consistent platform for business.
Migrating from VMs to Kubernetes using HashiCorp Consul Service on Azure (Mitchell Pronschinske)
DevOps tools became very popular with the adoption of public cloud, but Operational teams now realize that their benefits can be extended to enterprise data centers. In reality, cloud native tools can help bridge public clouds and private data centers by enabling a common framework to manage applications and their underlying infrastructure components.
In this session you’ll learn about the latest Cisco ACI integrations with Hashicorp Terraform and Consul to deliver a powerful solution for end-to-end on-prem and cloud infrastructure deployments.
Cisco Connect 2018 Indonesia - Software-Defined Access: a transformational ap... (NetworkCollaborators)
The document discusses Cisco's Software Defined Access (SDA) and intent-based networking solutions. It highlights how SDA and the Cisco DNA Center simplify network design, provisioning, policy implementation and assurance through automation and analytics. Traditional networks are complex to manage and secure, while SDA provides a more flexible, software-driven approach through centralized management and segmentation based on user identity rather than network topology.
This document provides an overview and summary of Cisco's Data Center networking and storage solutions, with a focus on the new Cisco MDS 9710 Director. Some key points:
- Cisco offers a multi-protocol portfolio including Fibre Channel, FCoE, and IP networking solutions to address growing data and connectivity demands in modern data centers.
- The Cisco MDS 9710 is the newest storage director that provides the highest scalability, availability, and investment protection in the industry for large scale data centers.
- It supports up to 384 line-rate 16Gbps Fibre Channel ports or 48-port 10GbE FCoE modules in a single chassis, providing three times the performance of competing products.
Cisco Connect Toronto 2018: SD-WAN - delivering intent-based networking to t... (Cisco Canada)
This document discusses Cisco SD-WAN and its ability to deliver intent-based networking to branches and the WAN. It begins by noting the business challenges of traditional network architectures in supporting modern needs around mobility, cloud applications, and security. It then introduces Cisco SD-WAN as a software-defined solution that provides automated, predictive, and business-intent driven networking through centralized control, application-aware policies, hybrid WAN transport, and integrated security and analytics capabilities. Key components of the Cisco SD-WAN architecture are also summarized, including the data, control, management, and orchestration planes.
Cisco Digital Network Architecture – Deeper Dive, “From the Gates to the GUI” (Cisco Canada)
This document provides an overview and agenda for a session on Cisco's Digital Network Architecture. The session will cover industry trends driving digital transformation, Cisco DNA and the importance of flexible hardware, the evolution of application specific integrated circuits (ASICs), DNA/Software-Defined Access, DNA Center, Encrypted Traffic Analytics, and the Catalyst 9000 series of switches. Attendees will learn how Cisco is innovating in silicon and software development and how these innovations are powering new platforms and solutions from the "gates to the GUI." The session aims to provide deeper insight into Cisco's latest switching silicon and how ASICs are designed and built to deliver advanced network capabilities.
Cisco Digital Network Architecture Deeper Dive: From The Gates To The GUI (Cisco Canada)
This document provides an overview of a Cisco session on the Cisco Digital Network Architecture. The session will cover Cisco's evolution from silicon gates to graphical user interfaces, including topics like Cisco DNA, Software-Defined Access, DNA Center, Encrypted Traffic Analytics, and the Catalyst 9000 family of switches. The session aims to provide insight into how Cisco is driving innovation through advances in programmable and flexible application specific integrated circuits (ASICs) and how this foundational technology enables new solutions.
Cisco UCS overview, IBM team 2014 v.2 - handout (Sarmad Ibrahim)
The document discusses Cisco's Unified Computing System (UCS) which automates IT processes to support any workload in minutes with lower infrastructure costs. UCS provides unified management, intelligent infrastructure, and unified fabric for superior price/performance. It has benefits beyond efficiency such as more effective IT. UCS has become the fastest growing product in the server market with over 3,850 unique customers and 95% of Fortune 500 companies investing in UCS.
Cisco’s Cloud Strategy, including our acquisition of CliQr (Cisco Canada)
At Partner Summit we made a series of exciting announcements in our Cloud portfolio, including our acquisition of CliQr. Join us to learn about these new announcements and an understanding of Cisco’s Cloud Strategy.
- How does CliQr fit into our existing Cloud portfolio (Metapod, APIC, Enterprise Cloud Suite, Cloud Consumption-as-a-Service)?
- How does our Cloud portfolio today meet the needs of our customers? What problems are we solving?
- How does our portfolio today position us for the world of Containers and Microservices?
Join us for a presentation of how these announcements fit into our current environment and what they mean to your longer-term strategy.
Cisco Powered Presentation - For Customers (Cisco Powered)
The document discusses Cisco Powered, a program that provides cloud and managed services through validated partners using Cisco technologies. It highlights key aspects of the Cisco Powered methodology including a broad services portfolio, security everywhere, enabling fast IT, choice of consumption models, services and networks designed together, strategic alignment with ecosystem partners, enterprise-class SLAs, independently validated services, and industry innovation leadership. The document also provides information on finding a Cisco Powered services provider.
Cisco Connect Toronto 2017 - Introducing the Network Intuitive (Cisco Canada)
The document discusses Cisco Digital Network Architecture (DNA) and its network intuitive capabilities. DNA Center provides automation, analytics, and assurance to translate business intent into network policies and reduce manual operations. It features workflows for network design and deployment, access control policy authoring, and software image management. DNA collects metrics from the network to provide insights through anomaly detection, trend analysis, and machine learning. This allows for guided troubleshooting and self-remediation of issues.
Similar to How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015 (20)
Why Should We Trust You - Interpretability of Deep Neural Networks - StampedeCo... (StampedeCon)
Despite widespread adoption and success, most machine learning models remain black boxes, and users and practitioners are often asked to implicitly trust their results. However, understanding the reasons behind predictions is critical in assessing trust, which is fundamental if one is asked to take action based on such models, or even to compare two similar models. In this talk I will (1) formulate the notion of interpretability of models, (2) review various attempts and research initiatives to solve this very important problem, and (3) demonstrate real industry use cases and results, focusing primarily on deep neural networks.
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017 (StampedeCon)
Words are no longer sufficient in delivering the search results users are looking for, particularly in relation to image search. Text and languages pose many challenges in describing visual details and providing the necessary context for optimal results. Machine Learning technology opens a new world of search innovation that has yet to be applied by businesses.
In this session, Mike Ranzinger of Shutterstock will share a technical presentation detailing his research on composition aware search. He will also demonstrate how the research led to the launch of AI technology allowing users to more precisely find the image they need within Shutterstock’s collection of more than 150 million images. While the company released a number of AI search enabled tools in 2016, this new technology allows users to search for items in an image and specify where they should be located within the image. The research identifies the networks that localize and describe regions of an image as well as the relationships between things. The goal of this research was to improve the future of search using visual data, contextual search functions, and AI. A combination of multiple machine learning technologies led to this breakthrough.
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017 (StampedeCon)
In many modern applications, data are collected in unusual forms. Connectome or brain imaging data are graphs. Wearable devices measuring activity produce functions over time. In many cases these objects are collected for each individual or transaction, leaving the statistician with the challenge of analyzing populations of data that do not fit the classical numeric and categorical formats of big spreadsheets. In this talk I introduce object-oriented data analysis with an application we recently developed for regression analysis. The talk is aimed at the general data scientist, with emphasis on concepts rather than mathematical detail. The take-home message is how we can use covariates (i.e., meta-data) to predict what the structure of a brain image graph will be.
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam... (StampedeCon)
This talk dives into the technical details of machine learning model development and implementation, and the value it brings to the Monsanto breeding pipeline. We genotype over 100 million seeds a year in order to save field resources and product development cycle time, so automation and high-throughput production from the lab are key to R&D success. In-house predictive model development incorporated a random forest ensemble approach with additional features derived from a Gaussian mixture model. The results show over 95% accuracy with less than 1% false positives/negatives. The model is highly generalizable, having been trained and tested on over 10 million data points. It also offers a probabilistic approach to present genotypes in a more meaningful way and helps enhance downstream genomics analyses. The talk targets an audience in breeding, genetics, and molecular biology, and data scientists interested in practical applications.
How to Talk about AI to Non-analysts - StampedeCon AI Summit 2017 (StampedeCon)
This document provides tips for talking about artificial intelligence (AI) to non-experts. It begins by explaining why communicating about AI is important and noting that AI will transform industries just as electricity did. The tips include gauging the audience's AI sophistication, using tangible examples, focusing on present applications not future scenarios, emphasizing how AI is used rather than algorithms, and discussing the complexity tradeoff between simple and accurate models. The document concludes with "dos and don'ts" such as sparking curiosity without worrying over vocabulary or focusing only on fictional AIs, and conveying real-world impact rather than doomsday scenarios.
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017 (StampedeCon)
This technical session provides a hands-on introduction to TensorFlow using Keras in the Python programming language. TensorFlow is Google’s scalable, distributed, GPU-powered compute graph engine that machine learning practitioners use for deep learning. Keras provides a Python-based API that makes it easy to create well-known types of neural networks in TensorFlow. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to train neural networks of much greater complexity. Deep learning allows a model to learn hierarchies of information in a way that is similar to the function of the human brain.
Foundations of Machine Learning - StampedeCon AI Summit 2017 (StampedeCon)
This presentation will cover all aspects of modeling, from preparing data to training and evaluating the results. There will be descriptions of the mainline ML methods including neural nets, SVMs, boosting, bagging, trees, forests, and deep learning. The common problems of overfitting and dimensionality will be covered, with discussion of modeling best practices. Other topics will include field standardization, encoding categorical variables, and feature creation and selection. It will be a soup-to-nuts overview of all the procedures necessary for building state-of-the-art predictive models.
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem... (StampedeCon)
In this session, we’ll discuss approaches for applying convolutional neural networks to novel computer vision problems, even without having millions of images of your own. Pretrained models and generic image data sets from Google, Kaggle, universities, and other places can be leveraged and adapted to solve industry and business specific problems. We’ll discuss the approaches of transfer learning and fine tuning to help anyone get started on using deep learning to get cutting edge results on their computer vision problems.
Bringing the Whole Elephant Into View: Can Cognitive Systems Bring Real Soluti... (StampedeCon)
Cognitive systems use artificial intelligence, machine learning, and reasoning to enable natural interactions between people and machines. They extend and magnify human expertise and cognition to enable timely and accurate decision making. The document discusses how human cognition can be modeled as a computational process that performs operations on symbolic representations, and how cognitive systems may be able to bring real solutions to complex problems by taking a more holistic view of issues.
Automated AI: The Next Frontier in Analytics - StampedeCon AI Summit 2017 (StampedeCon)
This talk will walk through the important building blocks of automated AI. Rajiv will highlight the current gaps in analytics organizations and how to close those gaps using automated AI. Some of the issues discussed around automated AI are the accuracy of models, tradeoffs around control when using automation, interpretability of models, and integration with other tools. These issues will be illustrated with examples of automated analytics in different industries. The talk will end with some examples of how automated AI in the hands of data scientists and business analysts is transforming analytics teams and organizations.
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017 (StampedeCon)
This document discusses AI in the enterprise from past, present, and future perspectives. It provides an overview of the history and recent developments in AI and deep learning, including improved performance on tasks like image recognition. Case studies are presented showing how various large companies have successfully applied deep learning techniques like convolutional neural networks to problems in different industries involving computer vision, predictive maintenance, fraud detection, and more. The importance of data quantity for deep learning performance is highlighted. The final sections discuss challenges in AI adoption and the importance of piloting models before full production deployment.
A Different Data Science Approach - StampedeCon AI Summit 2017 (StampedeCon)
This session will focus on how to execute Data Science caliber efforts by creating teams with the attributes of Data Science to deliver meaningful results. As Data Scientists are harder to find and keep, this session should appeal to anyone who is either seeking an alternative approach to executing Data Science delivery or augmenting their current Data Science model with additional options.
Graph in Customer 360 - StampedeCon Big Data Conference 2017StampedeCon
Enterprises typically have many data silos of partial customer data, and a common theme in big data projects is to use big data tools and pipelines to unify all siloed customer data into a single, queryable platform for improving all future customer interactions. This data often comes from billing, website traffic, logistics, and marketing, all in different formats with different properties. Graph provides a way to unify all of the data in a single place for tracking the flow of a user through the various silos. Graph can also be used for visualizations and analytics that are difficult in other systems.
In this talk we will explore the ways in which Graph can be leveraged in a customer 360 use case: what it can add to a more conventional system, and what the approach to developing a graph-based Customer 360 system should be.
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017StampedeCon
This talk will go over how to build an end-to-end data processing system in Python, from data ingest to data analytics, machine learning, and user presentation. Developments in both old and new tools have made this particularly possible today. In particular, the talk will cover Airflow for process workflows, PySpark for data processing, Python data science libraries for machine learning and advanced analytics, and building agile microservices in Python.
System architects, software engineers, data scientists, and business leaders can all benefit from attending the talk. They should learn how to build more agile data processing systems and take away some ideas on how their data systems could be simpler and more powerful.
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017StampedeCon
Big Data doesn’t have to mean just Hadoop any more. Big Data can be done in the cloud, using tools developed by the cloud providers. This session will cover using Amazon AWS services to implement a Big Data application, comparing and contrasting different services from Amazon with their Hadoop equivalents.
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...StampedeCon
Using big data isn’t about doing the same things we’ve always done just with different technologies. The technology advances that we’ve chosen to label as big data create the opportunity for wholly new kinds of solutions. Two of the key advances that are enabling new business capabilities are cloud-based data management platforms and streaming data processing and analytics.
In this session, Paul Boal will drill into the cloud-based streaming data architecture that has made possible EVŌ, a new breakthrough health and wellness platform. EVŌ uses a game-changing approach that leverages over 60 billion data points and a predictive analytics engine to intervene BEFORE someone becomes critically ill. All of this is possible by leveraging data from smartphones and wearable fitness devices along with advanced analytics, which then help users develop and sustain positive behaviors. Attendees will learn how to create a cloud-based architecture that can receive data, apply multiple layers of dynamic business rules, and drive alerts and decisions through real-time stream processing using technologies including web services, Amazon DynamoDB and Kinesis, Drools, and Apache Spark.
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...StampedeCon
This document summarizes Ryan Kirk's presentation at StampedeCon 2016 about predicting outcomes in cloud IoT environments. The presentation covers IoT and cloud computing landscapes, challenges with prediction in different business domains, and lessons learned from data science projects. It discusses stages of a prediction lifecycle model and how different domains like business, engineering and research are involved in each stage. Key challenges and solutions addressed include developing a domain model, approaches for handling variability and uncertainty, techniques for anomaly detection, and the importance of feedback loops and training data evaluation.
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
Enterprise Holdings first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points necessary for your own cluster, for example: cloud vs. on premises, physical vs. virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned about which pieces of our architecture worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary; the talk is aimed at the architect or executive level.
Creating a Data Driven Organization - StampedeCon 2016StampedeCon
Companies today are all focused on finding new consumption models to better utilize the data they produce. This presentation will provide insights and best practices for creating the organization and sponsorship necessary to set the foundation for success.
For this session, Dan will provide an overview of the process and methodologies he employs to establish and sustain a Data Driven Culture. Key topics will include:
Data Driven Culture
Executive Sponsorship
Organizational Structure – Collaboration Hubs and Bi-Modal Analytics
Role of Hadoop and Big Data as Part of Data Driven Culture
Using The Internet of Things for Population Health Management - StampedeCon 2016StampedeCon
The Internet of (Human) Things is just beginning to take shape. The human body is an inexhaustible source of data about personal health, and the healthcare industry is just beginning to scratch the surface of the potential insights and value that will come from that data. While much of healthcare traditionally focuses on the episodic delivery of services, the Affordable Care Act is pushing healthcare providers, payers, and self-funded employer groups to look at ways to proactively encourage healthy behaviors. Providing personal health devices as a way to promote individual health is one way that healthcare is beginning to take advantage of IoT technologies. This session provides insight into how IoT is being leveraged in population health management through a solution jointly delivered by Amitech Solutions and Big Cloud Analytics. Attendees will learn how Hadoop is being used to gather personal device data from various vendors, integrate and analyze that information, differentiate trends across regional and cultural diversity, and provide personal recommendations and insights into health risks. This session presents one important way the healthcare industry is leveraging IoT.
How we implemented "Exactly Once" semantics in our database ...javier ramirez
Distributed systems are hard. High-performance distributed systems, even more so. Network latencies, unacknowledged messages, server restarts, hardware failures, software bugs, problematic releases, timeouts... there are plenty of reasons why it is very difficult to know whether a message you sent was received and processed correctly at its destination. So, to be safe, you send the message again... and again... and cross your fingers hoping the system on the other side tolerates duplicates.
QuestDB is an open source database designed for high performance. We wanted to make sure we could offer "exactly once" guarantees by deduplicating messages at ingestion time. In this talk, I explain how we designed and implemented the DEDUP keyword in QuestDB, enabling deduplication as well as upserts on real-time data, while adding only 8% of processing time, even on streams with millions of inserts per second.
I will also explain our parallel, multithreaded write-ahead log (WAL) architecture. And of course, all of this comes with demos, so you can see how it works in practice.
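The ingestion-time deduplication with upsert semantics described above can be sketched in a few lines. This is a toy illustration only, assuming a last-write-wins rule over a configurable set of key columns; QuestDB's real DEDUP works inside its WAL/ingestion layer, and all names here are made up:

```python
# Toy sketch of ingestion-time dedup with upsert semantics.
# Hypothetical names; not QuestDB's actual implementation.

class DedupTable:
    def __init__(self, key_columns):
        self.key_columns = key_columns   # columns forming the dedup key
        self.rows = {}                   # dedup key -> latest row

    def ingest(self, row):
        key = tuple(row[c] for c in self.key_columns)
        # Re-sending an identical row is a no-op; a row with the same
        # key but a different payload is an upsert (last write wins).
        self.rows[key] = row

    def count(self):
        return len(self.rows)

table = DedupTable(key_columns=["ts", "sensor_id"])
table.ingest({"ts": 1, "sensor_id": "a", "temp": 20.0})
table.ingest({"ts": 1, "sensor_id": "a", "temp": 20.0})  # duplicate: absorbed
table.ingest({"ts": 1, "sensor_id": "a", "temp": 21.5})  # upsert: overwrites
table.ingest({"ts": 2, "sensor_id": "a", "temp": 22.0})
print(table.count())  # 2 distinct keys remain
```

With this shape, a producer can safely retry sends: duplicates collapse into a single row instead of inflating the table.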
An LLM-powered contract compliance application that uses the advanced RAG method Self-RAG and a Knowledge Graph together for the first time.
It provides the highest accuracy for contract compliance recorded so far in the Oil and Gas industry.
How We Added Replication to QuestDB - JonTheBeachjavier ramirez
Building a database that can beat industry benchmarks is hard work, and we had to use every trick in the book to keep as close to the hardware as possible. In doing so, we initially decided QuestDB would scale only vertically, on a single instance.
A few years later, data replication —for horizontally scaling reads and for high availability— became one of the most demanded features, especially for enterprise and cloud environments. So, we rolled up our sleeves and made it happen.
Today, QuestDB supports an unbounded number of geographically distributed read-replicas without slowing down reads on the primary node, which can ingest data at over 4 million rows per second.
In this talk, I will tell you about the technical decisions we made and their trade-offs. You'll learn how we had to revamp the whole ingestion layer, and how we actually made the primary faster than before when we added multi-threaded Write Ahead Logs to deal with data replication. I'll also discuss how we are leveraging object storage as a central part of the process. And of course, I'll show you a live demo of high-performance multi-region replication in action.
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015
1. How Cisco Migrated from MapReduce Jobs to Spark Jobs
Ken Owens, CTO Cisco Intercloud Services
07/15/15
2. Cisco and/or its affiliates. All rights reserved. Presentation_ID Cisco Public
Introduction
7. The Uber Trend: Exponential Rise in Connectivity (Source: IDC)
• 30M new devices connected every week
• 78% of workloads processed in cloud DCs by 2018
• 5TB+ of data per person by 2020
• 180B mobile apps downloaded in 2015
• 277X data created by IoE devices vs. end users
8. Exponential Growth Drives Opportunities (Peter Diamandis: BOLD)
[Chart: an exponential trend overtakes a linear trend; the "knee of the curve" marks the point of disruptive stress/opportunity.]
9. When Products Become Cloud-enabled, They Become 10X More Valuable
• $23.19 vs. $249.00
• $18.01 vs. $199.00
• $5.99 vs. $59.99
10. A Broader Perspective than Hybrid Cloud Is Required…
[Diagram: SaaS / PaaS / IaaS layers spanning Data Center, Cloud, and Edge / IoT.]
11. CIOs Need to Embrace Both Traditional and Hyperscale Application Deployment
• Hyperscale applications: serve several thousand users very quickly; IoE and increasing connectivity are driving the need for such workloads; examples include Hadoop, mobile back-ends, gaming, and social; a small (~10%) yet rapidly growing percentage of applications, running in the cloud.
• Traditional enterprise applications: ERP, CRM, and applications that leverage traditional databases; the majority of applications being run for/by enterprises today.
12. Application Portability and Interoperability Is the Key
• Traditional Applications: ERP, Financial, Client/Server, CRM, email, …
• Cloud Native Applications: IoT, Big Data, Analytics, Gaming, ...
[Diagram: both application classes span SaaS / PaaS / IaaS across Data Center, Cloud, and Edge / IoT.]
13. Bimodal IT Is the New Normal (Source: Gartner, Lydia Leong)
• 45% of CIOs currently have a second fast/agile mode of operation.
• Traditional Mode: requires reliability (ITIL, CMMI, COBIT).
• Nonlinear Mode: accepts instability (DevOps, automation, reusable components).
[Diagram: Systems of Record, Systems of Differentiation, and Systems of Innovation arrayed between the two modes, with change governance spanning them.]
14. We Must Connect the Clouds
• Islands of isolated PC LAN networks (1990s): multiple LANs using a multitude of protocols. The Internet connected them using the industry-standard, open IP protocol.
• World of isolated clouds (2000s): individual custom-built clouds without consistent APIs. The Intercloud connects them for application acceleration with open APIs: web-scale architecture, API-driven automation, and open, secure, compliant hybrid IT.
16. Omni-Channel Customer Journeys
The customers’ interaction with Cisco across multiple touch points to get the desired business outcome.
• Channels: server logs, social & chat, mobile, event streams, call center.
• Interaction touch points: S/W download, open trouble ticket, assign engineer, update trouble ticket, resolve trouble ticket, close trouble ticket, read support documents, view design documents, view tech documents, new registration, bug search, FAQs, contract details, product details, device coverage.
• Journeys: case resolution, software upgrade.
17. Customer Interaction Analytics: From Journey to Outcome…
• Customer Journeys: software upgrades, bug inquiry, software inquiry, trouble ticket lifecycle, device troubleshooting, new registration, contract renewal.
• Behavioral Insights: customer interest analytics, customer experience analytics, resource forecasting, security and compliance.
• Impact: boost self service, real-time content optimization & recommendation, context-based predictive alerts, implicit personalization.
18. Customer Interaction Analytics: Big Data Platform
Synthesize customer journey maps into behavioral insights.
• Data sources: server logs, call center, mobility, social, event streams.
• Data ingestion: CiscoDV, Kafka, Redis, ETL.
• Analytics model: build model, activity refinement, activity synthesis, synthesized insights, via real-time processing and batch analytics (Impala, Hive, Pig, ES).
• Insight services: CiscoDV Interact; visualization with Zoomdata and Platfora.
20. AWS Platform

Component            Cloud::Hadoop       Cloud::Queries         Cloud::Streams
                     (Batch Analytics)   (Interactive Queries)  (Near Real-time Analytics)
Virtual Machines     30                  6                      5
AWS Instance Sizing  m3.2xlarge          c3.xlarge              m3.xlarge
Virtual Cores        8/VM                4/VM                   4/VM
RAM                  30GB/VM             7.5GB/VM               15GB/VM
Disk                 1.5 TB/VM           1.5 TB/VM              1.5 TB/VM
21. Case for Cisco Intercloud Services for Analytics…
• Cisco security and compliance requirements: workloads that deal with personally identifiable data and Cisco confidential content cannot be uploaded to AWS; the Cisco internal cloud solution is a better fit.
• Customer journey beyond the enterprise: applications are hosted on AWS, and partner systems are hosted on AWS and other cloud providers. Presence in AWS and other cloud services is required to support these scenarios for end-to-end customer journey insights.
• Data virtualization integrated in the CIS Analytics Stack: connect data from multiple clouds and multiple big data platforms.
• Integrated visualization toolset.
22. CIS Analytics Platform
23. CIS Analytics Platform Requirements
• Infra provisioning: deploy a virtual private cloud (VPC) on CIS with compute, storage and memory requirements comparable to the current production system.
• OpenStack: OpenStack Icehouse with Neutron, Nova, and Swift installed.
• Big Data ecosystem: Cloudera’s Hadoop distribution version CDH 5.1.3, ELK Stack, Apache Kafka and Apache Storm.
• Data virtualization & cloud integration: access to data services and data stores via Cisco Data Virtualization.
• Runtime services: foundational PaaS capabilities including SLAs for uptime, performance, latency, data retention, issue escalation and support priorities, issue resolution, problem management, deployment process, and patch management.
• API services: provide both fine-grained and coarse-grained access to all service layers of the CIS Analytics Platform. In the hybrid cloud model, the APIs must support interoperability across platform service providers and promote the cloud concepts of extensibility and flexibility.
24. AWS to CIS Migration – Success Criteria
• Successful synthesis of customer interaction data
• Successful automation of the end-to-end data process pipeline
• Build behavioral insight services
• Access to data and services via data discovery and visualization tools
• Meet the performance, scale and platform stability requirements
• Successful deployment of CiscoDV on CIS
• Connect HDFS and Hive DS with CiscoDV via Hive and Impala
• Build and expose insight services for consumption by limited users
25. AWS and CIS Data Node Sizing Comparison
Hadoop Cluster for Batch and Query Analytics

Sizing  Node Service              Instance Type  vCPU  Mem    Storage  Data Nodes  Comments
AWS     Data Nodes / Node Master  m3.2xlarge     8     30 GB  2x80 GB  30          Each Hadoop data node has 1500 GB of EBS available for HDFS storage
CCS     Data Nodes / Node Master  GP-2XLarge     8     32 GB  50 GB    35          Each Hadoop data node has 1500 GB of EBS available for HDFS storage; less than AWS sizing (storage)
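A quick sanity check on the capacity implied by the AWS sizing above. The 3x HDFS replication factor is an assumption (the slides do not state it), so the usable figure is illustrative:

```python
# Capacity check for the 30-node AWS Hadoop cluster described above.
data_nodes = 30
ebs_per_node_gb = 1500            # per the sizing table

raw_gb = data_nodes * ebs_per_node_gb   # raw HDFS capacity
replication = 3                          # assumed HDFS default, not stated in the slides
usable_gb = raw_gb // replication        # capacity available for user data
print(raw_gb, usable_gb)  # 45000 15000
```

That is roughly 45 TB raw, or about 15 TB of user data under the assumed replication, comfortably above the 32 GB/day pilot volume.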
26. Pilot Test Data
• Test performed on one day’s production data
• Total no. of records processed – 110,852,667
• Total data size – 32GB
• Total no. of M/R jobs in the data pipeline – 17
• Two test cycles
• Cycle 1: Heterogeneous CCS nodes (vCPUs, storage, memory)
• Cycle 2: Homogeneous CCS nodes
27. CIS Performance of Batch Analytics – Limited Test
29. PoC: Analytics with Spark on CIS
• Existing code: written in Ruby with Wukong to run on Hadoop; a history of changes and modifications; script-based, with steps communicating via intermediary files.
• Goal: revise, rethink and reimplement with Spark on CIS; open the platform for advanced cloud analytics; improve maintainability by moving away from aging Ruby on Hadoop.
30. Ruby Flow
[Pipeline diagram: logs → cleanse (split into private / web) → decorate → sessionize by (cookie, time) → sessioned → match 1st by (IP, UA, time) → build actions → merge → session PSV → add to Hive / bug tool; outputs split into first, others, bots. The main computation happens in the sessionize stage.]
• Pre-process log records (‘cleanse’)
• Extract HTTP sessions (‘sessionize’)
• Extract user actions, such as ‘search’, ‘download patch’, ‘open manual’, ‘open a bug’
Ruby: scripts with temp files
• Each box in the figure is a script in a separate file
• The scripts pipe gigabytes of data as input and output
• Random matching of nodes to data for sessionizing
• Lots of redundant shuffling
• Global sort in time; global group by IP
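The "sessionize (cookie, time)" stage can be sketched as grouping hits by cookie and splitting each user's timeline on inactivity gaps. The slides do not give the exact session rule, so the 30-minute timeout below is an assumption and the function is a simplified illustration, not the production code:

```python
# Hedged sketch of 'sessionize (cookie, time)': group log records by
# cookie and cut a new session whenever the gap between consecutive
# hits exceeds a timeout. The 30-minute cutoff is assumed.

SESSION_GAP = 30 * 60  # seconds (assumption)

def sessionize(records):
    """records: iterable of (cookie, timestamp) pairs -> list of sessions."""
    by_cookie = {}
    for cookie, ts in records:
        by_cookie.setdefault(cookie, []).append(ts)
    sessions = []
    for cookie, times in by_cookie.items():
        times.sort()                      # order each user's hits in time
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[-1] > SESSION_GAP:
                sessions.append((cookie, current))   # close the session
                current = []
            current.append(ts)
        sessions.append((cookie, current))
    return sessions

hits = [("c1", 0), ("c1", 600), ("c1", 4000), ("c2", 100)]
print(len(sessionize(hits)))  # c1 splits into two sessions, plus c2: 3
```

In the Ruby flow this per-cookie sort ran as a global sort across the cluster; the Spark flow below restructures it.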
31. Spark Flow
[Same pipeline diagram as the Ruby flow; each box is now a Java or Scala function.]
• No intermediate temp files
• Steps are chained by Spark, often without any need for intermediate data; when intermediate data is still needed, it is kept in memory and on local disk as much as possible
Local computation
• Cleansing is computed on nodes local to the data blocks (same as Ruby)
• Sessions are built per IP, on separate nodes each handling a single IP range; once copied to its node on partitioning, the data remains local
• Global partition by IP; local sort in time
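The core idea, partition globally by IP and then sort only within each partition, can be sketched in plain Python. In Spark this corresponds to partitioning a keyed RDD by IP and sorting within each partition; the partition count and hash placement below are illustrative only:

```python
# Sketch of the Spark-flow restructuring: instead of one global sort by
# time, route all records for an IP to one partition, then sort each
# partition locally with no cross-node exchange.

NUM_PARTITIONS = 4  # illustrative; the real cluster had 30 nodes

def partition_by_ip(records, n=NUM_PARTITIONS):
    """records: (ip, timestamp, payload) tuples -> list of partitions."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(rec[0]) % n].append(rec)  # all rows for an IP co-locate
    return parts

def local_sort(partition):
    # Runs independently on each node; replaces the global sort in time.
    return sorted(partition, key=lambda rec: (rec[0], rec[1]))

records = [("10.0.0.2", 5, "b"), ("10.0.0.1", 3, "a"), ("10.0.0.2", 1, "c")]
sessions_input = [local_sort(p) for p in partition_by_ip(records)]
# Every IP's records now sit, time-ordered, inside a single partition,
# ready for per-IP session extraction.
```

Because each IP's records land in exactly one partition, session extraction never needs data from another node, which is what let several processing steps fit into node memory.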
32. Runtime Comparison
Volumes:
• Logs of a single day: 52 GB
• 110 million records in total, of which 53 million are kept after pre-filtering
• Producing over 1 million user actions
• Cluster of 30 nodes
Runtimes:
• Ruby: 140 min
• Spark: 7 min (20 times faster)
33. Why?
Extracting sessions means sorting in time and grouping by IP.
• Ruby: sorting in time and per-IP grouping is performed across the whole cluster (very bad, lots of IO).
• Spark is good at dealing with partitions: per-IP groups are placed on different machines (partitions), and the global sort in time is replaced by many local per-IP sorts done on the machines responsible for extracting sessions for specific groups of IP addresses.
Other improvements:
• Avoid redundant temp files and redundant (de)serialization of objects (comes with Java/Scala); stages keep data in memory when possible (comes with Spark).
• Cache the results of user agent resolution, which is heavy on regular expressions.
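The user-agent caching mentioned above is a memoization pattern: the regex matching runs once per distinct UA string and repeated lookups hit the cache. A minimal sketch using Python's `functools.lru_cache`; the patterns and categories are made up for illustration, not Cisco's actual rules:

```python
# Memoized user-agent resolution: regex-heavy classification runs once
# per distinct UA string. Patterns/categories are illustrative only.
import re
from functools import lru_cache

UA_PATTERNS = [
    (re.compile(r"bot|crawler|spider", re.I), "bot"),
    (re.compile(r"mobile|android|iphone", re.I), "mobile"),
]

@lru_cache(maxsize=100_000)
def resolve_user_agent(ua: str) -> str:
    # Expensive matching happens only on a cache miss.
    for pattern, category in UA_PATTERNS:
        if pattern.search(ua):
            return category
    return "desktop"

print(resolve_user_agent("Googlebot/2.1"))  # bot
print(resolve_user_agent("Googlebot/2.1"))  # second call served from cache
```

Since production log traffic contains far fewer distinct UA strings than log lines, the cache turns a per-record regex scan into a near-free dictionary lookup.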
35. Data Virtualization for Intercloud Analytics: Customer Benefits
• Discover data beyond the enterprise: virtual integration that combines traditional enterprise data, Big Data stores on CIS and AWS, cloud data from SaaS providers, and data from Cisco customers and partners.
• Seamless interoperability offers easy access to data across distributed data sources in the intercloud analytics platform.
• Universal data governance maximizes enforcement of data security rules.
• Analytics data hubs: deployment flexibility to build hybrid/virtual sandboxes that enable nimble data discovery and rapid data analytics to support multiple LOBs.
• Deliver data to any number of analytics tools.
36. Use Case 1: Get Case Interactions
• Use Case Description: number of cases opened by company X that are currently open (other variations would include cases by company, trends, etc.)
• CiscoDV Value: CiscoDV enforces data security rules to restrict access on the intercloud platform to customer-sensitive data.
• Data Sources: SalesForce
• Intercloud Solution: the CIS CiscoDV service can access the “sanitized” version of CSOne data through JDBC from the RIDES (SWTG CiscoDV) API.
• Connection Type: DV on hybrid cloud to Enterprise data store
37. Use Case 2: Get Customer Journey
• Use Case Description: customer interactions on the web pertaining to the bug search and case submission process. Foundational data can be used to explore trends and feed into content recommendation models.
• CiscoDV Value: direct access to data on the CIS Intercloud Analytics Platform.
• Data Sources: SAS Analytics
• Intercloud Solution: by direct network access to the Impala server, the CIS CiscoDV server connects to the Impala service in Hadoop, also on CIS, as a data source. SQL queries configured in CiscoDV execute Impala queries.
• Connection Type: DV on hybrid cloud to VPC Big Data platform
38. Use Case 3: Get Bug Interactions
• Use Case Description: another foundational data service that provides a breakdown of customer exposure to, or interest in, bugs. The service can be refined further to look at trends specific to a company or a product for further analytics.
• CiscoDV Value: real-time data federation that accesses extremely large data in the CIS Intercloud Analytics platform and joins it with bug data accessed via a departmental CiscoDV instance (RIDES).
• Data Sources: SASA Analytics and QDDTS via RIDES
• Intercloud Solution: by building on the access to the Impala server, the DV server can join the bug data from the enterprise data stores with the HDFS data to provide a federated view.
• Connection Type: DV on hybrid cloud to VPC Big Data platform and Enterprise data store
39. CiscoDV on Intercloud Analytics Platform (CIS)
• Scenario 1: CIS CiscoDV to Cisco Enterprise Data Store
• Scenario 2: CIS CiscoDV to Impala and Hive on the CIS Intercloud Analytics Platform
• Scenario 3: CIS CiscoDV to Hive on AWS Big Data Cluster
Editor's Notes
FABIO – a few items from Pankaj and Liz Monday:
Per the John Chambers slides I sent you Monday night, please be sure to fully address digitization in the opener, so Pankaj can connect to John’s opening remarks.
Set the stage here for what the digital transformation is and why it drives IoE and cloud. Explain where we came from, where we are today – exponential growth and a magnitude of changes still to come.
Please see new VNI, to see if there are any newer/better stats re the Data Center.
Pankaj feels the top 3 data points are ok in this slide, but perhaps we could find better ones for the bottom 2 data points? Maybe uplevel them a bit?
-------------------------------------------------------
The world is changing. The digital transformation is turning traditional business models on their heads. We are seeing unprecedented growth in the explosion of devices and mobile apps and in data utilization.
IoE – IoE devices create 277 times the data that the end user is creating. But only a fraction of it ever reaches the data center. A Boeing 787 for example, generates 40 TB of data per every hour of flight time. But only 0.5 TB is ultimately transmitted to the data center.
Mobility: In 2014, global mobile data traffic grew 1.7x or 69%… In 2014 alone, 77B+ mobile apps downloaded… by 2015 180B apps (233% increase)
Internet… IDC predicts by 2017, there will be 3.6 billion global Internet users… More than 1/2 the world population
Big Data… By 2020 there will be more than 5,000 GB of data for every person on Earth
These massive changes are putting tremendous stress on the data center. The traditional data center model has to evolve in order to meet demand today and into the future.
We know how to fix this
We’re going to do for cloud what we did for data. You couldn’t move data between the networks – they weren’t connected. Cisco unified those worlds
The world of cloud today is a world of isolated clouds. There’s no workload or data portability.
“Amazon is hotel California – you can never leave, and that data is staying there”
Our vision is to connect all these clouds together into the Intercloud - whether private, public , or hybrid through technology and innovation
Intercloud is going to connect these clouds together in the same way we connected data together.
No one cloud model or single cloud approach, such as the massively scalable clouds from Amazon, Google or Microsoft will win alone in this space