Basic introduction to HTTP/2, and how it can help to speed up SAP Fiori applications. Presented at the 2017 SAP Inside Track Silicon Valley #sitSV, and at SAP TechEd in Las Vegas as session NET52433.
Data processing and deep learning are often split into two pipelines: one for ETL, the other for model training. Enabling deep learning frameworks to integrate seamlessly with ETL jobs makes production pipelines more streamlined and allows faster iteration between feature engineering and model training.
Alluxio Day VI October 12, 2021 https://www.alluxio.io/alluxio-day/ Speaker: David Zhu, Alluxio
"鹿児島Linux勉強会 2017.01 - connpass" https://kagolug.connpass.com/event/47774/
Files in Hadoop are broken into blocks that are replicated across multiple DataNodes for redundancy. The document shows how the log files a.txt, b.txt, and e.txt are each split into multiple blocks stored on different DataNodes, with each block replicated at least three times so that data is not lost if a node fails.
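As a sketch of what this looks like from the client side (not part of the original document), the Hadoop FileSystem API can report a file's replication factor and the DataNodes holding each block; the path /logs/a.txt is an illustrative assumption.

```java
// Minimal sketch: list the replication factor and block locations of an HDFS file.
// The path /logs/a.txt is hypothetical; the cluster configuration is taken from the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/logs/a.txt"));
        System.out.println("Replication factor: " + status.getReplication());
        // Each BlockLocation lists the DataNodes holding one replica of that block.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
    }
}
```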
Contrail provides software-defined networking and virtual network capabilities for OpenStack clouds. Key components include the Contrail controller, vRouters running on the hypervisors, and integration with OpenStack through Neutron and Nova. Virtual networks can be created in Contrail to isolate groups of virtual machines from one another and to provide connectivity to physical networks.
OpenStack DVR (Distributed Virtual Router) allows L3 routing functions to be distributed across compute nodes by creating router namespaces on each compute node. This avoids bottlenecks and single points of failure at network nodes. DVR supports east-west inter-subnet routing, SNAT for external access without floating IPs, and floating IPs associated with internal VMs for direct external access. Traffic flows are encapsulated in VXLAN/GRE tunnels between compute nodes and routed appropriately within each node's router namespace.
This document discusses key concepts in designing large-scale distributed systems. It covers consistency models such as eventual consistency and sequential consistency, and explains why systems are distributed in the first place, including tolerance of failures and geographic distribution. It also covers decentralized architectures, transactions, and consensus protocols such as Paxos. Tradeoffs between techniques, such as consistency versus availability, are presented, and real-world systems like Dynamo and Megastore are summarized.
MRUnit is a testing library that makes it easier to test Hadoop jobs. It lets test input and output be specified programmatically, reducing the need for external test files, and tests can focus on individual map and reduce functions. MRUnit abstracts away much of the boilerplate test setup code, though it has some limitations, such as the lack of distributed testing. Overall, the benefits of using MRUnit to test Hadoop jobs outweigh the drawbacks.
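A minimal sketch of the idea, assuming a hypothetical WordCountMapper that extends Mapper<LongWritable, Text, Text, IntWritable> and emits (word, 1) per token: MRUnit's MapDriver runs the map function in memory, so no external input files or running cluster are needed.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // WordCountMapper is an assumed mapper under test, not part of the original document.
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    }

    @Test
    public void emitsOneCountPerWord() throws Exception {
        // Input and expected output are declared in code rather than in external test files.
        mapDriver.withInput(new LongWritable(0), new Text("hadoop mrunit"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("mrunit"), new IntWritable(1))
                 .runTest();
    }
}
```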
Summary of recent progress on Apache Drill, an open-source community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
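As a hedged illustration of the kind of ad hoc query Drill supports (not taken from the talk), the sketch below uses Drill's JDBC driver against an embedded Drillbit (zk=local) and the cp.`employee.json` sample table that ships with Drill's classpath storage plugin.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillAdHocQuery {
    public static void main(String[] args) throws Exception {
        // zk=local connects to an embedded Drillbit; no schema registration is required up front.
        try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT full_name FROM cp.`employee.json` LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString("full_name"));
            }
        }
    }
}
```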
This training module introduces the Resource Description Framework (RDF) for describing data, covering triples, graphs, and serialization syntaxes, and the SPARQL query language for querying and manipulating RDF data, including the SELECT, CONSTRUCT, DESCRIBE, and ASK query forms and the structure of a SPARQL query. The module provides learning objectives, an overview of the content, examples, and pointers to further resources.
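A minimal sketch, not from the module itself, of running a SPARQL SELECT query over an RDF graph with Apache Jena; the data.ttl file and the foaf:name pattern are illustrative assumptions.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class SparqlSelectExample {
    public static void main(String[] args) {
        // Load RDF triples from a Turtle file (assumed to exist) into an in-memory model.
        Model model = ModelFactory.createDefaultModel();
        model.read("data.ttl");

        // A SELECT query binds variables for every matching triple pattern.
        String queryString =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "SELECT ?name WHERE { ?person foaf:name ?name }";

        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.next();
                System.out.println(solution.getLiteral("name").getString());
            }
        }
    }
}
```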
If you’re trying to process financial market data, monitor IoT sensor metrics, or run real-time fraud detection, you’ll be thinking of stream processing. Stream processing sounds wonderful in concept, but scaling and debugging stream processing frameworks on distributed systems can be a nightmare. In clustered environments, your logs are scattered across many different machines, making errors and strange behaviors hard to trace. On frameworks like Apache Storm, the many layers of abstraction make it difficult to predict performance and do capacity planning. In micro-batching frameworks like Spark Streaming, stateful aggregations can be a hassle. Moreover, in most existing frameworks, changing a single line of code requires a full topology redeploy, causing operational strain. Concord strives to solve all of the challenges above. In this talk, you’ll learn how Concord differs from other stream processing frameworks and how Concord can provide flexibility, simplicity, and predictable performance with help from Apache Mesos. https://databythebay2016.sched.org/event/6EPy/concord-simple-amp-flexible-stream-processing-on-apache-mesos
Manufacturers have an abundance of data, whether from connected sensors, plant systems, manufacturing systems, claims systems, or external industry and government sources. They face growing challenges, from continually improving product quality and reducing warranty and recall costs to leveraging their supply chain efficiently. For example, giving the manufacturer a complete view of product and customer information requires integrating manufacturing and plant-floor data, as-built product configurations, and sensor data from customer use, so that warranty claims can be analyzed efficiently to shorten detection-to-correction time, detect fraud, and even address issues proactively. That, in turn, requires a capable enterprise data hub that integrates large volumes of both structured and unstructured information. Learn how an enterprise data hub built on Hadoop provides the tools to support analysis at every level of the manufacturing organization.
This document provides an overview of web services and service-oriented architecture (SOA). It discusses the history and evolution of web services including SOAP, WSDL, UDDI, and RESTful web services. It also covers testing, security, and resources for further information on web services and SOA.
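As a hedged sketch of the RESTful style the document surveys (not an example from the document itself), the snippet below calls a resource URL with Java's built-in HTTP client; the endpoint https://example.com/api/orders/42 is an illustrative assumption.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestClientExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // RESTful services expose resources at URLs and use standard HTTP verbs;
        // here a GET retrieves a JSON representation of a single resource.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/orders/42"))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```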
This presentation is about what Presto is and how Treasure Data uses it (presented at db tech showcase Sapporo 2015).
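For orientation only (not taken from the presentation), the sketch below queries Presto through its JDBC driver; the coordinator address, the hive/default catalog and schema, and the web_access_logs table are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoQueryExample {
    public static void main(String[] args) throws Exception {
        // JDBC URL format: jdbc:presto://host:port/catalog/schema
        String url = "jdbc:presto://localhost:8080/hive/default";
        // Presto requires a user name; no password is needed on an unsecured cluster.
        try (Connection conn = DriverManager.getConnection(url, "analyst", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*) AS events FROM web_access_logs")) {
            while (rs.next()) {
                System.out.println("events = " + rs.getLong("events"));
            }
        }
    }
}
```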