LinkedIn serves traffic for its 467 million members from four data centers and multiple PoPs spread geographically around the world. Serving live traffic from many places at the same time has taken us from a disaster recovery model to a disaster avoidance model, where we can take an unhealthy data center or PoP out of rotation and redistribute its traffic to a healthy one within minutes, with virtually no visible impact to users. The geographical distribution of our infrastructure also allows us to optimize the end-user experience by geo-routing users to the best possible PoP and data center. This talk provides details on how LinkedIn shifts traffic between its PoPs and data centers to deliver the best possible performance and availability for its members. We will also touch on the complexities of performance in APAC, how IPv6 is helping our members, and how LinkedIn stress tests its data centers to verify its disaster recovery capabilities.
Microservices have become the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses a pattern other than microservices, such as SOA (Service-Oriented Architecture) or client-server communication, APIs are used between the different applications and end users. Apache Kafka plays a key role in modern microservice architectures for building open, scalable, flexible, and decoupled real-time applications. API Management complements Kafka by providing a way to implement and govern the full lifecycle of the APIs. This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other, depending on the use case and the point of view of the project team. The session concludes by exploring the vision of event streaming APIs instead of RPC calls.
Kafka Streams is a client library for building distributed applications that process streaming data stored in Apache Kafka. It provides a high-level streams DSL that allows developers to express streaming applications as a set of processing steps. Alternatively, developers can use the lower-level Processor API to implement custom business logic. Kafka Streams handles concerns like fault tolerance, scalability, and state management. It represents data as streams (for unbounded data) or tables (for bounded state). Common operations include transformations, aggregations, joins, and table operations.
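For a concrete feel for the DSL, here is a minimal word-count sketch in Java, assuming Kafka Streams 2.x-style APIs; the topic names ("text-input", "word-counts") and broker address are illustrative:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Stream of text lines from an input topic.
        KStream<String, String> lines = builder.stream("text-input");

        // Split lines into words, group by word, and count:
        // an unbounded stream becomes a continuously updated table.
        KTable<String, Long> counts = lines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count();

        // Emit the table's changelog back to an output topic.
        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```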
In 1853 Britain’s workshops built 90 new gunboats for the Royal Navy in just 90 days: an astonishing feat of engineering. Industrial standardization made this possible - and in this talk, my first at Strata, I argued that data-sophisticated corporations need a new standardization of their own, in the form of schema registries like Confluent Schema Registry or Snowplow’s own Iglu.

Talk abstract: At the start of the Crimean War in 1853, Britain's Royal Navy needed 90 new gunboats ready to fight in the Baltic in just 90 days. Assembling the boats was straightforward - the challenge was to build all of the engine sets in time. Marine engineer John Penn did an unusual thing: he took a pair of reference engines, disassembled them and distributed the pieces to the best machine shops across Britain. These workshops - latter-day micro-services - each built 90 sets of their allocated parts, which were then assembled into the engines for the new gunboats, ready for battle. This was the nineteenth century - how could the Admiralty be certain that the parts from all these independent workshops would come together to form 90 high-powered engines? The answer lay in a crucial piece of standardization: the Whitworth thread, the world’s first national screw thread standard, devised by Sir Joseph Whitworth in 1841. By the time the Royal Navy came knocking, this standard had been adopted by workshops across Britain; John Penn could be confident that engine parts built by any workshop to the Whitworth standard would fit together.

In this talk, Snowplow co-founder Alexander Dean will draw on the story of the Crimean War gunboats to argue that our data processing architectures urgently require a standardization of their own, in the form of schema registries. Like the Whitworth screw thread, a schema registry, such as Confluent Schema Registry or Snowplow’s own Iglu, allows enterprises to standardize on a set of business entities which can be used throughout their batch and stream processing architectures. Like the artisanal workshops in 1850s Britain, micro-services can work on narrowly defined data processing tasks, confident that their inputs and outputs will be compatible with their peers. This talk will start with the rationale for putting a schema registry at the heart of your business, before moving on to the practicalities of an implementation, including: a side-by-side comparison of the available registries; best practices for schema versioning; and strategies for federating schemas across different companies, such as via Snowplow’s own Iglu Central.
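To make the registry idea concrete, here is a hedged Java sketch of a Kafka producer wired to Confluent Schema Registry via its Avro serializer: the serializer registers and validates schemas against the registry, so an incompatible producer fails fast instead of corrupting the stream. The broker address, registry URL, topic, and schema are all placeholders:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RegistryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // placeholder
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer talks to the registry on our behalf.
        props.put("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");   // placeholder

        // A toy schema standing in for a real business entity.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":"
            + "[{\"name\":\"userId\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("userId", "u123");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("page-views", "u123", record));
        }
    }
}
```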
Monitor the availability and performance of applications hosted in the Amazon cloud. Monitor your Amazon EC2 and RDS instances to gain insight into the performance of your cloud computing environment, and troubleshoot and resolve problems before end users are affected.
Have many services? Writing new ones often? If so, middleware can help you cut down on the ceremony of writing new services and at the same time consolidate the handling of cross-cutting concerns. But what is middleware? OWIN and ASP.NET Core both have a concept of middleware. What are they? How do they help? In this talk we will dive into the code, write some middleware, and show how middleware helps you handle cross-cutting concerns in an isolated and reusable way across your services. I'll compare and contrast the OWIN and ASP.NET Core middleware concepts and talk about where each is appropriate.
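OWIN and ASP.NET Core are .NET technologies, but the underlying idea is language-neutral: a pipeline of handlers in which each middleware wraps the next and can act before and after it. The sketch below (in Java) illustrates just that concept; the Handler/Middleware types and the handlers themselves are invented for illustration, not any framework's actual API:

```java
import java.util.function.UnaryOperator;

// A toy request/response pipeline illustrating the middleware concept.
public class MiddlewareDemo {
    interface Handler { String handle(String request); }
    interface Middleware extends UnaryOperator<Handler> {}

    public static void main(String[] args) {
        // Cross-cutting concerns live in small, reusable wrappers.
        Middleware logging = next -> request -> {
            System.out.println("-> " + request);
            String response = next.handle(request);
            System.out.println("<- " + response);
            return response;
        };
        Middleware auth = next -> request ->
            request.startsWith("token:") ? next.handle(request) : "401 Unauthorized";

        Handler app = request -> "200 OK";

        // Compose the pipeline: logging runs outermost, then auth, then the app.
        Handler pipeline = logging.apply(auth.apply(app));
        System.out.println(pipeline.handle("token:abc GET /orders"));
    }
}
```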
LinkedIn developed the Azkaban workflow manager to schedule and run Hadoop jobs. They created versions 1.0, 2.0, and 2.5 of Azkaban, adding new features like plug-ins, authentication, and a redesigned UI. Azkaban is now used by over 1,000 LinkedIn users to run 2,500 workflows and 30,000 jobs daily across multiple Hadoop clusters.
This workshop covers the HBase data model, architecture, and schema design principles. Source code demo: https://github.com/moisieienko-valerii/hbase-workshop
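As a taste of what the demo code works with, here is a minimal sketch of the standard HBase Java client API; the table name, column family, and row key are illustrative, not from the workshop repository:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Row keys drive physical sort order, so schema design starts here:
            // salting or reversing keys avoids hot-spotting on sequential writes.
            Put put = new Put(Bytes.toBytes("user#42"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"),
                          Bytes.toBytes("Ada"));
            table.put(put);

            // Point reads address a row key and a column directly.
            Get get = new Get(Bytes.toBytes("user#42"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"))));
        }
    }
}
```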
In this workshop we will set up a streaming framework that processes real-time data from traffic sensors installed within the Belgian road system. Starting with the intake of the data, you will learn best practices and the recommended approach for splitting the information into events in a way that won't come back to haunt you. With some basic stream operations (count, filter, ...) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream. But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows. After this workshop, tumbling, sliding, and session windows hold no more mysteries and you will be a true streaming wizard.
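As a rough preview (not the workshop's actual code), a windowed count with Spring Cloud Stream's functional model might look like the sketch below. It assumes the Kafka Streams binder is on the classpath; the function name, String-encoded readings, and the one-minute tumbling window are all assumptions:

```java
import java.time.Duration;
import java.util.function.Function;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class TrafficApp {
    public static void main(String[] args) {
        SpringApplication.run(TrafficApp.class, args);
    }

    // Counts sensor readings per sensor id over one-minute tumbling windows.
    // Spring Cloud Stream binds this function to input/output topics via
    // configuration (e.g. spring.cloud.function.definition=countBySensor).
    @Bean
    public Function<KStream<String, String>, KStream<String, Long>> countBySensor() {
        return readings -> readings
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
            .count()
            .toStream()
            // Unwrap the windowed key so downstream consumers see the sensor id.
            .map((window, count) -> KeyValue.pair(window.key(), count));
    }
}
```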
One of the great things about running applications in the cloud is that you only pay for the resources that you use. But that also makes it more important than ever for our applications to be resource-efficient. This becomes even more critical when we use serverless functions. Micronaut is an application framework that provides dependency injection, developer productivity features, and excellent support for Apache Kafka. By performing dependency injection, AOP, and other productivity-enhancing magic at compile time, Micronaut allows us to build smaller, more efficient microservices and serverless functions. In this session, we'll explore the ways that Apache Kafka and Micronaut work together to enable us to build fast, efficient, event-driven applications. Then we'll see it in action, using the AWS Lambda Sink Connector for Confluent Cloud.
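To illustrate the compile-time style, here is a minimal sketch of Micronaut's Kafka support: the producer interface is implemented by Micronaut at compile time, with no runtime reflection or proxies. The topic and type names are illustrative, not from the session:

```java
import io.micronaut.configuration.kafka.annotation.KafkaClient;
import io.micronaut.configuration.kafka.annotation.KafkaKey;
import io.micronaut.configuration.kafka.annotation.KafkaListener;
import io.micronaut.configuration.kafka.annotation.OffsetReset;
import io.micronaut.configuration.kafka.annotation.Topic;

// Producer: Micronaut generates the implementation at compile time.
@KafkaClient
interface OrderProducer {
    @Topic("orders") // topic name is illustrative
    void send(@KafkaKey String orderId, String payload);
}

// Consumer: each annotated method becomes a Kafka listener.
@KafkaListener(offsetReset = OffsetReset.EARLIEST)
class OrderListener {
    @Topic("orders")
    void receive(@KafkaKey String orderId, String payload) {
        System.out.printf("order %s: %s%n", orderId, payload);
    }
}
```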
This talk is about our experience at LinkedIn migrating our content ingestion system from Oracle to our internal database system, Espresso. I explain some of the reasons for the migration, as well as how we met the challenges of swapping database technologies with no downtime and in a way that was transparent to our clients. This talk was delivered at the SATURN 2018 conference in Plano, TX on May 9, 2018.
The document discusses workflow schedulers like Azkaban and Oozie. It explains that a workflow scheduler helps manage dependencies between jobs in a data pipeline. Azkaban was implemented at LinkedIn to solve dependency issues for Hadoop jobs; it defines jobs in properties files, while Oozie uses XML files. Workflow schedulers allow easy management of task dependencies, scheduling, monitoring progress, and retrying failed jobs.
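For flavor, an Azkaban job definition is just a small properties file, with dependencies declared by job name; the job names and command below are made up:

```properties
# bar.job - runs only after foo.job succeeds
type=command
command=hadoop jar analytics.jar daily-rollup
dependencies=foo
```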
In this talk we'll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We'll debunk some of the myths around event sourcing. We'll look at the inevitability of event-driven programming in the serverless space and we'll see how stream processing links these two concepts together with a single 'database for events'. As the story unfolds we'll dive into some use cases, examine the practicalities of each approach - particularly the stateful elements - and finally extrapolate how their future relationship is likely to unfold. Key takeaways include: The different flavors of event sourcing and where their value lies. The difference between stream processing at application- and infrastructure-levels. The relationship between stream processors and serverless functions. The practical limits of storing data in Kafka and stream processors like KSQL.
Over the past couple of years, Scala has become a go-to language for building data processing applications, as evidenced by the emerging ecosystem of frameworks and tools including LinkedIn's Kafka, Twitter's Scalding and our own Snowplow project (https://github.com/snowplow/snowplow). In this talk, Alex will draw on his experiences at Snowplow to explore how to build rock-solid data pipelines in Scala, highlighting a range of techniques including: * Translating the Unix stdin/out/err pattern to stream processing * "Railway oriented" programming using the Scalaz Validation * Validating data structures with JSON Schema * Visualizing event stream processing errors in ElasticSearch Alex's talk draws on his experiences working with event streams in Scala over the last two and a half years at Snowplow, and on his recent work penning Unified Log Processing, a Manning book.
An overview of the Confluent platform for Apache Kafka and how to use KSQL to build streaming data pipelines.
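For example, a minimal KSQL pipeline might look like the following; the topic, stream, and column names are illustrative:

```sql
-- Expose an existing Kafka topic as a queryable stream.
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Continuously materialize a filtered, derived stream back to Kafka.
CREATE STREAM checkout_views AS
  SELECT user_id, page
  FROM pageviews
  WHERE page LIKE '/checkout%';
```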
Several different frameworks have been developed to draw data from Kafka and maintain standard SQL over continually changing data. This provides an easy way to query and transform data - now accessible to orders of magnitude more users. At the same time, using standard SQL against changing data is a new pattern for many engineers and analysts. While the language hasn’t changed, we’re still in the early stages of understanding the power of SQL over Kafka - and in some interesting ways, this new pattern introduces some exciting new idioms. In this session, we’ll start with some basic use cases of how standard SQL can be effectively used over events in Kafka - including how these SQL engines can help teams that are brand new to streaming data get started. From there, we’ll cover a series of more advanced functions and their implications, including:
- WHERE clauses that contain time change the validity intervals of your data; you can programmatically introduce and retract records based on their payloads!
- LATERAL joins turn streams of query arguments into query results; they will automatically share their query plans and resources!
- GROUP BY aggregations can be applied to ever-growing data collections; reduce data that wouldn't even fit in a database in the first place.
We'll review in-production examples where each of these cases lets unmodified standard SQL, run and maintained over data streams in Kafka, provide the functionality of bespoke stream processors.
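As a hedged illustration of the GROUP BY and time-based WHERE ideas, in the style of a streaming SQL engine (exact syntax varies by engine, and the table and column names are invented):

```sql
-- Maintained incrementally: each new event updates only the affected row,
-- so the raw event history never needs to fit in a database.
SELECT user_id,
       COUNT(*)       AS page_views,
       MAX(view_time) AS last_seen
FROM pageviews
WHERE view_time > NOW() - INTERVAL '30' DAY  -- time-based WHERE: rows are
                                             -- retracted as they age out
GROUP BY user_id;
```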
Presented by Michael Noll, Product Manager, Confluent. Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all. Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around. In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company. Alex's talk will conclude with a live demo of Amazon Kinesis in action processing Snowplow events.
Michael Kehoe is a senior site reliability engineer at LinkedIn who discusses their use of Kafka, Hadoop, and Couchbase. LinkedIn uses Kafka for monitoring, messaging, analytics and as a building block for distributed applications. They collect member usage data in Kafka clusters and push some of this data to Hadoop for analysis and reporting. Couchbase is used across 80 services for caching, with clusters of up to 70 servers, and Hadoop is used to build and restore Couchbase buckets. Their jobs cluster supports over 150k queries per second with low latency.
The document discusses onboarding entry-level talent (ELTs) at LinkedIn. It recommends introducing ELTs to company values and culture, and providing a roadmap and flexibility. Training should include technical and non-technical materials, and mentoring by pairing ELTs with experienced engineers. Managing ELTs requires support, career opportunities, acknowledgment, and avoiding boring work. Relationships, diverse skills, efficient use of time, and feedback are important lessons.
This document discusses LinkedIn's use of Kafka, Hadoop, Storm, and Couchbase in their big data pipeline. It provides an overview of each technology and how LinkedIn uses them together. Specifically, it describes how LinkedIn uses Kafka to stream data to Hadoop for analytics and report generation. It also discusses how LinkedIn uses Hadoop to pre-build and warm Couchbase buckets for improved performance. The presentation includes a use case of streaming member profile and activity data through Kafka to both Hadoop and Couchbase clusters.
This document discusses the benefits of site reliability engineers (SREs) and how to realize their full potential. It emphasizes the importance of feedback loops to continuously improve processes, projects and tools. A 5-step plan is outlined to effectively manage feedback: 1) Know your audience, 2) Remove facades to get real feedback, 3) Isolate and triage issues, 4) Know how to present feedback to the right stakeholders, and 5) Implement solutions and iteratively adjust as needed. Regular feedback through 1:1s, retrospectives and surveys can help surface issues and ideas to make continuous improvements.
Using CouchDB as a database for solutions that require handling large amounts of information in Android applications. The presentation gives a brief introduction to connecting to and managing CouchDB databases from Android. The slides were prepared by me for ExpoTech 2013 (January 31 to February 1, 2013) in Puerto Ordaz, Venezuela.
The document describes LinkedIn's use of Couchbase for caching and the automation of Couchbase clusters using SaltStack. Key points:
- LinkedIn uses Couchbase to store cached data for read scaling across hundreds of clusters totaling thousands of servers.
- Automation is achieved using SaltStack's states, pillars, and grains to configure Couchbase installation, cluster expansion/reduction, and uninstallation remotely.
- A Couchbase execution module and Salt runners implement cluster operations like setup, expansion, and reduction through the REST API and CLI while providing output to the user.
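For flavor, a Salt state that installs and runs Couchbase might look like the sketch below; it is illustrative only, using stock Salt state modules rather than LinkedIn's internal Couchbase execution module:

```yaml
# couchbase/init.sls - install the package and keep the service running.
couchbase-server:
  pkg.installed:
    - name: couchbase-server
  service.running:
    - name: couchbase-server
    - require:
      - pkg: couchbase-server
```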
What possibilities does Google Analytics offer for online vacancies? How do visitors get to your site, what behavior do they exhibit there, and when do they leave again? Google Analytics can give you all this information. For that you need to know the environment, of course, and know what the possibilities are. Want to learn how Analytics works and work through some practical assignments? Watch the presentation to learn more.