This document discusses CQL, the Cassandra Query Language. CQL is designed to be similar to SQL, with some differences that account for Cassandra's data model. The presentation gives an overview of CQL's syntax and capabilities, explains why CQL was created to provide a more stable interface than Cassandra's Thrift-based RPC API, and compares CQL's performance with that of the Thrift API. Roadmap items for CQL are also presented, including prepared statements and custom transports, and the available CQL drivers for languages such as Java, Python, Ruby, and Node.js are briefly mentioned.
Tips, tricks, and strategies we use at EverythingMe to scale and to keep our servers running at all times, no matter what.
This document discusses using Gluster for object storage with OpenStack Swift. Gluster-Swift mounts Gluster volumes using FUSE so that Swift can use Gluster as its storage backend, which avoids reimplementing the Swift object API. Gluster-Swift overrides Swift's own distribution and replication and relies on the Gluster backend for them instead. Swift API requests are served through file operations on the FUSE-mounted Gluster volume. Future work includes upgrading Gluster-Swift, packaging, optimizations, and potentially developing a native Gluster object interface.
This document proposes an architecture for distributed indexing, storage, and real-time analysis of logs. It discusses the challenges of scaling log collection and analysis across hundreds of servers that generate terabytes of data daily. The proposed architecture uses multicast messaging and sharding to distribute indexing and querying across clusters of servers for scalability. It emphasizes low-overhead indexing and real-time aggregation of results.
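A minimal sketch of the sharding and result-aggregation idea, written in Python under stated assumptions: log sources are assigned to shards by a stable hash of the host name, and a query is fanned out to every shard, with the partial results merged afterwards. The shard count, the record shape, and all function names are hypothetical, and the fan-out is a plain loop rather than the multicast messaging the architecture describes.

```python
import hashlib
from collections import Counter

# Hypothetical shard count; the architecture above does not specify one.
NUM_SHARDS = 4

def shard_for(host: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a log source to a shard with a stable hash (md5, not Python's salted hash())."""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def index_logs(records, num_shards: int = NUM_SHARDS):
    """Distribute (host, level) records into per-shard lists ("indexing")."""
    shards = [[] for _ in range(num_shards)]
    for host, level in records:
        shards[shard_for(host, num_shards)].append((host, level))
    return shards

def count_by_level(shard):
    """Partial aggregate computed locally on one shard."""
    return Counter(level for _, level in shard)

def query_count_by_level(shards):
    """Scatter the query to every shard (a loop here), then merge partial counts."""
    total = Counter()
    for shard in shards:
        total.update(count_by_level(shard))
    return total
```

Because each shard returns only a small partial count, the merge step stays cheap regardless of how many raw records each shard holds, which is the property that real-time aggregation over many servers depends on.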
The document discusses several key improvements and changes in Apache Kafka 1.0, including: 1) Tolerating single disk failures in brokers so they remain available. 2) Cleanup and improvements to the Kafka Streams builder API and addition of new classes to view topology and task details. 3) Enhancements to the print and writeAsText methods for debugging streams applications. 4) Addition of exception handlers to Kafka Streams to control behavior on deserialization errors.
- The document discusses using the ELK stack (Elasticsearch, Logstash, Kibana) for real-time log search, analysis, and monitoring. It provides examples of using Logstash and Elasticsearch to parse and index application logs, and Kibana for visualization and analysis.
- The document identifies several performance and stability issues with Logstash and Elasticsearch, including high CPU usage from grok filtering, GeoIP filtering performance, and Elasticsearch relocation and recovery times. It proposes solutions such as custom filter plugins, tuning the Elasticsearch configuration, and optimizing mappings.
- Rsyslog is presented as an alternative to Logstash for log collection, with better performance. Examples are given of using Rsyslog plugins and RainerScript for efficient
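Grok filters work by matching each log line against named regular-expression patterns, which is one reason they become CPU-heavy at high log volumes. The Python sketch below approximates the fields that Logstash's COMMONAPACHELOG pattern extracts, using a single precompiled regex with named groups. It is a simplified assumption, not Logstash's actual pattern definition, and `parse_line` is a hypothetical helper.

```python
import re

# Simplified, grok-like pattern for the Apache common log format.
# Field names mirror those of grok's COMMONAPACHELOG; the regex itself is an approximation.
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line: str):
    """Return a dict of extracted fields, or None if the line does not match."""
    m = COMMON_LOG.match(line)
    if m is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    event = m.groupdict()
    # Convert numeric fields, as a grok pattern with type hints would.
    event["response"] = int(event["response"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event
```

Compiling the pattern once and reusing it for every line avoids repeated regex compilation, but the per-line matching cost remains.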
The Prometheus monitoring system collects and stores time series data to give valuable insights into hosts, containers, and applications. Its storage engine was designed to be multiple orders of magnitude faster and more space-efficient than, say, RRD or SQL storage. However, with the rise of orchestration systems such as Docker Swarm and Kubernetes, and their extensive use of techniques like rolling updates and auto-scaling, environments are becoming increasingly dynamic. This increases the strain on metrics collection systems. To deal with these challenges, a new storage engine has been developed from scratch, bringing a sharp increase in performance and enabling new features. This talk will describe this new storage engine, its architecture, and its data structures, and explain why and how it is well suited to gracefully handle high turnover rates of monitoring targets and provide consistent query performance.
This presentation will provide a preview of our new high-level API, designed around community feedback and built on the solid foundation of the Hector client internals currently in use by a number of production systems. A brief introduction to the existing Hector client will be included to accommodate new users.
This document discusses tuning Solr for log search and analysis. It provides the results of baseline tests of Solr performance and capacity when indexing 10 million logs. Various configuration changes are then tested, such as using time-based collections, DocValues, commit settings, and hardware optimizations. Using tools like Apache Flume to preprocess logs before indexing them into Solr is also recommended for improved throughput. Overall, the document emphasizes that software and hardware optimizations can significantly improve Solr performance and capacity when indexing logs.
This document discusses centralized and unified logging. It describes how Fluentd provides a pluggable architecture for collecting, transporting, storing, analyzing, and alerting on logs from various sources in a centralized and scalable way. Examples are given of using Fluentd plugins to collect Apache logs, parse and enrich the data, forward to multiple outputs like Elasticsearch and Graphite, and more.
Tools such as perldoc, cpan, and Perl::Tidy help Perl developers work more efficiently. One-liners allow running Perl code directly from the command line. ExtUtils::Command provides functions that emulate common shell commands, making Perl scripts more portable, and Perl::Tidy reformats code to make it more readable.
Having delivered VoIP solutions to customers for more than ten years, we at sipgate have gained experience in monitoring our VoIP setup. The talk will give insight into how we monitor Asterisk, Kamailio, Yate, and other vital parts of our setup using standard checks and our own scripts. We will show not only how to monitor standard SIP, but also how to detect bottlenecks and malfunctions.
Small Node.js proxy to turn a paginated JSON REST API into a CSV streaming download. Examples of code and patterns. Presented at the London Node User Group meetup, April 2014
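The talk's code is Node.js; as a language-neutral illustration, here is a minimal Python sketch of the same pattern: walk a paginated JSON API page by page and emit CSV incrementally, so the full result set never has to sit in memory. `fetch_page` is a hypothetical stand-in for the HTTP request, assumed to return a page's items together with the next page number (or None on the last page).

```python
import csv
import io
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: page number -> (items on that page, next page or None).
FetchPage = Callable[[int], Tuple[List[dict], Optional[int]]]

def csv_stream(fetch_page: FetchPage, columns: List[str], first_page: int = 1) -> Iterator[str]:
    """Yield CSV text incrementally: one chunk for the header, then one per page."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    yield buf.getvalue()
    page: Optional[int] = first_page
    while page is not None:
        items, page = fetch_page(page)
        # Reset the buffer so each chunk holds only the current page's rows.
        buf.seek(0)
        buf.truncate()
        for item in items:
            writer.writerow([item.get(c, "") for c in columns])
        yield buf.getvalue()
```

In a real proxy, each yielded chunk would be written to the HTTP response as soon as it is produced, which is what makes the download stream rather than buffer.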
The document discusses various techniques for profiling CPU and memory performance in Rust programs, including:
- Using the flamegraph tool to profile CPU usage by sampling a running process and generating flame graphs.
- Integrating pprof profiling into Rust programs to expose profiles over HTTP, similar to how it works in Go.
- Profiling heap usage by integrating jemalloc profiling and generating heap profiles on program exit.
- Some challenges with profiling asynchronous Rust programs due to the lack of backtraces.
The key takeaways are that there are crates like pprof-rs and techniques like jemalloc integration that allow collecting CPU and memory profiles from Rust programs, but profiling asynchronous programs
This document discusses Go programming patterns and best practices presented by MegaEase, an enterprise cloud native architecture provider. It covers topics like slices, interfaces, performance optimization, and common Go mistakes. Examples are provided to demonstrate slice internals, deep comparison, interface patterns, and how to check interface compliance.
A monitoring system is arguably the most crucial system to have in place when administering and tuning the performance of any database system. DBAs also find themselves with a variety of monitoring systems and plugins to choose from, ranging from small scripts in cron to complex data collection systems. In this talk, I’ll discuss how Box moved from the Cacti monitoring system and various shell scripts to OpenTSDB, and the changes we made to our servers and our daily interaction with monitoring to increase our agility in identifying and addressing changes in database behavior.
This document discusses Circuit, a lightweight cluster operating system. It provides a real-time API to view and control hosts, processes, and containers. The API allows traversal and manipulation of the cluster as a unified namespace. The document outlines the API, including command line usage and a Go client package. It then describes how to build a job scheduler service using the Circuit API, including designing the state, handling events, and running jobs on hosts. The vision is for Circuit to enable easy sharing of systems and for any program to take on different roles by executing as a recursive process tree on the cluster.
The document provides an overview and examples of data modeling techniques for Cassandra. It discusses four use cases - shopping cart data, user activity tracking, log collection/aggregation, and user form versioning. For each use case, it describes the business needs, issues with a relational database approach, and provides the Cassandra data model solution with examples in CQL. The models showcase techniques like de-normalizing data, partitioning, clustering, counters, maps and setting TTL for expiration. The presentation aims to help attendees properly model their data for Cassandra use cases.
Presentation on Cassandra indexing techniques at Cassandra Summit SF 2011. See video at http://blip.tv/datastax/indexing-in-cassandra-5495633
1) The document discusses microservices and REST architectures. It defines microservices as small, focused pieces of software that are independently developed and deployed. 2) REST is described as an architectural style using HTTP as a stateless protocol and uniform interfaces to access resources. The key constraints of REST like client-server, statelessness and cacheability are explained. 3) The document advocates for building microservices that expose functionality through RESTful APIs and HTTP to allow independent development and deployment of services.
CQL is the query language for Apache Cassandra that provides an SQL-like interface. The document discusses the evolution from the older Thrift RPC interface to CQL and provides examples of modeling tweet data in Cassandra using tables like users, tweets, following, followers, userline, and timeline. It also covers techniques like denormalization, materialized views, and batch loading of related data to optimize for common queries.
The document discusses how the choice of storage is critical for Cassandra deployments. It summarizes that SSDs are generally the best choice as they have no moving parts, resulting in much faster performance compared to HDDs. Specifically, SSDs can eliminate issues caused by disk seeks and allow the use of compaction strategies like leveled compaction that require lower seek times. The document provides measurements showing SSDs are up to 100x faster than HDDs for read/write speeds and latency. It recommends choosing local SSD storage in a JBOD configuration when possible for best performance and manageability.
This document summarizes how Cassandra Query Language (CQL) works under the hood compared to previous Cassandra APIs. It explains that while CQL provides a SQL-like interface, the underlying data model and storage remain the same. CQL addresses issues with prior APIs like Thrift by introducing a common query language, supporting cursors to avoid loading entire result sets into memory, and standardizing schema definitions and features across clients. The document also describes how CQL queries map to the underlying storage layout using concepts like partition keys, clustering columns, and composite keys to organize data across partitions and determine retrieval order.
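To make the mapping from partition keys and clustering columns to storage more concrete, here is a toy Python model: rows are grouped into partitions by the partition key, and rows inside each partition are kept sorted by the clustering columns, which is what lets a range query on a clustering column be served from a single partition. The table, columns, and helper names are hypothetical, and the model ignores details such as the token ordering of partitions and the on-disk SSTable format.

```python
from collections import defaultdict
from typing import Dict, List

# Toy model of how a CQL table such as
#   CREATE TABLE timeline (user_id text, tweet_id int, body text,
#                          PRIMARY KEY ((user_id), tweet_id));
# is organized: user_id (partition key) chooses the partition, and tweet_id
# (clustering column) orders rows within it. Names are hypothetical.

def layout(rows: List[dict], partition_key: str, clustering_cols: List[str]) -> Dict[str, List[dict]]:
    """Group rows by partition key, then sort each partition by its clustering columns."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: tuple(r[c] for c in clustering_cols))
    return dict(partitions)

def slice_partition(partitions, key, start, end, clustering_col):
    """Range query on one partition: rows with start <= clustering value < end.

    Because the partition is sorted, matching rows form one contiguous run;
    this sketch simply filters the list for clarity.
    """
    return [r for r in partitions.get(key, []) if start <= r[clustering_col] < end]
```

For example, rows inserted for `alice` in any order come back ordered by `tweet_id`, and a query for `tweet_id` 2 up to (but not including) 4 reads only her partition.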
This document summarizes Eric Evans' presentation on using Cassandra as the backend for Wikimedia's content API. It discusses Wikimedia's goal of providing free knowledge, along with key metrics about Wikipedia and its architecture. It then focuses on how Wikimedia uses Cassandra, including their data model, compression techniques, and ongoing work to optimize compaction strategies and reduce node density to improve performance.
Among the resources offered by Wikimedia is an API providing low-latency access to full-history content, in many formats. Its results are often the product of computationally intensive transforms, and must be pre-generated and stored to meet latency expectations. Unsurprisingly, there are many challenges to providing low-latency access to such a large data set in a demanding, globally distributed environment. This presentation covers the Wikimedia content API and its use of Apache Cassandra as storage for a diverse and growing set of use cases. Trials, tribulations, and triumphs of both a development and an operational nature will be discussed.
Banking / Insurance webinar: Take back control of your data
Degetel DataStax webinar, October 15, 2015. From SQL to NoSQL: Why? What are the differences? How does it work?
This document discusses using Apache Cassandra to store and retrieve time series data more efficiently than the traditional RRDTool approach. It describes how Cassandra is well-suited for time series data due to its high write throughput, ability to store data sorted on disk, and partitioning and replication. The document also outlines a data model for storing time series metrics in Cassandra and discusses Newts, an open source time series data store built on Cassandra.
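One common way to lay out time series metrics in Cassandra is to partition by metric plus a fixed time bucket, with the timestamp as a clustering column. This keeps partitions bounded in size and makes a time-range query touch a predictable set of partitions. The Python sketch below only computes those partition keys. The day-sized bucket and the function names are assumptions for illustration and are not taken from Newts or from the document's own data model.

```python
from typing import List, Tuple

BUCKET_SECONDS = 86_400  # hypothetical: one partition per metric per day

def partition_key(metric: str, ts: int, bucket_seconds: int = BUCKET_SECONDS) -> Tuple[str, int]:
    """Partition key for a sample: metric name plus the start of its time bucket (epoch seconds)."""
    return (metric, ts - ts % bucket_seconds)

def partitions_for_range(metric: str, start: int, end: int,
                         bucket_seconds: int = BUCKET_SECONDS) -> List[Tuple[str, int]]:
    """Every partition key a query over [start, end) has to read."""
    first = start - start % bucket_seconds
    return [(metric, b) for b in range(first, end, bucket_seconds)]
```

Within each partition, samples would be stored sorted by timestamp, so a query reads each listed partition with a single contiguous slice.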