This document discusses HDFS high availability using Journal Nodes. It describes how Journal Nodes provide a write-ahead log to synchronize data between an active and a standby NameNode. The architecture involves Journal Nodes durably logging metadata operations to tolerate NameNode failures. An automatic failover process uses ZooKeeper for elections to transition the standby NameNode to the active state when the active NameNode fails.
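The write-ahead-log idea can be illustrated with a toy model. The sketch below is not HDFS code: the class names `JournalNode` and `QuorumJournal` are hypothetical, and the majority-acknowledgement rule is an assumption made for illustration. An edit counts as committed only when more than half of the journal nodes store it, and a standby rebuilds state by replaying the edits held by a majority.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy journal node (hypothetical): keeps edits in memory, can be marked down."""
    up: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid: int, op: str) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


class QuorumJournal:
    """Toy write-ahead log: an edit commits only if a majority of nodes store it."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_txid = 1

    def log(self, op: str) -> int:
        txid = self.next_txid
        acks = sum(node.append(txid, op) for node in self.nodes)
        if acks <= len(self.nodes) // 2:
            raise RuntimeError("no quorum: edit not committed")
        self.next_txid += 1
        return txid

    def replay(self):
        # Simplification: inspect every node's log and keep edits that a
        # majority of nodes hold, ordered by transaction id.
        counts = Counter(e for n in self.nodes for e in set(n.edits))
        majority = len(self.nodes) // 2
        return sorted(e for e, c in counts.items() if c > majority)
```

With three nodes, the log keeps accepting edits while one node is down and refuses them once two are down, which is the failure tolerance the abstract refers to.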
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum... (StreamNative)
Data insights and data-driven strategies create the competitive differentiators that companies thrive on today. The need for unified messaging and streaming has never been more apparent.
Pulsar started with the goal of building a global, geo-replicated infrastructure to serve Yahoo!’s messaging needs. With the increased need to process both business events (such as payment and billing requests) and operational events (such as log data and click events), the team at Yahoo! set out to build a truly unified infrastructure platform to handle all in-motion data. That technology became Apache Pulsar.
In this talk, Matteo Merli and Sijie Guo will dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
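Thin provisioning, listed above, can be shown with a toy sketch. This is not RBD's implementation or the librbd API: the class `ThinImage`, its methods, and the 4096-byte block size are hypothetical choices for illustration. The image advertises a fixed size, but storage is consumed only for blocks that have actually been written; unwritten blocks read back as zeros.

```python
class ThinImage:
    """Toy thin-provisioned block image (hypothetical, not the librbd API)."""

    def __init__(self, size_blocks: int, block_size: int = 4096):
        self.size_blocks = size_blocks
        self.block_size = block_size
        self._blocks = {}  # block index -> bytes, populated only on write

    def _check(self, index: int) -> None:
        if not 0 <= index < self.size_blocks:
            raise IndexError("block index out of range")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self._blocks[index] = data

    def read(self, index: int) -> bytes:
        self._check(index)
        # Blocks that were never written read back as zeros.
        return self._blocks.get(index, bytes(self.block_size))

    def allocated_bytes(self) -> int:
        # Space actually consumed, as opposed to the advertised size.
        return len(self._blocks) * self.block_size
```

For example, a 1024-block image reports zero allocated bytes until its first write, after which it reports exactly one block.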
The document provides an overview of performance tuning Apache Tomcat, including adjusting logging configuration to reduce duplicate logs, understanding how TCP and HTTP protocols impact performance, choosing an optimal connector (BIO, NIO, or APR) based on the application workload, and configuring connectors to optimize throughput and request processing.
BlueStore: a new, faster storage backend for Ceph (Sage Weil)
Traditionally Ceph has made use of local file systems like XFS or btrfs to store its data. However, the mismatch between the OSD's requirements and the POSIX interface provided by kernel file systems has a huge performance cost and requires a lot of complexity. BlueStore, an entirely new OSD storage backend, utilizes block devices directly, doubling performance for most workloads. This talk will cover the motivation for a new backend, the design and implementation, and the improved performance on HDDs, SSDs, and NVMe, and will discuss some of the thornier issues we had to overcome when replacing tried-and-true kernel file systems with entirely new code running in userspace.
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour... (Sandesh Rao)
In this session, I will cover under-the-hood features that power Oracle Real Application Clusters (Oracle RAC) 19c, specifically around Cache Fusion and Service management. Improvements in Oracle RAC help in integration with features such as Multitenant and Data Guard. In fact, these features benefit immensely when used with Oracle RAC. Finally, we will talk about changes to the broader Oracle RAC Family of Products stack, the algorithmic changes that help quickly detect sick or dead nodes and instances, and the reconfiguration improvements that ensure Oracle RAC databases continue to function without any disruption.
a Secure Public Cache for YARN Application Resources (DataWorks Summit)
This document discusses YARN's shared cache feature for application resources. It provides an overview of how YARN localizes resources for each application and its containers. The shared cache aims to address inefficiencies in this process by caching identical resources on NodeManagers and sharing them between applications and containers. The design goals are for the shared cache to be scalable, secure, fault-tolerant and transparent. It works by having a shared cache client interface with a shared cache manager that maintains metadata and persisted resources. This can significantly reduce data transfer and localization costs for applications that reuse common resources.
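The benefit of sharing identical resources can be sketched with a content-addressed store. This is a simplified illustration, not YARN's client or manager API: the class `SharedCache`, its `use` method, and the choice of SHA-256 as the key are assumptions. Resources with identical bytes map to the same key, so only the first application pays the upload cost.

```python
import hashlib


class SharedCache:
    """Toy content-addressed resource cache (hypothetical, not the YARN API)."""

    def __init__(self):
        self._store = {}   # checksum -> resource bytes
        self.uploads = 0   # number of times a resource actually had to be stored

    def use(self, data: bytes) -> str:
        # Identical content always yields the same key (assumed: SHA-256).
        key = hashlib.sha256(data).hexdigest()
        if key not in self._store:
            self._store[key] = data
            self.uploads += 1
        return key

    def fetch(self, key: str) -> bytes:
        return self._store[key]
```

Two applications that submit the same jar receive the same key, and the cache records a single upload.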
Oracle RAC 19c - the Basis for the Autonomous Database (Markus Michalewicz)
Oracle Real Application Clusters (RAC) has been Oracle's premier database availability and scalability solution for more than two decades, as it provides near-linear horizontal scalability without the need to change the application code. This session explains why Oracle RAC 19c is the basis for Oracle's Autonomous Database by introducing some of its latest features, some of which were specifically designed for ATP-D, as well as by taking a peek under the hood of the dedicated Autonomous Database Service (ATP-D).
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data (DataWorks Summit)
For a smooth user experience with analytics dashboards, two key requirements are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Also, long query latencies make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store, designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, flexible, sub-second ad hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid: moving from lambda architecture to exactly-once ingestion
6) Future Work
This document provides an overview of the Hitachi Content Platform (HCP) architecture. It describes HCP as a secure, simple and smart web-scale object storage platform that can scale from 4TB to unlimited capacity. It supports a variety of use cases including archiving, regulatory compliance, backup reduction, cloud applications, unstructured data management, and file sync and share. Key features of HCP include unprecedented capacity scaling, multi-protocol access, hybrid storage pools, strong security, extensive metadata and search capabilities, and global access topology.
How to convert a schema to a pluggable database to increase isolation. - Presentation - Advantages - Demo
Benefits of a pluggable database for the upgrade process. - To a new platform - To new hardware -
Ceph is an open-source distributed storage platform that provides file, block, and object storage in a single unified system. It uses a distributed storage component called RADOS that provides reliable and scalable storage through data replication and erasure coding across commodity hardware. Higher-level services like RBD provide virtual block devices, RGW provides S3-compatible object storage, and CephFS provides a distributed file system.
The document summarizes Apache Phoenix and HBase as an enterprise data warehouse solution. It discusses how Phoenix provides OLTP and analytics capabilities over HBase. It then covers various use cases where companies are using Phoenix and HBase, including for web analytics and time series data. Finally, it discusses optimizations that can be made to the schema design, queries, and writes in Phoenix to improve performance.
The document summarizes Fred Hutch's transition to using BeeGFS for their file storage needs over recent years:
1) They initially deployed a scale-out NAS in 2012 to consolidate storage for hundreds of terabytes of data across multiple research groups.
2) In 2014, they deployed BeeGFS for scratch storage for up to 500TB of data across 10 research groups with 100% uptime for 3 years.
3) By 2019, they migrated all 150 research groups and over 700 million files totaling 2 petabytes to a redundant BeeGFS deployment for higher reliability, with backup to the cloud.
This document provides an overview of Oracle GoldenGate 12c, a heterogeneous replication tool. It describes GoldenGate's key features like real-time data integration and query offloading. The document outlines GoldenGate's topologies, architecture, supported databases, and data types. It compares GoldenGate to Oracle Streams and details new features in 12c like optimized capture methods and improved high availability. Basic concepts are explained, such as classic and integrated capture, downstream and bi-directional replication. Restrictions on data types and database features are also noted.
Ceph Object Storage Reference Architecture Performance and Sizing Guide (Karan Singh)
Together with my colleagues on the Red Hat Storage team, I am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
In big data we generally deal with data whose volume exceeds the capacity of ordinary software, so the analysis process is a difficult one. The scale of big data is growing continuously, from a range of a few tens of terabytes to several petabytes in a single data set.
Big Data Processing in Cloud Computing Environments (Farzad Nozarian)
This is my seminar presentation, adapted from a paper with the same name (Big Data Processing in Cloud Computing Environments). It covers various issues of Big Data, from its definitions and applications to processing it in cloud computing environments. It also addresses Big Data technologies and focuses on MapReduce and Hadoop.
An open-source framework for storing and processing big data.
This technology stores data by combining and distributing it, and it is implemented in Java.
The Internet of Things and the security challenges ahead
An introduction to Internet of Things technology and its applications, its security challenges, and how to counter them
Video of the related webinar at:
http://www.quickheal.co.ir/webinar/webinar-videos/
A presentation on big data,
from the workshop "The Age of Big Data: Why and How?" at the 22nd Computer Society of Iran conference, csicc2017.ir
وحید امیری (Vahid Amiri)
vahidamiry.ir
datastack.ir
On emerging technologies and paradigms in libraries and librarianship - presented to librarians of the Tehran Municipality libraries
In two parts: paradigms and skills
How indexing works in the Google search engine and other search engines.
With supplementary explanations in the Note section of each slide
Prepared in spring 1395 (Iranian calendar; 2016)
جواد پورحسینی (Javad Pourhosseini)
Borna is an advanced software product for managing and reporting on Active Directory. It covers the needs of IT managers and network administrators in three areas: management of domain users and computers, Active Directory reporting, and Active Directory security monitoring. Borna is web-based and allows several different domains to be managed centrally.
Developing Scalable Software Based on the Microservices Architecture and ... (Web Standards School)
The microservices architecture is an approach to modularizing software: an old concept with new, modern definitions. In this talk I will introduce the architecture, its advantages and challenges, and how to implement, test, and deploy it on cloud infrastructure.
Network management operating systems
Course objective: gain the skills needed to start working with network management operating systems
Main topics:
1- Basic concepts:
Definition of a network management operating system and related concepts
Concepts of resource sharing and its security
Types of resource sharing methods and their security
2- Active Directory
Introduction to the Active Directory tool, its maintenance and support, and ...
3- ISA Server, TMG, UAG, and the Windows Server 2012 R2 Web Application Proxy feature
4- Introduction to various configurations
Configuring Firewall, Cache Server, Proxy Server, VPN, and ...
Data: a description of certain things that can be organized, recorded, and analyzed.
Datafy: to datafy a phenomenon means to put it into a quantifiable (measurable) format so that it can be tabulated (arranged as a table) and analyzed.
Datafication: unearthing data from materials that no one thought would ever have any value.