The document provides an introduction to the Neo4j graph database. It discusses trends driving the adoption of NoSQL databases like increasing data size, connectedness of data, semi-structured data, and changing application architectures. It categorizes NoSQL databases and describes key features of key-value stores, column family databases, document databases, and graph databases. The remainder focuses on graph databases and their suitability for interconnected data, introduces Neo4j as an example graph database, and discusses getting started with Neo4j including its core APIs.
Jamaica Personal Income Tax Guide 2016 Edition (1)
The document provides guidance on Jamaica's personal income tax rates and thresholds for 2016-2017, which were increased from the previous levels. Key points include:
- For 2016, the threshold was increased to $1,000,272 from July 1, and the tax rate above $6 million increased to 30% for the latter half of the year.
- For 2017, the threshold further increased to $1,500,096 from April 1.
- Worked examples are provided to illustrate the tax calculations and potential refunds for individuals under the new thresholds.
- Guidance is given for applying the changes for employed and self-employed individuals for the dual tax periods in 2016.
The TDD approach to development has proven itself as a very reliable and fast way to implement business requirements in code. But most examples in trainings and on the internet show how to apply TDD in very simple situations: input/output style code, or code that uses stubs for simple dependencies. What about other areas of application development, such as database integration? Can TDD be applied to them? What does TDD give the developer in that case? In my talk I will try to answer these questions and show with practical examples how the TDD approach can be useful for database integration code, how it reduces risks, and how it opens the door to database refactoring techniques. As a bonus, some NoSQL solutions will be touched on, which should make the topic even more popular!
P.S. All examples will be demonstrated in Java.
The Pomodoro Technique is a time management method developed by Francesco Cirillo in the late 1980s. The technique uses a timer to break down work into intervals traditionally 25 minutes in length, separated by short breaks.
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Make Tachyon Ready for Next-Gen Data Center Platforms with NVM.
The talk was presented at Strata Singapore in December 2015 and focused on using Tachyon tiered storage with NVM for next-generation data center platforms.
This document summarizes a presentation about Tachyon, an open source memory-centric distributed storage system. It introduces Tachyon and how it can be used with Spark to resolve issues around slow data sharing, in-memory data loss during crashes, and data duplication. The presentation outlines new features in Tachyon 0.8.0 like tiered storage, pluggable data management policies, and a unified namespace across storage systems. It concludes by inviting users and collaborators to try, develop, and get involved with the Tachyon community.
This document summarizes Spark's development over the past 12 months and provides a look ahead. It discusses improvements to both the frontend, such as DataFrames and machine learning pipelines, and the backend through projects like Tungsten for performance optimizations. Going forward, it mentions new features like the Dataset API, streaming DataFrames, and potential hardware improvements from technologies like 3D XPoint memory. The overall goal is to provide a unified engine and APIs that can automatically optimize analytics workloads across languages and domains.
Great functional testing with WebDriver and Thucydides
Presentation from online conference ConfeT&QA (October 2012) and Selenium Camp 2013 (February 2013) about techniques and approaches to create great functional automated tests.
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack
Bloomberg's Chris Jones and Chris Morgan joined Red Hat Storage Day New York on 1/19/16 to explain how Red Hat Ceph Storage helps the financial giant tackle its data storage challenges.
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
This document describes Epiphany, Rocket Fuel's real-time attribution platform. It connects millions of events to 50 billion data points to attribute conversions across devices and algorithms. It uses HBase to look up impressions in milliseconds. Data flows from actions keyed by user/impression/conversion days to HBase and Hive tables. It enables idempotent attribution across advertisers and algorithms at scale in real time.
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
1) Alluxio is an open-source virtual distributed storage system that provides memory-speed access to data across various storage platforms including HDFS, S3, and Swift.
2) Alluxio was presented as having four main use cases - using off-heap memory to alleviate resource pressure, enabling fast data sharing between jobs, accelerating access to remote storage, and providing a unified namespace across different storage systems.
3) Case studies demonstrated that Alluxio improved performance and enabled new workflows, with speedups of 15-300x reported for different customers including Barclays, Qunar, and Baidu.
Spark Summit EU 2015: Lessons from 300+ production users
At Databricks, we have a unique view into over a hundred different companies trying out Spark for development and production use cases, from their support tickets and forum posts. Having seen so many different workflows and applications, some discernible patterns emerge when looking at common performance and scalability issues that our users run into. This talk will discuss some of these common issues from an engineering and operations perspective, describing solutions and clarifying misconceptions.
The document summarizes trends observed at CES 2016. It notes that while hardware changes more slowly than software and expectations, CES 2016 showed an evolution in products that were faster, thinner, cheaper and more connected. Key trends included autonomous mobility with self-driving vehicles; collaborative systems as companies partner to create more value; cognitive robotics becoming more human-like; infinite screens as everything becomes a display; mixed reality with virtual and augmented reality gaining momentum; and diagnostic wearables that closely monitor health metrics.
Alluxio (formerly Tachyon) provides a unified namespace and tiered storage that allows data to be shared across clusters at memory speed. It is a virtual distributed storage system with a memory-centric architecture that abstracts persistent storage from applications. Alluxio enables data sharing between frameworks by allowing inter-process sharing at memory speed rather than being slowed by network or disk I/O. It also provides data resilience during application crashes by allowing processes to re-read data from memory I/O rather than network or disk I/O. Alluxio further allows consolidating memory usage across applications by preventing data duplication at the memory level.
The document discusses different architects and what they find architecture in, including nature, curving forms, simple geometries, form and function, materials, details, volume, light, technology, and sustainability. It asks what architecture is, stating that architecture is simply a need that is developed through an innovative design for the future. The document encourages sharing ideas to lead to better ideas, and remembering that architecture is more than buildings but is the essence of life.
Rebeca González Eriksen has extensive experience in clinical research and dietetics. She holds doctorates in medicine and nutrigenomics from Imperial College London and master's degrees in nutrition and public health from the London School of Hygiene & Tropical Medicine. She currently works as a clinical dietitian in Marbella, Spain, after holding research positions at several universities and medical institutions in the United Kingdom, Ghana and Spain.
An Introduction to NOSQL, Graph Databases and Neo4j
Neo4j is a graph database that stores data in nodes and relationships. It allows for efficient querying of connected data through graph traversals. Key aspects include nodes that can contain properties, relationships that connect nodes and also contain properties, and the ability to navigate the graph through traversals. Neo4j provides APIs for common graph operations like creating and removing nodes/relationships, running traversals, and managing transactions. It is well suited for domains that involve connected, semi-structured data like social networks.
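The node, relationship, and traversal ideas in the summary above can be illustrated with a small in-memory sketch. This is not the Neo4j API: the `PropertyGraph` class, its method names, and the sample people are all hypothetical, chosen only to show nodes and relationships carrying properties and a breadth-first traversal over one relationship type.

```python
from collections import deque


class PropertyGraph:
    """Hypothetical in-memory property graph (illustration only, not Neo4j)."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.rels = []   # list of (start id, type, end id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_rel(self, start, rel_type, end, **props):
        self.rels.append((start, rel_type, end, props))

    def neighbors(self, node_id, rel_type):
        # Follow outgoing relationships of the given type.
        return [end for start, t, end, _ in self.rels
                if start == node_id and t == rel_type]

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first traversal: node ids reachable within max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the depth limit
            for nxt in self.neighbors(node, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


# Sample social network (made-up data).
graph = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    graph.add_node(name, kind="person")
graph.add_rel("alice", "KNOWS", "bob", since=2010)
graph.add_rel("bob", "KNOWS", "carol", since=2012)
graph.add_rel("carol", "KNOWS", "dave", since=2015)

reachable = graph.traverse("alice", "KNOWS", max_depth=2)
print(reachable)
```

With a depth limit of two hops, the traversal from `alice` reaches `bob` and `carol` but stops before `dave`, which is the "friends of friends" style of query that connected data invites.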
This document provides an overview of NoSQL databases and their characteristics. It discusses the different eras of databases and pressures that led to the rise of NoSQL databases. It then categorizes and describes the different types of NoSQL databases, including key-value stores, document stores, column family stores, and graph databases. Specific examples like MongoDB, Cassandra, HBase, Neo4j are also outlined. The document emphasizes that the type of database chosen should depend on the problem to be solved and characteristics of the data.
This document provides an overview of NoSQL databases, including a brief history, classifications, pros and cons of usage, and trends. It discusses how NoSQL technologies originated from distributed computing needs and were driven by scalability, parallelization, and costs. Major classifications of NoSQL databases are described as column-oriented stores, key-value stores, document stores, and graph databases. Examples like MongoDB, Cassandra, and Neo4j are outlined. Both benefits and limitations of NoSQL are presented. Emerging trends around SQL access and adoption of Hadoop are also noted.
This document discusses trends driving the adoption of NoSQL databases, including increasing data size, connectivity of information, semi-structured data, and distributed application architectures. It describes four categories of NoSQL databases - aggregate-oriented, key-value stores, column family (BigTable), and document databases - and provides examples and comparisons of their pros and cons.
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
This document discusses how to use NoSQL databases in enterprise Java applications. It provides an overview of Spring Data, an open source framework that supports NoSQL and SQL databases. Spring Data provides common infrastructure and repositories to access data stores like MongoDB, Redis, and Neo4J. The presentation includes an example of using Spring Data to access MongoDB, with annotations for entities, configuration for the data store, and repositories for data access. Attendees are encouraged to try Spring Data with a data model that matches their data.
Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
HBase is a distributed, scalable, big data store that provides fast lookup capabilities like Google BigTable. It uses a table-like data structure with rows indexed by a key and stores data in columns grouped by families. HBase is designed to operate on top of Hadoop HDFS for scalability and high availability. It allows for fast lookups, full table scans, and range scans across large datasets distributed across clusters of commodity servers.
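The data model described above (rows indexed by a key, columns grouped into families, point lookups and range scans) can be sketched in a few lines. This is a single-process illustration, not the HBase client API: the `SketchTable` class, its methods, and the sample row keys are hypothetical, and the sketch ignores distribution, versioning, and HDFS entirely.

```python
import bisect


class SketchTable:
    """Hypothetical single-process table keyed by sorted row keys."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = {}                 # row key -> {family: {qualifier: value}}
        self.keys = []                 # row keys kept in sorted order

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        # Point lookup by row key.
        return self.rows.get(row_key)

    def scan(self, start_key, stop_key):
        # Range scan over [start_key, stop_key), using the sorted key order.
        lo = bisect.bisect_left(self.keys, start_key)
        hi = bisect.bisect_left(self.keys, stop_key)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


# Made-up sample rows.
table = SketchTable(["info", "stats"])
table.put("user#001", "info", "name", "Ann")
table.put("user#003", "info", "name", "Cy")
table.put("user#002", "stats", "visits", 7)

scanned_keys = [key for key, _ in table.scan("user#001", "user#003")]
print(scanned_keys)
print(table.get("user#002"))
```

Keeping row keys sorted is what makes the range scan cheap here: the scan is two binary searches plus a slice, rather than a pass over every row.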
This document discusses Grails integration with Neo4j graph databases. It begins with an introduction to graph databases and Neo4j. It then covers the Grails Neo4j plugin which allows using Neo4j as the persistence layer for Grails domain classes. Finally, it addresses some challenges in mapping the Grails domain model to the Neo4j nodespace and potential solutions.
Traackr evaluated several NoSQL database options to store its heterogeneous, unstructured web data. Document databases were the best fit due to their flexibility to store variable length text like tweets and blog posts without predefined schemas. MongoDB was selected due to its maturity, adoption, and support for ad-hoc queries and batch processing needed by Traackr in early 2010.
This was presented at NHN on Jan. 27, 2009.
It introduces Big Data, its storage systems, and its analysis.
In particular, it covers the MapReduce debates and hybrid systems that combine RDBMS and MapReduce.
In addition, various schema-free, non-relational data stores are explained.
How to Get Started with Your MongoDB Pilot Project
Open source, high performance database MongoDB can be used for a pilot project. The document discusses finding a non-critical initial project, getting experience with MongoDB, benchmarking performance, and presenting the business case for broader use. It also outlines steps for moving a successful pilot to production, including using MongoDB's auto-sharding, replication, and commercial support options.
Disclaimer :
The images, company, product and service names that are used in this presentation, are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention was to present the big picture of Big Data & Hadoop.
Slides for the talk at AI in Production meetup:
https://www.meetup.com/LearnDataScience/events/255723555/
Abstract: Demystifying Data Engineering
With recent progress in the fields of big data analytics and machine learning, Data Engineering is an emerging discipline which is not well-defined and often poorly understood.
In this talk, we aim to explain Data Engineering, its role in Data Science, the difference between a Data Scientist and a Data Engineer, the role of a Data Engineer and common concepts as well as commonly misunderstood ones found in Data Engineering. Toward the end of the talk, we will examine a typical Data Analytics system architecture.
Life science databases are sometimes difficult to understand due to lack of information. I'd like to add metadata into databases and improve search results.
NoSQL databases provide an alternative to traditional relational databases that is well-suited for large datasets, high scalability needs, and flexible, changing schemas. NoSQL databases sacrifice strict consistency for greater scalability and availability. The document model is well-suited for semi-structured data and allows for embedding related data within documents. Key-value stores provide simple lookup of data by key but do not support complex queries. Graph databases effectively represent network-like connections between data elements.
"Get Ready for Big Data" presentation from Gilbane Boston 2011; for more details, see http://gilbaneboston.com/conference_program.html#t2 and http://pbokelly.blogspot.com/2011/12/gilbane-boston-2011-big-data.html
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data and provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
This document discusses recommendation engines that use graph databases like Neo4j. It introduces GraphAware, an open-source recommendation engine plugin for Neo4j. The document outlines the business and technical challenges of building recommendation engines, and how GraphAware addresses these challenges through its flexible, high-performance architecture and APIs. It provides an example of building a simple friend recommendation engine using GraphAware.
Advanced Neo4j Use Cases with the GraphAware Framework
The document discusses GraphAware Framework, which makes it easy to build, test, and deploy custom APIs, transaction-driven behavior, and asynchronous computation functionality for Neo4j. It provides examples like representing time series data, tracking graph changes, assigning UUIDs, and running algorithms. GraphAware Framework is open source and supports building both generic and domain-specific Neo4j extensions.
The document discusses the GraphAware Framework, which allows developers to build custom APIs, transaction-driven behavior, and asynchronous computations for Neo4j. It provides examples like the TimeTree module for storing and querying time series data and a change feed module for tracking graph changes. The framework makes it easy to build, test, and deploy these advanced functionalities for Neo4j.
Modelling Data in Neo4j, bidirectional relationships, qualifying relationships with properties vs. relationship types (performance comparison), Neo4j hardware sizing, Cypher vs. Java API
This document discusses graph theory and its applications to data science. It provides examples of social and technological networks that can be represented as graphs, and covers graph theory concepts like connected components, triadic closure, structural balance, and centrality measures. Neo4j is presented as an open-source graph database that allows storing and querying graph data using the Cypher query language.
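Two of the concepts named above, connected components and centrality, can be shown on a tiny undirected graph. This is a plain-Python sketch under stated assumptions: the function names and the sample edge list are hypothetical, and degree centrality is used here as just one simple example of a centrality measure, not necessarily the one the document covers.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    adjacency = build_adjacency(edges)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def degree_centrality(edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    adjacency = build_adjacency(edges)
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1)
            for node, neighbors in adjacency.items()}


# Made-up sample network with two separate groups.
sample_edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(sample_edges)
centrality = degree_centrality(sample_edges)
print(components)
print(centrality)
```

In the sample, `b` touches two of the four other nodes, so it gets the highest degree centrality, while `d` and `e` form a component of their own.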
Paper introduction: A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr "A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models" arXiv2023
https://arxiv.org/abs/2307.12980
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
YOUR RELIABLE WEB DESIGN & DEVELOPMENT TEAM — FOR LASTING SUCCESS
WPRiders is a web development company specialized in WordPress and WooCommerce websites and plugins for customers around the world. The company is headquartered in Bucharest, Romania, but our team members are located all over the world. Our customers are primarily from the US and Western Europe, but we have clients from Australia, Canada and other areas as well.
Some facts about WPRiders and why we are one of the best firms around:
More than 700 five-star reviews!
1500 WordPress projects delivered.
We respond 80% faster than other firms! Data provided by Freshdesk.
We’ve been in business since 2015.
We are located in 7 countries and have 22 team members.
With so many projects delivered, our team knows what works and what doesn’t when it comes to WordPress and WooCommerce.
Our team members are:
- highly experienced developers (employees & contractors with 5-10+ years of experience),
- great designers with an eye for UX/UI with 10+ years of experience
- project managers with development background who speak both tech and non-tech
- QA specialists
- Conversion Rate Optimisation - CRO experts
They are all working together to provide you with the best possible service. We are passionate about WordPress, and we love creating custom solutions that help our clients achieve their goals.
At WPRiders, we are committed to building long-term relationships with our clients. We believe in accountability, in doing the right thing, as well as in transparency and open communication. You can read more about WPRiders on the About us page.
While new parents are often consumed with spending time with their new babies, Miguel Aliaga wants to make sure they have the finance tips they need to ensure a financially secure future for their child.
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation (David Eads)
This document provides an overview of mobile banking for financial institutions. It discusses why institutions implement mobile banking and how mobile affects the entire organization. Key points include measuring adoption and success, understanding the mobile landscape and technologies, connecting to existing infrastructure, considerations for offline customers, and working with existing partners or building solutions internally. The document emphasizes having a clear vision and business case to guide mobile decisions and strategies.
Jamaica Personal Income Tax Guide 2016 Edition (1) (Dawgen Global)
Presentation by TachyonNexus & Intel at Strata Singapore 2015 (Tachyon Nexus, Inc.)
Spark Summit EU 2015: Reynold Xin Keynote (Databricks)
Great functional testing with WebDriver and Thucydides (Mikalai Alimenkou)
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack (Red_Hat_Storage)
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real... (DataWorks Summit)
Alluxio Use Cases at Strata+Hadoop World Beijing 2016 (Alluxio, Inc.)
Spark Summit EU 2015: Lessons from 300+ production users (Databricks)
CES 2016 Trends and Implications - Havas Tom Goodwin
Alluxio Presentation at Strata San Jose 2016 (Jiří Šimša)
An Introduction to NOSQL, Graph Databases and Neo4j (Debanjan Mahata)
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich (Patrick Baumgartner)
How to Get Started with Your MongoDB Pilot ProjectDATAVERSITY
Open source, high performance database MongoDB can be used for a pilot project. The document discusses finding a non-critical initial project, getting experience with MongoDB, benchmarking performance, and presenting the business case for broader use. It also outlines steps for moving a successful pilot to production, including using MongoDB's auto-sharding, replication, and commercial support options.
Disclaimer :
The images, company, product and service names that are used in this presentation, are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data/Image collected from various sources from Internet.
Intention was to present the big picture of Big Data & Hadoop
Slides for the talk at AI in Production meetup:
https://www.meetup.com/LearnDataScience/events/255723555/
Abstract: Demystifying Data Engineering
With recent progress in the fields of big data analytics and machine learning, Data Engineering is an emerging discipline which is not well-defined and often poorly understood.
In this talk, we aim to explain Data Engineering, its role in Data Science, the difference between a Data Scientist and a Data Engineer, the role of a Data Engineer and common concepts as well as commonly misunderstood ones found in Data Engineering. Toward the end of the talk, we will examine a typical Data Analytics system architecture.
Life Science Database Cross Search and MetadataMaori Ito
Life science databases are sometimes difficult to understand due to lack of information. I'd like to add metadata into databases and improve search results.
NoSQL databases provide an alternative to traditional relational databases that is well-suited for large datasets, high scalability needs, and flexible, changing schemas. NoSQL databases sacrifice strict consistency for greater scalability and availability. The document model is well-suited for semi-structured data and allows for embedding related data within documents. Key-value stores provide simple lookup of data by key but do not support complex queries. Graph databases effectively represent network-like connections between data elements.
"Get Ready for Big Data" presentation from Gilbane Boston 2011; for more details, see http://gilbaneboston.com/conference_program.html#t2 and http://pbokelly.blogspot.com/2011/12/gilbane-boston-2011-big-data.html
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data, provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
13. Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
• Data model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Examples:
– Riak, Redis, Voldemort
@bachmanm
14. Pros and Cons
• Strengths
– Simple data model
– Great at scaling out horizontally
• Scalable
• Available
• Weaknesses:
– Simplistic data model
– Poor for complex data
15. Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
• Data model:
– A big table, with column families
– Map-reduce for querying/processing
• Examples:
– HBase, HyperTable, Cassandra
16. Pros and Cons
• Strengths
– Data model supports semi-structured data
– Naturally indexed (columns)
– Good at scaling out horizontally
• Weaknesses:
– Unsuited for interconnected data
17. Document Databases
• Data model
– Collections of documents
– A document is a key-value collection
– Index-centric, lots of map-reduce
• Examples
– CouchDB, MongoDB
18. Pros and Cons
• Strengths
– Simple, powerful data model (just like SVN!)
– Good scaling (especially if sharding supported)
• Weaknesses:
– Unsuited for interconnected data
– Query model limited to keys (and indexes)
• Map reduce for larger queries
19. Graph Databases
• Data model:
– Nodes with properties
– Named relationships with properties
– Hypergraph, sometimes
• Examples:
– Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
20. Pros and Cons
• Strengths
– Powerful data model
– Fast
• For connected data, can be many orders of magnitude faster than RDBMS
• Weaknesses:
– Sharding
• Though they can scale reasonably well
• And for some domains you can shard too!
21. Social Network “path exists” Performance
• Experiment:
• ~1k persons
• Average 50 friends per person
• pathExists(a,b) limited to depth 4
• Caches warm to eliminate disk IO
Database              # persons   Query time
Relational database   1000        2000ms
Neo4j                 1000        2ms
Neo4j                 1000000     2ms
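The pathExists(a,b) check from the experiment above can be sketched as a breadth-first search that gives up after four hops. This is only an illustration in plain Java over an in-memory map: it is not Neo4j's API and not the code behind the measurements, and the class name FriendGraph and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory friendship graph (an assumption for illustration only).
class FriendGraph {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is treated as symmetric, so the edge is stored in both directions.
    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search, one level per hop, stopping after maxDepth hops.
    boolean pathExists(String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        List<String> frontier = new ArrayList<>();
        frontier.add(from);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Set.of())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        FriendGraph g = new FriendGraph();
        String[] chain = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i + 1 < chain.length; i++) {
            g.addFriendship(chain[i], chain[i + 1]);
        }
        System.out.println(g.pathExists("a", "e", 4)); // true: 4 hops away
        System.out.println(g.pathExists("a", "f", 4)); // false: 5 hops away
    }
}
```

The work done depends on how many people sit within four hops of the start, not on the total size of the graph, which is consistent with the flat Neo4j timings in the table.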
23. What are graphs good for?
• Recommendations
• Business intelligence
• Social computing
• Geospatial
• MDM
• Systems management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing (especially bioinformatics)
• Indexing your slow RDBMS
• And much more!
24. Neo4j is a Graph Database
So we need to detour through a little graph theory
40. Getting started is easy
• Single package download, includes server stuff
– http://neo4j.org/download/
• For developer convenience, Ivy (or whatever):
– <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>
41. Run it!
• Server is easy to start and stop
– cd <install directory>
– bin/neo4j start
– bin/neo4j stop
• Provides a REST API in addition to the other APIs we’ve seen
• Provides some ops support
– JMX, data browser, graph visualisation
42. Embed it!
• If you want to host the database in your process just load the jars
• And point the config at the right place on disk
• Embedded databases can be HA too
– You don’t have to run as server
43. [Figure: example conference graph. Node properties shown: name: Phil Johnson; name: Michal Bachman; name: Martin Macke; name: Jeremy White; title: Cognitive Psychology, duration: 30; title: Intro to Neo4j, duration: 45; name: UX; name: Neo4j; name: NOSQL. Relationship label shown: INTERESTED.]
49. [Figure: the example conference graph from slide 43, repeated.]
50. All Conference Topics
Node webExpo = neo.getReferenceNode();
// Speakers are the start nodes of TALKS_AT relationships pointing at the conference node.
for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
    Node speaker = talksAt.getStartNode();
    // Each speaker DELIVERS one or more talks.
    for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
        Node talk = delivers.getEndNode();
        // Each talk is ABOUT one or more topics.
        for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
            String topicName = (String) about.getEndNode().getProperty(NAME);
            //add to result...
        }
    }
}
-------------------
Printing all topics
All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software,
responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design,
marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j,
rest, css, design, publishing, nosql. Took: 2 ms
52. [Figure: the example conference graph from slide 43, repeated.]
53. Which talks should I attend?
TraversalDescription talksTraversal = Traversal.description()
        .uniqueness(Uniqueness.NONE)
        .breadthFirst()
        .relationships(INTERESTED, OUTGOING)
        .relationships(ABOUT, INCOMING)
        .evaluator(Evaluators.atDepth(2));
Node attendee =
        neo.index().forNodes("people").get("name", "Jeremy White").getSingle();
Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();
//iterate over talks and print
------------------------------------------
Suggesting talks for 100 random attendees.
...
Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
Suggested talks for 100 random attendees in 449 ms
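What the traversal above computes can be shown without Neo4j: start at an attendee, follow INTERESTED to topics, then follow ABOUT backwards to talks. The sketch below is plain Java over in-memory maps; the class TalkSuggestions, its methods and the sample data are hypothetical. One deliberate difference: it collects talks into a set, whereas Uniqueness.NONE in the slide would report a talk once per path that reaches it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the two-hop traversal; not the Neo4j traversal API.
class TalkSuggestions {
    // attendee -> topics the attendee is INTERESTED in
    private final Map<String, Set<String>> interestedIn = new HashMap<>();
    // topic -> talks that are ABOUT it (the incoming side of ABOUT)
    private final Map<String, Set<String>> talksAbout = new HashMap<>();

    void interested(String attendee, String topic) {
        interestedIn.computeIfAbsent(attendee, k -> new HashSet<>()).add(topic);
    }

    void about(String talk, String topic) {
        talksAbout.computeIfAbsent(topic, k -> new HashSet<>()).add(talk);
    }

    // Two hops: (attendee)-[INTERESTED]->(topic)<-[ABOUT]-(talk).
    Set<String> suggestFor(String attendee) {
        Set<String> talks = new TreeSet<>();
        for (String topic : interestedIn.getOrDefault(attendee, Set.of())) {
            talks.addAll(talksAbout.getOrDefault(topic, Set.of()));
        }
        return talks;
    }

    public static void main(String[] args) {
        TalkSuggestions s = new TalkSuggestions();
        // Illustrative data only, loosely based on the example graph.
        s.interested("Jeremy White", "Neo4j");
        s.interested("Jeremy White", "NOSQL");
        s.about("Intro to Neo4j", "Neo4j");
        s.about("Intro to Neo4j", "NOSQL");
        s.about("Cognitive Psychology", "UX");
        System.out.println(s.suggestFor("Jeremy White")); // [Intro to Neo4j]
    }
}
```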
55. [Figure: the example conference graph from slide 43, repeated.]
56. What do we have in common?
//retrieve attendeeOne and attendeeTwo from index
int maxDepth = 2;
Iterable<Path> paths = GraphAlgoFactory
        .allPaths(Traversal.expanderForAllTypes(), maxDepth)
        .findAllPaths(attendeeOne, attendeeTwo);
for (Path path : paths) {
    //print it
}
------------------------------------------------------------
Finding things in common for 100 random couples of attendees
...
Karel Kunc and Phil Smith:
(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
(Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
Took: 0 ms.
...
Found things in common for 100 random couples of attendees in 142 ms.
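The paths in the sample output all have the shape (a)--[t1]-->(x)<--[t2]--(b). As a rough sketch of that one shape, assuming a plain list of typed edges rather than Neo4j's path finder, the hypothetical class CommonGround below enumerates them. The slide's allPaths call with expanderForAllTypes is more general, since it follows relationships in any direction up to the maximum depth.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not GraphAlgoFactory and not the talk's code.
class CommonGround {
    // A directed, typed edge: (from)--[type]-->(to).
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Lists every pair of edges that leave a and b and meet at the same node.
    static List<String> inCommon(List<Edge> edges, String a, String b) {
        List<String> paths = new ArrayList<>();
        for (Edge left : edges) {
            if (!left.from.equals(a)) {
                continue;
            }
            for (Edge right : edges) {
                if (right.from.equals(b) && right.to.equals(left.to)) {
                    paths.add("(" + a + ")--[" + left.type + "]-->(" + left.to
                            + ")<--[" + right.type + "]--(" + b + ")");
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Edges taken from the sample output above.
        List<Edge> edges = List.of(
                new Edge("Karel Kunc", "INTERESTED", "ux"),
                new Edge("Phil Smith", "INTERESTED", "ux"),
                new Edge("Karel Kunc", "DISLIKED", "Beyond the polar bear"),
                new Edge("Phil Smith", "LIKED", "Beyond the polar bear"));
        for (String path : inCommon(edges, "Karel Kunc", "Phil Smith")) {
            System.out.println(path);
        }
    }
}
```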
58. Who is my beer mate?
[Figure: pattern graph with nodes myself, talk:? and beerMate:?]
59. Who is my beer mate?
[Figure: the same pattern in Cypher node notation: (myself), (talk), (beerMate)]
60. Who is my beer mate?
start myself=node:people(name = "Emil Votruba")
match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;
61. Cypher Query
start myself=node:people(name = "Alex Smart")
match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;
62. Cypher Query
start myself=node:people(name = "Emil Votruba")
match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)
return distinct beerMate.name, count(beerMate)
order by count(beerMate) desc
limit 5;
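To make the intent of the Cypher queries above concrete, here is a minimal plain-Java sketch of the same computation: count how many liked talks each other person shares with me, sort by that count in descending order, and keep the top few. The class BeerMates, the map-based input and the alphabetical tie-break are assumptions made for illustration; the Cypher query does not specify an order for ties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the "beer mate" query; not how Neo4j executes Cypher.
class BeerMates {
    // likedTalks maps each person to the set of talks they LIKED.
    static List<String> topBeerMates(Map<String, Set<String>> likedTalks, String myself, int limit) {
        Set<String> mine = likedTalks.getOrDefault(myself, Set.of());
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : likedTalks.entrySet()) {
            if (entry.getKey().equals(myself)) {
                continue; // I am not my own beer mate
            }
            int common = 0;
            for (String talk : entry.getValue()) {
                if (mine.contains(talk)) {
                    common++;
                }
            }
            if (common > 0) {
                shared.put(entry.getKey(), common);
            }
        }
        List<String> names = new ArrayList<>(shared.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a)); // most shared talks first
            return byCount != 0 ? byCount : a.compareTo(b);              // assumed tie-break: by name
        });
        return new ArrayList<>(names.subList(0, Math.min(limit, names.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = Map.of(
                "me", Set.of("t1", "t2", "t3"),
                "ann", Set.of("t1", "t2"),
                "bob", Set.of("t3"),
                "cy", Set.of("t9"));
        System.out.println(topBeerMates(likes, "me", 5)); // [ann, bob]
    }
}
```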
64. Current Research
• Graph partitioning
• Graph analytics (“OLAP” and predictive)
• Performance improvements
• Query languages
• MVCC and single-threaded write models
• ACID (tradeoffs for weakening C and I)
• Yield and Harvest in distributed systems
• Application-level
– Recommendations
– Protein interactions
–…
Welcome
Introduce myself, NeoTech
Motivations:
• Presented this at a conference
• Conversations with friends
• Talked to Serena, no affiliation
• BigData and NOSQL are popular terms
• Graphs are getting more and more popular (Facebook)
• Not much attention at Imperial
Ask about the audience: heard about graph databases? Graphs? Databases?
Outcomes:
• Learn about a new technology
• See an application of graph theory in practice
• Tailored to students (not industry)
Agenda:
• Intro to NOSQL
• Intro to Graph Databases
• Intro to Neo4j
• Practical part – how to work with one
• Real experiences
• Current research
• Q & A
Why now? Nobody woke up one day thinking relational DBs are not cool any more; trends are driving this.
Generate, process, store and work with
UGC = User Generated Content
GGG = Giant Global Graph (what the web will become): every little piece, every unit of interesting data is semantically connected to every other interesting unit of data (Tim Berners-Lee)
Data is more connected (linearly)
RDFa (Resource Description Framework in attributes) is a technology for carrying structured information inside web pages. RDFa is one of the ways of writing (serialising) the Resource Description Framework (RDF) data format.
In computer science, an ontology is an explicit and formalised description of a particular subject area. It is a formal, declarative representation that contains a glossary (definitions of concepts) and a thesaurus (definitions of the relationships between those concepts). An ontology is a dictionary used for storing and passing on knowledge about a particular subject area.
Data is losing predictable structure
Individualisation of data: can’t box each individual, want data about me
Shape of data: less predictable structure
Decentralisation of data creation accelerates this trend
Apps can choose what makes sense to store the data
This is strictly about connected data – joins kill performance there. No bashing of RDBMS performance for tabular transaction processing.
The beauty of the NOSQL world: nobody dictates anything to you, so pick the database that matches the type or characteristics of the data you work with.
Key-value databases: one key, one value; hash maps; Redis, Riak (Amazon Dynamo). Mostly highly fault tolerant, simple data model, excellent horizontal scalability, availability.
BigTable databases: a "k-vvvvvvv" store with implicit indexes; Cassandra (after Google's BigTable). Support for semi-structured data, automatic index (columns), good horizontal scalability, again unsuited for connected data.
Document databases: well-known examples are Subversion, MongoDB, CouchDB, … Collections of documents; a document is a collection of key-value pairs; the index is important; lots of map-reduce. Scalability is fairly good (not as good as key-value, because of the more complex data model). Simple and powerful data model, like Subversion.
The drawback of all three is that they are not really suited to densely connected data. An overly simple data model (a HashMap: fast, but…) means that if we want any immediate, deeper insight into the stored data, that has to be the responsibility of the application layer (in other words, we have to program it ourselves). These databases are therefore very often paired with frameworks such as Map-Reduce, for which we have to write jobs that give us that insight. Map-reduce is a batch operation (in contrast to an on-line / in-the-click-stream synchronous operation) for getting a view of your connected data.
All three work with aggregated data, i.e. they require structure up front: data that logically belongs together (such as an order and its line items) is stored together in the database and is also accessed as a whole in queries. In key-value stores that whole is the value, in column family stores the column family, and in document databases the document. This is fine when data access needs exactly this structure. But if we want to look at the data differently, for example to analyse total sales of individual products across all orders, we have to fight the structure a little, and that is why map-reduce is talked about so much in connection with these databases.
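The order example can be made concrete. Below is a minimal sketch, assuming orders are stored as aggregates (each order carries its own line items): answering "total units sold per product" means walking every aggregate, emitting a (product, quantity) pair per line, and summing per product, which is the map and reduce step in miniature. The class and field names are hypothetical and not tied to any particular database.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of querying across aggregates; names are assumptions.
class OrderAggregates {
    static final class Line {
        final String product;
        final int quantity;

        Line(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    // The aggregate: an order is stored and fetched together with its lines.
    static final class Order {
        final String id;
        final List<Line> lines;

        Order(String id, List<Line> lines) {
            this.id = id;
            this.lines = lines;
        }
    }

    static Map<String, Integer> unitsSoldPerProduct(List<Order> orders) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Order order : orders) {
            for (Line line : order.lines) {
                // "map": emit (product, quantity); "reduce": sum the quantities per product
                totals.merge(line.product, line.quantity, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("o1", List.of(new Line("book", 2), new Line("pen", 1))),
                new Order("o2", List.of(new Line("book", 1))));
        System.out.println(unitsSoldPerProduct(orders)); // {book=3, pen=1}
    }
}
```

Fetching one order by its id is a single lookup, while the per-product question has to touch every order; that asymmetry is the "fighting the structure" described above.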
The advantage of storing data in non-aggregated forms is that it can be analysed and presented from different points of view, depending on the case at hand. And then, of course, there are graph databases, which are why we are here today and which we will learn a bit more about in a minute.
History: Amazon decided that they always wanted the shopping basket to be available, but couldn’t take a chance on an RDBMS, so they built their own.
Big risk, but a simple data model with well-known computing science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication).
+ Massive read/write scale
- Simplistic data model moves heavy lifting into the app tier (e.g. map reduce)
MongoDB has a reputation for taking liberties with durability to get speed.
CouchDB has good multi-master replication, from Lotus Notes.
People talk about Codd’s relational model being mature because it was proposed in 1969 – 42 years old. Euler’s graph theory was proposed in 1736 – 275 years old.
Can’t easily shard graphs like documents or KV stores. This means that high-performance graph databases are limited to the data set size that a single machine can handle. Replicas can speed things up (and improve availability), but the data set size is still limited to a single machine’s disk/memory. Some domains can shard easily (e.g. geo, most web apps) using a consistent routing approach and cache sharding – we’ll cover that later.
Graph theory studies the properties of structures called graphs. These consist of vertices that are connected to each other by edges. A graph is usually drawn as a set of points joined by lines. Formally, a graph is an ordered pair of a set of vertices V and a set of edges E.
The Seven Bridges of Königsberg (today Kaliningrad).
Who works for a big company? By that I mean several layers of management and a software architect on a different floor from the developers. This information is for you: in such companies it tends to be hard to push through “new” technologies. But the relational model that E. F. Codd came up with in 1969 is only 43 years old. The graph model is 276 years old. So the next time your boss or a clever architect responds to adopting NOSQL with something like “we only use mature, proven technologies here”, you know which direction to send them in… by which I mean, say, this talk on the web or the relevant Wikipedia pages. So how do we store data in a graph…
So how do we store data in a graph… In a graph we store data as nodes, and nodes are really documents that can have arbitrary keys with values assigned to them, just like a document in MongoDB. Where a graph differs from MongoDB is that the graph has relationships between nodes. And that is the trade-off: MongoDB scales better because it does not do this; Neo4j is better for connected data because it does. It stores the relationships between individual nodes, but it does not scale as well. We have to take this into account when solving your problems: do you want massive scalability, or immediate insight into how your data is connected?
DESCRIBE THE GRAPH
Relationships have semantic meaning! Speakers and talks in an RDBMS.
It is a fairly intuitive way of storing data! The job of a graph database is to take this intuitive data, which we can simply sketch on a whiteboard or a piece of paper, and traverse it quickly in your programs.
And that is one nice property of graphs: they are ideal for whiteboards, backs of envelopes, beer mats and cigarette packets… the things on which the best designs (especially in startups) usually come to life.
I picked WebExpo as the example; originally I wanted to map the corruption scandals of Czech politicians, but this is somewhat more harmless. The relationships between speakers, talks, topics, attendees and so on can be drawn on a beer mat! WebExpo is a domain with lots of relationships: speakers deliver talks, …
You can simply draw this on a whiteboard, which, by the way, is what we do as programmers when we sit down with people who need a piece of software and we try to understand the business problem, the domain. We sit down at the whiteboard and draw customers, orders, invoices, products and so on, and the relationships between them!
And what do we do next? We take our nice design and normalise it. We sweat over working out how to load it all into tables. And we are happy and smiling until we put it live, into production… and it runs like a tortoise. What do we do? We denormalise our model! All the energy we invested, the blood, sweat and tears, all for nothing. With a graph database, what is on the paper is exactly what you put into the database.
That does not mean you are excused from the design phase. You still have to think hard about which entities (or objects) make up your domain and what the relationships between them are! You still need design. You cannot simply take the data from the tables you have and force it into your brand-new graph database. You have to start thinking in nodes and relationships.
When designing the data model for WebExpo we have to make a lot of design decisions: how do we distinguish speakers from attendees? Is that even necessary? Do we turn Friday and Saturday into nodes, or just into a property on the individual talks?
You still have to do design, but the point is that designing a data model for a graph database can be a pleasant and natural experience.
It takes care of nodes, the relationships between them, and indexes for you. Neo4j is stable and has been running in production since 2003. It is under active development. It is primarily for Java, but usable with plenty of other technologies. It is ideal at the scale of tens of servers in a cluster, not hundreds. It is for densely connected data; it is not a key-value store.
Fully and militantly ACID. Who does not know what that means? [Explain quickly: atomicity, consistency, isolation, durability.] Some other NoSQL databases give up some of these guarantees in favor of performance; with Neo4j you cannot switch this off. Data is always written to disk.
Look up the starting point in an index (Lucene). Explore its neighborhood.
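The two steps above (find a starting node through an index, then explore around it) can be sketched in plain Python. A dictionary stands in for the Lucene index here, and the node names, the `name_index` mapping and the `neighborhood` helper are hypothetical, not part of Neo4j.

```python
from collections import deque

# Hypothetical sample graph: node -> list of directly connected nodes.
graph = {
    "alice": ["talk_graphs"],
    "talk_graphs": ["alice", "topic_nosql", "bob"],
    "topic_nosql": ["talk_graphs"],
    "bob": ["talk_graphs", "talk_css"],
    "talk_css": ["bob"],
}

# A plain dict standing in for an index: lookup key -> starting node.
name_index = {"Alice": "alice", "Bob": "bob"}


def neighborhood(start, max_depth):
    """Breadth-first exploration; returns {node: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph[current]:
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return seen


start = name_index["Alice"]          # step 1: index lookup
result = neighborhood(start, 2)      # step 2: explore two hops out
print(result)
```

Only the index lookup touches the whole data set; the exploration afterwards visits just the nodes near the starting point.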
Neo has a whole built-in library of graph algorithms, such as shortest path, all paths, and so on.
1 million hops per second on an ordinary laptop, with no difference when the amount of data multiplies. High-performance graph operations: traverses 1,000,000+ relationships per second on commodity hardware.
In general, if you use MySQL and do not pay for it, you will not pay for Neo either.
Let's look at usage in embedded mode with a concrete example. I built a graph from WebExpo: the speakers and talks are real, and the 1000 attendees have randomly generated names. [Describe the graph and the scenario.] Who cannot read Java? The code will be on GitHub.
Relationship types can be either strings or an Enum, which gives you the benefit of static typing in the IDE; for Neo4j it makes no difference. We repeat the procedure until we have the whole graph.
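The talk's code is Java and is not reproduced in these notes, so the following is only a plain-Python analogy of the idea: a relationship type can be given as an enum member or as a string, and the stored result is the same. `RelType`, `relate` and the sample names are assumptions for illustration.

```python
from enum import Enum


class RelType(Enum):
    """Hypothetical relationship types; an enum lets tooling catch typos."""
    GIVES = "GIVES"
    ATTENDS = "ATTENDS"


relationships = []  # list of (start, type name, end) triples


def relate(start, rel_type, end):
    """Store a relationship; enum members and plain strings are equivalent."""
    name = rel_type.value if isinstance(rel_type, Enum) else rel_type
    relationships.append((start, name, end))


# Repeat for every relationship until the whole graph is built.
relate("speaker", RelType.GIVES, "talk")
relate("attendee", "ATTENDS", "talk")

print(relationships)
```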
This is a screenshot from the web console, where we can browse the graph visually. It runs on a laptop; I will give you the URL at the end so you can play with it. So, we have a graph, but how do we now get data out of it?
There are several ways to write queries in Neo4j; they differ in readability, complexity, performance and level of abstraction. I will show you some of them, starting from the bottom, that is, from the native, fastest API.
The Core API works directly with the units we stored in the database: nodes, relationships (edges) and their properties.
Let's look at the whole graph once more. A new graph always has one node with ID 0; we turned that one into WebExpo.
This is an imperative API: the programmer does all the work, and it is the best-performing option.
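What "the programmer does all the work" means can be sketched in plain Python: every hop is an explicit loop written by hand. This is not the Neo4j Core API; the relationship types `HAS_SPEAKER` and `GIVES`, the node names and the `outgoing` helper are hypothetical.

```python
# Hypothetical sample data: (start node, relationship type, end node).
relationships = [
    ("webexpo", "HAS_SPEAKER", "speaker_a"),
    ("webexpo", "HAS_SPEAKER", "speaker_b"),
    ("speaker_a", "GIVES", "talk_1"),
    ("speaker_b", "GIVES", "talk_2"),
    ("speaker_b", "GIVES", "talk_3"),
]


def outgoing(node, rel_type):
    """Return the end nodes of all relationships of one type leaving a node."""
    return [end for start, rtype, end in relationships
            if start == node and rtype == rel_type]


# Imperative style: we spell out each step of the walk ourselves,
# starting from the root node and following one relationship type per loop.
talks_by_speaker = {}
for speaker in outgoing("webexpo", "HAS_SPEAKER"):
    talks_by_speaker[speaker] = []
    for talk in outgoing(speaker, "GIVES"):
        talks_by_speaker[speaker].append(talk)

print(talks_by_speaker)
```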
Let's move one level up in abstraction to the so-called traversal API, which lets us write queries declaratively, that is, describe how we want the graph to be traversed. The traversal itself is done for us by Neo4j.
We can write our own evaluators.
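The declarative idea can be sketched in plain Python: we only describe the traversal (which relationship types to follow, how deep to go, and an evaluator that decides which paths to keep), and a generic function does the walking. This is an illustration, not Neo4j's traversal framework; `traverse`, `ends_at_topic` and the sample data are assumptions.

```python
from collections import deque


def traverse(graph, start, rel_types, evaluator, max_depth):
    """Breadth-first walk; returns every path for which evaluator(path) is true."""
    results = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if evaluator(path):
            results.append(path)
        if len(path) - 1 >= max_depth:
            continue  # depth limit reached, do not expand further
        for rel_type, neighbor in graph.get(path[-1], []):
            if rel_type in rel_types and neighbor not in path:
                queue.append(path + [neighbor])
    return results


# Hypothetical sample graph: node -> list of (relationship type, neighbor).
graph = {
    "attendee": [("ATTENDS", "talk_1"), ("ATTENDS", "talk_2")],
    "talk_1": [("ABOUT", "topic_nosql")],
    "talk_2": [("ABOUT", "topic_css")],
}


def ends_at_topic(path):
    """A custom evaluator: keep only paths whose last node is a topic."""
    return path[-1].startswith("topic_")


paths = traverse(graph, "attendee", {"ATTENDS", "ABOUT"}, ends_at_topic, max_depth=2)
print(paths)
```

Swapping in a different evaluator changes which paths come back without touching the walking code.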
Another nice feature is the library of algorithms for finding paths between two nodes.
Also shortest path, Dijkstra and others.
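For readers who have not met Dijkstra's algorithm, here is a minimal stand-alone sketch in plain Python. It does not call Neo4j's algorithm library; the `dijkstra` function, the node names and the weights are made up for illustration, and it assumes non-negative weights.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path, or None if there is none."""
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    best = {}                        # cheapest known cost per settled node
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already settled with an equal or cheaper cost
        best[node] = cost
        for neighbor, weight in graph.get(node, []):
            heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# Hypothetical weighted graph: node -> list of (neighbor, weight).
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1), ("d", 5)],
    "c": [("d", 1)],
}

print(dijkstra(graph, "a", "d"))
```

In this sample the direct-looking routes via `b` to `d` or via `c` cost more than the three-hop route through both, which is what the algorithm finds.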
That is hard for non-programmers, so let's look at something simpler.
At the highest level of abstraction, Neo4j provides its own query language, partly inspired by SQL. The language is called Cypher, and it understands human-readable statements such as the one you see here now.
We have to start somewhere, so we use an index named people, where we find Mr. Emil Votruba by name. Next we have to specify what data we actually want to get back, in this case a person's name and a score: how many things we have in common. Finally, we probably do not want to go for a beer with absolutely everyone, but only with, say, the 5 people we have the most in common with. You can probably see the SQL influence. [Meeting notes (09/09/2012 20:18): animation]
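The query itself is on the slide and is not reproduced in these notes, so the following is not Cypher. It is a plain-Python sketch of the same three steps: look up the starting person in an index, score everyone else by the number of things in common, and keep the top 5. Apart from the index name people and the name Emil Votruba, which come from the notes, all names and data here are hypothetical.

```python
from collections import Counter

# A plain dict standing in for the "people" index: name -> node id.
people = {"Emil Votruba": 0, "Person A": 1, "Person B": 2, "Person C": 3}
names = {node_id: name for name, node_id in people.items()}

# Hypothetical sample data: node id -> things that person is connected to.
connected_to = {
    0: {"talk_1", "talk_2", "topic_nosql"},
    1: {"talk_1", "talk_2", "topic_nosql"},
    2: {"talk_1", "topic_css"},
    3: {"topic_css"},
}


def top_matches(name, limit=5):
    """Return up to `limit` (name, score) pairs, highest score first."""
    me = people[name]                        # 1. start from the index
    scores = Counter()
    for other, things in connected_to.items():
        if other == me:
            continue
        common = len(connected_to[me] & things)
        if common:                           # 2. score = things in common
            scores[names[other]] = common
    return scores.most_common(limit)         # 3. order by score, keep top N


print(top_matches("Emil Votruba"))
```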