The document discusses NoSQL databases as an alternative to SQL databases for big data. It provides an overview of why NoSQL databases were created to address the limitations of SQL databases for large, distributed datasets. It then categorizes and describes some popular NoSQL databases, including key-value stores like Dynamo and Redis, document databases like MongoDB and CouchDB, graph databases like Neo4j and FlockDB, and column-oriented databases like BigTable and HBase. The document also contrasts ACID transactions with the BASE model and the eventual consistency used by many NoSQL databases.
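Eventual consistency is easiest to see with two replicas that accept writes independently and later reconcile. The sketch below assumes a simple last-writer-wins (LWW) rule keyed on a logical timestamp plus a node-id tie-breaker. Dynamo itself tracks versions with vector clocks and can return conflicting versions to the application, so treat this as a simplified illustration rather than how any particular store works. The names `Versioned` and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # logical timestamp assigned by the writer (assumption)
    node: str       # tie-breaker when two writes carry the same timestamp


def merge(a: dict, b: dict) -> dict:
    """Merge two replica states key by key, keeping the newest write (LWW)."""
    merged = dict(a)
    for key, vb in b.items():
        va = merged.get(key)
        # Compare (timestamp, node) so the winner is the same on every replica.
        if va is None or (vb.timestamp, vb.node) > (va.timestamp, va.node):
            merged[key] = vb
    return merged


# Two replicas accept writes independently, e.g. during a network partition.
replica_1 = {"cart:42": Versioned("book", 1, "n1")}
replica_2 = {
    "cart:42": Versioned("book,pen", 2, "n2"),
    "user:7": Versioned("alice", 1, "n2"),
}

# Anti-entropy: whichever direction the states are exchanged, both converge.
assert merge(replica_1, replica_2) == merge(replica_2, replica_1)
print(merge(replica_1, replica_2)["cart:42"].value)  # book,pen
```

LWW makes the merge order-independent, which is what lets replicas converge, but it silently discards the losing concurrent write; that trade-off is one reason some stores expose conflicts to the application instead.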
This document discusses machine learning on fast data. It presents an agenda covering ML on production systems, TensorFlow, Kafka, Docker and Kubernetes. It then describes the machine learning process and shows how an enterprise analytics platform can integrate data sources, a machine learning cluster using Kafka, and data destinations. Details are provided on using TensorFlow for linear regression and neural networks. Apache Kafka is explained as a distributed streaming platform using topics, brokers, and consumer groups. The Confluent platform, KStream and KTable APIs are also summarized. Docker and Kubernetes are mentioned for containerization.
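The linear regression the deck builds in TensorFlow comes down to minimizing mean squared error by gradient descent. The following plain-Python sketch (no TensorFlow, to stay self-contained) shows that training loop on synthetic data drawn from y = 2x + 1. The function name `fit_linear`, the learning rate, and the epoch count are illustrative assumptions, not values from the deck.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # synthetic, noise-free data from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

A framework such as TensorFlow computes the same gradients automatically and vectorizes the arithmetic, which matters once the model grows beyond two parameters.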
This document summarizes digital transformation with Microsoft Azure, including cloud computing, big data, and data lakes. It discusses data lake characteristics, such as holding structured, semi-structured, and unstructured data. Data lakes are used for reporting, visualization, analytics, and machine learning. They provide a single store for data ranging from raw copies of source-system data to processed, structured data for analytics. The document also briefly mentions Azure Data Lake Analytics and Databricks, and concludes by thanking the reader.
The document discusses Azure Data Factory and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability without infrastructure to manage along with capabilities for lifting and running SQL Server Integration Services packages in Azure.
The document provides an introduction and overview of NoSQL databases. It discusses:
- How NoSQL databases are non-relational and differ from traditional relational databases by not requiring fixed schemas and by supporting horizontal scaling.
- Examples of different types of NoSQL databases, such as document stores, key-value stores, and graph databases.
- The CAP theorem and the eventual consistency of many NoSQL databases, which trade strong consistency for high availability and partition tolerance.
- How large companies use NoSQL databases to store rapidly growing, unstructured, and unpredictable data more efficiently than relational databases.
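One way to make the consistency trade-off concrete is the quorum rule used by Dynamo-style stores: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap, and therefore to see the latest acknowledged write, when R + W > N. The brute-force check below is a hypothetical helper that verifies the rule for small N; it is an illustration, not part of any database API.

```python
from itertools import combinations


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """Return True if every r-replica read set intersects every
    w-replica write set among n replicas (checked exhaustively)."""
    replicas = range(n)
    return all(
        set(reads) & set(writes)  # an empty intersection is falsy
        for writes in combinations(replicas, w)
        for reads in combinations(replicas, r)
    )


# Compare the exhaustive check with the R + W > N rule of thumb.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(n, r, w, read_sees_latest_write(n, r, w), r + w > n)
```

Choosing R + W ≤ N (for example R = W = 1) lowers latency and keeps more operations available during failures, at the price of possibly stale reads, which is the eventual-consistency side of the trade-off.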
This document provides an overview of using Polybase for data virtualization in SQL Server. It discusses installing and configuring Polybase, connecting external data sources like Azure Blob Storage and SQL Server, using Polybase DMVs for monitoring and troubleshooting, and techniques for optimizing performance like predicate pushdown and creating statistics on external tables. The presentation aims to explain how Polybase can be leveraged to virtually access and query external data using T-SQL without needing to know the physical data locations or move the data.
This document discusses MS SQL Server 2019's capabilities for big data processing through PolyBase and Big Data Clusters. PolyBase allows SQL queries to join data stored externally in sources like HDFS, Oracle and MongoDB. Big Data Clusters deploy SQL Server on Linux in Kubernetes containers with separate control, compute and storage planes to provide scalable analytics on large datasets. Examples of using these technologies include data virtualization across sources, building data lakes in HDFS, distributed data marts for analysis, and integrated AI/ML tasks on HDFS and SQL data.
Things database administrators should know if they are working with SharePoint databases. Presentation from SQLsat614.
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
The document discusses Azure Data Factory V2 data flows. The speaker introduces Azure Data Factory, explains concepts such as pipelines, linked services, and data flows, and then guides a hands-on demo in which attendees build a simple data flow that joins customer data to postal district data to add the matching postal towns.
This document provides an overview and agenda for a presentation on new features in the Microsoft data platform universe including SQL Server, Azure, and Power BI, with a focus on business intelligence (BI) and artificial intelligence (AI). The presentation covers new features in SQL Server 2019 for BI like calculation groups and dynamic formatting in Analysis Services tabular mode. It also highlights important updates in Azure including new serverless and hyperscale options for Azure SQL Database, updates to data ingestion and storage with Azure Data Factory v2 and Azure Data Lake Gen 2, and changes to data modeling, preparation, and serving capabilities in Azure SQL Data Warehouse. The document concludes by discussing options for incorporating AI and machine learning into BI solutions using either
Azure Data Factory is a cloud data integration service that allows users to create data-driven workflows (pipelines) composed of activities that move and transform data. Pipelines contain a series of interconnected activities that perform data extraction, transformation, and loading. Data Factory connects to various data sources using linked services and can execute pipelines on a schedule or on demand to move data between cloud and on-premises data stores and platforms.
This document provides an overview and demonstration of Azure Data Lake Store and Azure Data Lake Analytics. The presenter discusses how Azure Data Lake can store and analyze large amounts of data in its native format. Key capabilities of Azure Data Lake Store like unlimited storage, security features, and support for any data type are highlighted. Azure Data Lake Analytics is presented as an elastic analytics service built on Apache YARN that can process large amounts of data. The U-SQL language for big data analytics is demonstrated, along with using Visual Studio and PowerShell for interacting with Azure Data Lake. The presentation concludes with a question and answer section.
SQLBits X Presentation "Scaling Out Your Cloud Database With SQL Azure Federations" Copyright (c) Microsoft Corp.
Spark is a fast and general engine for large-scale data processing. It supports Scala, Python, Java, SQL, R and more. Spark applications can access data from many sources and perform tasks like ETL, machine learning, and SQL queries. Azure Databricks provides a managed Spark service on Azure that makes it easier to set up clusters and share notebooks across teams for data analysis. Databricks also integrates with many Azure services for storage and data integration.
This document summarizes new features in SQL Server 2019, including:
- Support for big data analytics using SQL Server, Spark, and data lakes for high-volume data storage and access.
- Enhanced data virtualization capabilities to combine data from multiple sources.
- An integrated AI platform to ingest, prepare, train, store, and operationalize machine learning models.
- Improved security, performance, and high availability features, such as Always Encrypted with secure enclaves and accelerated database recovery.
- Continued improvements based on community feedback.
Azure Data Factory introduces Visual Data Flow, a limited preview feature that allows users to visually design data flows without writing code. It provides a drag-and-drop interface for users to select data sources, place transformations on imported data, and choose destinations for transformed data. The flows are run on Azure and default to using Azure Data Lake Storage for staging transformed data, though users can optionally configure other staging options. The feature supports common data formats and transformations like sorting, merging, joining, and lookups.
This document summarizes the limitations of Azure SQL Database. Key limitations include not being able to change collation settings of system objects, use Windows authentication, manage high availability features, or use various SQL Server tools and stored procedures. The document provides a detailed list of limitations across database and T-SQL functionality, database internals, and other areas in comparison to on-premises SQL Server. It advises checking official Microsoft documentation for the most up-to-date information.
The document discusses NoSQL databases and CouchDB. It provides an overview of NoSQL, the different types of NoSQL databases, and when each type would be used. It then focuses on CouchDB, explaining features such as document-centric modeling, replication, and a fail-fast architecture. Examples are given of how to interact with CouchDB using its HTTP API and tools like Resty.
RavenDB is a document database built on .NET that stores data in JSON format. It provides a .NET client API for CRUD and querying operations using LINQ. Key features include dynamic and static indexing for querying, scalability through sharding and replication, and a schema-free design. The document covers an overview of NoSQL databases, RavenDB architecture and features, and development with the .NET client API. Additional topics like advanced querying, server administration, and deployment options are recommended for further learning.
This document discusses document-oriented databases and RavenDB. It provides an overview of the challenges of relational databases and why NoSQL databases are needed. It then describes document stores and how RavenDB is a .NET document database that is schema-free, horizontally scalable, and supports LINQ queries. The document demonstrates RavenDB through examples of CRUD operations, indexing, and sharding. It also briefly explains the MapReduce programming model.
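The MapReduce model mentioned above can be sketched in a few lines: a map function emits key/value pairs, the framework groups pairs by key (the shuffle), and a reduce function folds each group into a result. The Python below is a single-process word-count illustration under those assumptions; the function names are hypothetical, and it is not RavenDB's API, where map/reduce indexes are written as LINQ expressions in C#.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc: str):
    # Emit (word, 1) for every word in one input document.
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


docs = ["the quick fox", "The lazy dog", "the fox"]
intermediate = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"], counts["fox"])  # 3 2
```

In a real cluster, map tasks run in parallel on different nodes and the shuffle moves data over the network, which is why the model scales horizontally.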
The document provides an introduction to NoSQL databases. It begins with basic concepts of databases and DBMSs. It then discusses SQL and relational databases. The main part of the document defines NoSQL and explains why NoSQL databases were developed as an alternative to relational databases for handling large datasets. It provides examples of popular NoSQL databases such as MongoDB, Cassandra, HBase, and CouchDB and describes their key features and use cases.
The document discusses NoSQL databases and compares them to traditional SQL databases. It provides an overview of different types of NoSQL databases, such as column-oriented stores, document stores, memory stores, and graph databases. Examples of popular NoSQL databases are also given for each category.
This document discusses using SQOOP to connect Hadoop and relational databases. It describes four common interoperability scenarios and provides an overview of SQOOP's features. It then focuses on optimizing SQOOP for Oracle databases by discussing how the Quest/Cloudera OraOop extension improves performance by bypassing Oracle parallelism and buffering. The document concludes by recommending best practices for using SQOOP and its extensions.
This document compares NoSQL solutions like Redis, Couchbase, MongoDB, and Membase. It discusses their data models, features, and how they differ from relational databases. Key-value, column-oriented, and document-oriented databases are covered. Specific products like Membase, Redis, MongoDB, and CouchDB are also summarized, including their data models, replication methods, and typical uses in applications.
Using JPA vs. ActiveRecord for persistence in JRuby, by Chris Bucchere and Pieter Humphrey, from OpenWorld 2009.
Laine Campbell, CEO of Blackbird, will explain the options for running MySQL at high volumes on Amazon Web Services, exploring database as a service, hosted instances and storage, and the associated availability, performance, and provisioning considerations, using real-world examples from Call of Duty, Obama for America, and many more. Laine will show how to build highly available, manageable, and performant MySQL environments that scale in AWS: how to maintain them, grow them, and deal with failure. Some of the specific topics covered are:
* Overview of RDS and EC2: pros, cons, and usage patterns/antipatterns.
* Implementation choices in both offerings: instance sizing, ephemeral SSDs, EBS, provisioned IOPS, and advanced techniques (RAID, mixed storage environments, etc.).
* Leveraging regions and availability zones for availability, business continuity, and disaster recovery.
* Scaling patterns, including read/write splitting, read distribution, functional dataset partitioning, and horizontal dataset partitioning (aka sharding).
* Common failure modes: AZ and Region failures, EBS corruption, EBS performance inconsistencies, and more.
* Managing and mitigating cost with various instance and storage options.
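Two of the scaling patterns listed for this talk, horizontal partitioning (sharding) and read/write splitting, often meet in a small routing layer in the application. The sketch below assumes hash-modulo placement of keys and a hypothetical topology of hostnames (`db0-primary`, `db0-replica-a`, and so on); `ShardRouter` is an illustrative name, and the class only chooses a host without opening any MySQL connection.

```python
import hashlib
import itertools


class ShardRouter:
    """Route each key to a shard; send writes to the shard's primary and
    spread reads round-robin over its replicas (hypothetical topology)."""

    def __init__(self, shards):
        # shards: list of {"primary": host, "replicas": [host, ...]}
        self.shards = shards
        # One round-robin cycle per shard; fall back to the primary if a
        # shard has no replicas.
        self._cycles = [
            itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards
        ]

    def shard_index(self, key: str) -> int:
        # Stable hash so the same key maps to the same shard in every process
        # (Python's built-in hash() is randomized per process for str).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def for_write(self, key: str) -> str:
        return self.shards[self.shard_index(key)]["primary"]

    def for_read(self, key: str) -> str:
        return next(self._cycles[self.shard_index(key)])


router = ShardRouter([
    {"primary": "db0-primary", "replicas": ["db0-replica-a", "db0-replica-b"]},
    {"primary": "db1-primary", "replicas": ["db1-replica-a"]},
])
print(router.for_write("user:1001"), router.for_read("user:1001"))
```

Plain modulo hashing remaps most keys when a shard is added, so production setups often use consistent hashing or a lookup directory instead. Because replicas lag the primary, reads that must see a just-committed write should also go to the primary.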
My workshop at Software Architect 2015: a full day on AngularJS, Node.js, Express, and MongoDB. The code is available at: https://github.com/habmic/MeanDemoCode
This document provides an overview and comparison of relational and NoSQL databases. Relational databases use SQL and have strict schemas, while NoSQL databases are schema-less and include document, key-value, wide-column, and graph models. The document presents NoSQL databases as providing unlimited horizontal scaling, fast performance that does not degrade as data grows, and flexible queries using map-reduce. Popular NoSQL databases include MongoDB, Cassandra, HBase, and Redis.
Slides from a customer presentation at the 'Powering games with Amazon Web Services' event in London.
Spark is fast becoming a critical part of customer solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early-adopter customers, helping them build their solutions on Azure Databricks. In this session, we begin by reviewing typical workload patterns and integration with other Azure services such as Azure Storage, Azure Data Lake, IoT/Event Hubs, SQL DW, and Power BI. Most importantly, we will share real-world tips and lessons learned that you can apply to your data engineering and data science workloads.
The document discusses different NoSQL databases including Cassandra, CouchDB, MongoDB, Neo4J, and Redis. It explains some of the key concepts of NoSQL databases like being non-relational, schema-less, and emphasizing availability and partition tolerance. It provides brief overviews of the data models and architectures of different NoSQL databases and how they handle concepts like distribution, replication, and querying.
Introduction to NoSQL, given as part of the Last Thursday meetups in Strasbourg http://www.facebook.com/home.php#!/group.php?gid=44635341639&ref=ts
iSCSI provides a standard way to access Ceph block storage remotely over TCP/IP. SUSE Enterprise Storage 3 includes an iSCSI target driver that allows any iSCSI initiator to connect to Ceph storage. This provides multiple platforms with standardized access to Ceph without needing to join the cluster. Optimizations are made in iSCSI to efficiently handle SCER operations by offloading work to OSDs. openATTIC provides a web-based interface for managing Ceph and other storage. It currently allows pool, OSD, and RBD management along with cluster monitoring. Future plans include extended pool and OSD management, CephFS and RGW integration, and deployment/configuration of Ceph nodes via Salt.
This document discusses SQL and NoSQL approaches to scaling databases. It describes how social networks and other large-scale websites use techniques like sharding and messaging to partition data across many databases. It also discusses how SQL Server is adopting NoSQL paradigms like flexible schemas and federated sharding to provide scalability. The document aims to educate about scaling databases and how SQL Server is evolving to support both SQL and NoSQL approaches.
This document discusses extending Oracle E-Business Suite using Ruby on Rails. It provides an overview of how to extend EBS functionality by embedding EBS data in other systems or customizing forms and reports. It then discusses the evolution of EBS extension approaches over time from custom PL/SQL to various Java-based frameworks. It introduces Ruby on Rails as an alternative approach, describing how Rails uses an MVC architecture and Active Record pattern. It demonstrates how to connect Rails to Oracle databases using enhanced Oracle adapters and call PL/SQL from Ruby. Finally, it discusses deployment options and provides references for more information.
The document provides information about Couchbase, a NoSQL database. It discusses Couchbase's key-value data model and how data is stored and accessed. The main architectural components are nodes, clusters, buckets, and documents. Data is accessed via reads, writes, views, and N1QL queries. Couchbase provides scalability and high performance through its caching architecture and append-only disk writes.
Databases in the Cloud discusses AWS database services for moving workloads to the cloud. It describes Amazon Relational Database Service (RDS), which provides several fully managed relational database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora. It also discusses non-relational services such as DynamoDB and ElastiCache, as well as Amazon Redshift for analytics workloads. The document provides guidance on choosing between SQL and NoSQL databases and discusses the benefits of managed database services over hosting databases on-premises or on EC2 instances.
Tips on how to become awesome developers:
- be a good developer
- automate server infra
- continuously deploy
- monitor & measure
- understand internals
The document discusses how a Lithuanian startup re-architected its website on Windows Azure to address scaling issues as its traffic grew from 20,000 to potential spikes of 50 page views per second. The changes included moving content to blob storage, splitting the database and hosting across multiple VMs, and using other Azure services such as caching. It describes the scaling issues encountered at various traffic levels and how the site was restructured with different Azure compute, data, and networking services to allow for flexibility and scalability.