Ilkay ALTINTAS and Geoffrey FOX - March, 2014
Big Data Use Cases and Requirements
Requirements and Use Case Subgroup
The focus is to form a community of interest from industry, academia, and
government, with the goal of developing a consensus list of Big Data
requirements across all stakeholders. This includes gathering and
understanding various use cases from diversified application domains.
Tasks
• Gather use case input from all stakeholders.
• Derive Big Data requirements from each use case.
• Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment.
• Work with Reference Architecture to validate requirements and reference architecture.
• Develop a set of general patterns capturing the “essence” of use cases (to do).
Use Case Template
• 26 fields completed for 51 use cases
– Government Operation: 4
– Commercial: 8
– Defense: 3
– Healthcare and Life Sciences: 10
– Deep Learning and Social Media: 6
– The Ecosystem for Research: 4
– Astronomy and Physics: 5
– Earth, Environmental and Polar Science: 10
– Energy: 1
51 Detailed Use Cases: Many TBs to Many PBs
• Government Operation: National Archives and Records Administration, Census Bureau
• Commercial: Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search,
Digital Materials, Cargo shipping (as in UPS)
• Defense: Sensors, Image surveillance, Situation Assessment
• Healthcare and Life Sciences: Medical records, Graph and Probabilistic analysis, Pathology,
Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity
• Deep Learning and Social Media: Driving Car, Geolocate images/cameras, Twitter, Crowd
Sourcing, Network Science, NIST benchmark datasets
• The Ecosystem for Research: Metadata, Collaboration, Language Translation, Light source
experiments
• Astronomy and Physics: Sky Surveys compared to simulation, Large Hadron Collider at
CERN, Belle Accelerator II in Japan
• Earth, Environmental and Polar Science: Radar Scattering in Atmosphere, Earthquake,
Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation
datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to
watersheds), AmeriFlux and FLUXNET gas sensors
• Energy: Smart grid
Next step involves matching extracted requirements and reference architecture.
Alternatively develop a set of general patterns capturing the “essence” of use cases.

Some Trends
• Practitioners consider themselves Data Scientists
• Images are a major source of Big Data
– Radar
– Light Synchrotrons
– Phones
– Bioimaging
• Hadoop and HDFS dominant
• Business – main emphasis at NIST – interested in analytics and assumes HDFS
• Academia also extremely interested in data management
• Clouds v. Grids
Example Use Case I: Summary of Genomics
• Application: 19: NIST Genome in a Bottle Consortium
– integrates data from multiple sequencing technologies and methods
– develops highly confident characterization of whole human genomes as
reference materials,
– develops methods to use these Reference Materials to assess performance of
any genome sequencing run.
• Current Approach:
– The ~40 TB of NFS storage at NIST is full; there are also PBs of genomics data at NIH/NCBI.
– Uses open-source sequencing bioinformatics software from academic groups on a 72-core cluster at NIST, supplemented by larger systems at collaborators.
• Futures:
– DNA sequencers can generate ~300 GB of compressed data per day, a volume that has grown much faster than Moore’s Law.
– Future data could include other ‘omics’ measurements, which will be even larger than DNA sequencing. Clouds have been explored.
Healthcare/Life Sciences
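To put the storage and sequencer figures above side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB), a single sequencer running continuously, and no data deletion; none of these assumptions come from the slides.

```python
# Back-of-the-envelope: how long one sequencer takes to fill the NIST store.
# Assumptions (not from the slides): decimal units, one sequencer, no deletion.
STORE_TB = 40                 # ~40 TB NFS store quoted on the slide
SEQUENCER_GB_PER_DAY = 300    # ~300 GB of compressed data per day
GB_PER_TB = 1000


def days_to_fill(store_tb: float, gb_per_day: float) -> float:
    """Days of continuous output needed to fill a store of the given size."""
    return store_tb * GB_PER_TB / gb_per_day


if __name__ == "__main__":
    days = days_to_fill(STORE_TB, SEQUENCER_GB_PER_DAY)
    print(f"{days:.0f} days (about {days / 30:.1f} months)")
```

Under these assumptions a single sequencer would fill the store in roughly four months, which is consistent with the slide's remark that the store is already full.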
Example Use Case II: Census Bureau Statistical
Survey Response Improvement (Adaptive Design)
• Application: Survey costs are increasing as survey response declines.
– Uses advanced “recommendation system techniques” that are open and scientifically objective
– Data mashed up from several sources and historical survey para-data (administrative data about the
survey) to drive operational processes
– The end goal is to increase quality and reduce the cost of field surveys
• Current Approach:
– ~1PB of data coming from surveys and other government administrative sources.
– Data can be streamed: approximately 150 million records are transmitted as continuously streamed field data during the decennial census.
– All data must be both confidential and secure.
– All processes must be auditable for security and confidentiality as required by various legal statutes.
– Data quality should be high and statistically checked for accuracy and reliability throughout the
collection process.
– Uses Hadoop, Spark, Hive, R, SAS, Mahout, AllegroGraph, MySQL, Oracle, Storm, BigMemory, Cassandra, and Pig software.
• Futures:
– Analytics need to be developed that give statistical estimates with more detail, closer to real time, and at lower cost.
– The reliability of estimated statistics from such “mashed up” sources still must be evaluated.
Government Operation
Example Use Case III: 26: Large-scale Deep Learning
• Application: 26: Large-scale Deep Learning
– Large models (e.g., neural networks with more neurons and connections) combined with large datasets
are increasingly the top performers in benchmark tasks for vision, speech, and Natural Language
Processing.
– One needs to train a deep neural network from a large (>>1TB) corpus of data (typically imagery, video,
audio, or text).
– Such training procedures often require customization of the neural network architecture, learning
criteria, and dataset pre-processing.
– In addition to the computational expense demanded by the learning algorithms, the need for rapid
prototyping and ease of development is extremely high.
• Current Approach:
– The largest applications so far are image recognition and scientific studies of unsupervised learning, with 10 million images and up to 11 billion parameters on a 64-GPU HPC InfiniBand cluster.
– Both supervised (using existing classified images) and unsupervised applications investigated.
• Futures:
– Large datasets of 100TB or more may be necessary in order to exploit the representational power of
the larger models.
– Training a self-driving car could take 100 million images at megapixel resolution.
– Deep Learning shares many characteristics with the broader field of machine learning. The paramount
requirements are high computational throughput for mostly dense linear algebra operations, and
extremely high productivity for researcher exploration.
– One needs integration of high performance libraries with high level (Python) prototyping environments.
Deep Learning and Social Media
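The sizes quoted for this use case can be sanity-checked with simple arithmetic. The sketch below assumes uncompressed 8-bit RGB images (3 bytes per pixel), float32 parameters (4 bytes each), and decimal units; these are illustrative assumptions, not figures from the slides.

```python
# Rough sizing for the deep learning figures quoted above.
# Assumptions (not from the slides): 3 bytes per pixel (uncompressed 8-bit RGB),
# 4 bytes per parameter (float32), decimal units.


def image_corpus_tb(n_images: int, pixels_per_image: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed corpus size in terabytes."""
    return n_images * pixels_per_image * bytes_per_pixel / 1e12


def model_size_gb(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed to hold the model parameters once, in gigabytes."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 100 million megapixel images (the self-driving car estimate)
    print(f"corpus: {image_corpus_tb(100_000_000, 1_000_000):.0f} TB")
    # 11 billion parameters (the largest model mentioned)
    print(f"model:  {model_size_gb(11_000_000_000):.0f} GB")
```

Under these assumptions the self-driving car corpus comes to about 300 TB uncompressed, in line with the "100TB or more" estimate, and the 11-billion-parameter model needs about 44 GB for its weights alone, which helps explain why training is spread over many GPUs.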

Example Use Case IV:
EISCAT 3D incoherent scatter radar system
• Application: EISCAT 3D incoherent scatter radar system
– EISCAT: European Incoherent Scatter Scientific Association
– Research on the lower, middle and upper atmosphere and ionosphere using the incoherent scatter radar
technique.
– This technique is the most powerful ground-based tool for these research applications.
– EISCAT studies instabilities in the ionosphere, as well as investigating the structure and dynamics of the
middle atmosphere. It is also a diagnostic instrument in ionospheric modification experiments with
addition of a separate Heating facility.
– Currently EISCAT operates 3 of the 10 major incoherent scatter radar instruments worldwide, with its facilities in the Scandinavian sector, north of the Arctic Circle.
• Current Approach:
– The current EISCAT radar generates data at rates of terabytes per year and presents no special challenges.
• Futures:
– The next-generation radar, EISCAT_3D, will consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core.
– The fully operational 5-site system will generate several thousand times the data of the current EISCAT system, with 40 PB/year in 2022, and is expected to operate for 30 years.
– The EISCAT 3D data e-Infrastructure plans to use high performance computers for central site data processing and high throughput computers for mirror site data processing.
– Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center, and a real-time link from the operation center to the sites to set the mode of radar operation with immediate effect.
Earth, Environmental and Polar Science
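The 40 PB/year figure can be translated into a sustained network rate and a lifetime total. This is a rough sketch assuming decimal units, a 365-day year, and uniform output over the year; these assumptions are not from the slides.

```python
# Sustained data rate implied by 40 PB/year, and the total over 30 years.
# Assumptions (not from the slides): decimal units, 365-day year, uniform output.
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_gbit_per_s(pb_per_year: float) -> float:
    """Average rate in gigabits per second for a given yearly volume."""
    bytes_per_s = pb_per_year * 1e15 / SECONDS_PER_YEAR
    return bytes_per_s * 8 / 1e9


def lifetime_total_eb(pb_per_year: float, years: int) -> float:
    """Total volume in exabytes over the operating lifetime."""
    return pb_per_year * years / 1000


if __name__ == "__main__":
    print(f"{sustained_rate_gbit_per_s(40):.1f} Gbit/s sustained")
    print(f"{lifetime_total_eb(40, 30):.1f} EB over 30 years")
```

Under these assumptions the system averages roughly 10 Gbit/s and accumulates about 1.2 EB over its planned 30-year lifetime.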
Example Use Case V: Consumption forecasting in Smart Grids
• Application: 51: Consumption forecasting in Smart Grids
– Predict energy consumption for customers, transformers, sub-stations and the electrical grid service area, using smart meters that provide measurements every 15 minutes at the granularity of individual consumers within the service area of smart power utilities.
– Combine Head-end of smart meters (distributed), Utility databases (Customer Information, Network topology; centralized), US Census data (distributed), NOAA weather data (distributed), Micro-grid building information system (centralized), Micro-grid sensor network (distributed).
– This generalizes to real-time data-driven analytics for time series from cyber-physical systems.
• Current Approach:
– GIS-based visualization.
– Data is around 4 TB a year for a city with 1.4M sensors (Los Angeles).
– Uses R/Matlab, Weka, Hadoop software.
– Significant privacy issues requiring anonymization by aggregation.
– Combine real-time and historic data with machine learning for predicting consumption.
• Futures:
– Widespread deployment of Smart Grids with new analytics integrating diverse data and supporting curtailment requests. Mobile applications for client interactions.
Energy
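The slide mentions anonymization by aggregation without giving details. One simple way this could look is sketched below: per-meter readings are summed per transformer, and groups with too few meters are suppressed. The record layout and the `min_meters` threshold are hypothetical choices for illustration, not the utility's actual method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (meter_id, transformer_id, kWh in one 15-minute interval); hypothetical layout.
Reading = Tuple[str, str, float]


def aggregate_by_transformer(
    readings: Iterable[Reading], min_meters: int = 5
) -> Dict[str, float]:
    """Sum consumption per transformer, dropping groups with too few meters.

    The min_meters threshold is an illustrative assumption; the slides only
    say that anonymization is done by aggregation.
    """
    totals: Dict[str, float] = defaultdict(float)
    meters: Dict[str, Set[str]] = defaultdict(set)
    for meter_id, transformer_id, kwh in readings:
        totals[transformer_id] += kwh
        meters[transformer_id].add(meter_id)
    # Publish only groups large enough that no single customer stands out.
    return {t: totals[t] for t in totals if len(meters[t]) >= min_meters}


if __name__ == "__main__":
    sample = [(f"m{i}", "T1", 0.25) for i in range(6)] + [("m99", "T2", 0.4)]
    print(aggregate_by_transformer(sample))
```

In this sample, transformer T1 has six meters and is published as a single total, while T2 has only one meter and is suppressed, because its total would reveal an individual customer's consumption.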
Example Use Case VI: Pathology Imaging
• Application: 17: Pathology Imaging/Digital Pathology II
• Current Approach:
– 1 GB raw image data + 1.5 GB analytical results per 2D image.
– MPI for image analysis; MapReduce + Hive with spatial extension on supercomputers and clouds.
– GPUs used effectively.
[Figure: Architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging]
• Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next-generation diagnosis. 1 TB raw image data + 1 TB analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.
Healthcare/Life Sciences
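To relate the per-image sizes to the 1 PB per hospital per year estimate, the following sketch computes how many images of each kind would fit in that volume. It assumes decimal units and, for simplicity, that the yearly volume consists of only one image type; both are assumptions for illustration.

```python
# How many images correspond to 1 PB per hospital per year?
# Assumptions (not from the slides): decimal units, one image type at a time.
GB_PER_PB = 1_000_000
TB_PER_PB = 1_000


def images_2d_per_year(pb_per_year: float, raw_gb: float = 1.0, results_gb: float = 1.5) -> float:
    """Number of 2D images (raw data plus analytical results) in the yearly volume."""
    return pb_per_year * GB_PER_PB / (raw_gb + results_gb)


def images_3d_per_year(pb_per_year: float, raw_tb: float = 1.0, results_tb: float = 1.0) -> float:
    """Number of 3D images (raw data plus analytical results) in the yearly volume."""
    return pb_per_year * TB_PER_PB / (raw_tb + results_tb)


if __name__ == "__main__":
    print(f"2D images per year: {images_2d_per_year(1):,.0f}")
    print(f"3D images per year: {images_3d_per_year(1):,.0f}")
```

Under these assumptions, 1 PB corresponds to about 400,000 2D images or about 500 3D images per year.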
Example Use Case VII: Metagenomics
• Application: 20: Comparative analysis for metagenomes and genomes
– Given a metagenomic sample, (1) determine the community composition in terms of other reference isolate genomes, (2) characterize the function of its genes, (3) begin to infer possible functional pathways, (4) characterize similarity or dissimilarity with other metagenomic samples, (5) begin to characterize changes in community composition and function due to changes in environmental pressures, (6) isolate sub-sections of data based on quality measures and community composition.
• Current Approach:
– Integrated comparative analysis system for metagenomes and genomes, front-ended by an interactive Web UI with core data, backend precomputations, and batch job computation submission from the UI.
– Provides an interface to standard bioinformatics tools (BLAST, HMMER, multiple alignment and phylogenetic tools, gene callers, sequence feature predictors…).
• Futures:
– Management of heterogeneity of biological data is currently performed by an RDBMS (Oracle). Unfortunately, it does not scale even for the current data volume of 50 TB.
– NoSQL solutions aim at providing an alternative, but unfortunately they do not always lend themselves to real-time interactive use or rapid and parallel bulk loading, and sometimes have issues regarding robustness.
Healthcare/Life Sciences
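Step (4) above, characterizing similarity between metagenomic samples, can be illustrated with one common and very simple measure: the Jaccard similarity of the samples' k-mer sets. This is a minimal sketch of that general technique, not the method used by the system described on the slide; the function names and the choice k = 4 are assumptions.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int = 4) -> Set[str]:
    """Union of the k-mer sets of every read in a sample."""
    result: Set[str] = set()
    for read in reads:
        result |= kmers(read, k)
    return result


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    sample_a = ["ACGTAC"]
    sample_b = ["CGTACG"]
    print(jaccard(sample_kmers(sample_a), sample_kmers(sample_b)))
```

At the data volumes mentioned above, exact k-mer sets would be too large to compare directly, so sketching techniques such as MinHash are typically used to approximate the same quantity.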

Example Use Case VIII: Consumer photography
• Application: 27: Organizing large-scale, unstructured collections of consumer photos
– Produce 3D reconstructions of scenes using collections of millions to billions of consumer images, where neither the scene structure nor the camera positions are known a priori.
– Use the resulting 3D models to allow efficient browsing of large-scale photo collections by geographic position.
– Geolocate new images by matching to 3D models. Perform object recognition on each image. 3D reconstruction is posed as a robust non-linear least squares optimization problem where observed relations between images are constraints and the unknowns are the 6-D camera pose of each image and the 3-D position of each point in the scene.
• Current Approach:
– Hadoop cluster with 480 cores processing data of initial applications.
– Note over 500 billion images on Facebook and over 5 billion on Flickr, with over 500 million images added to social media sites each day.
Deep Learning Social Networking
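The robust non-linear least squares problem described for this use case is commonly written in the following form (often called bundle adjustment). The notation is an assumption for illustration and does not appear on the slides: $R_i, t_i$ are the rotation and translation of camera $i$ (its 6-D pose), $X_j$ is the 3-D position of scene point $j$, $x_{ij}$ is the observed 2-D position of point $j$ in image $i$, $\mathcal{O}$ is the set of observed (image, point) pairs, $\pi$ is the camera projection, and $\rho$ is a robust loss such as the Huber loss.

```latex
\min_{\{R_i,\, t_i\},\ \{X_j\}} \;
\sum_{(i,j) \in \mathcal{O}}
\rho\!\left( \left\lVert \pi\!\left( R_i X_j + t_i \right) - x_{ij} \right\rVert^{2} \right)
```

Each term is the reprojection error of one observation. The robust loss $\rho$ limits the influence of mismatched features, which are frequent in unstructured consumer photo collections.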
27: Organizing large-scale, unstructured collections of
consumer photos II
• Futures:
– Need many analytics including feature extraction, feature matching, and large-scale
probabilistic inference, which appear in many or most computer vision and image
processing problems, including recognition, stereo resolution, and image denoising.
– Need to visualize large-scale 3-d reconstructions, and navigate large-scale collections of
images that have been aligned to maps.
Deep Learning Social Networking
Example Use Case IX: Twitter Data
• Application: 28: Truthy: Information diffusion research from Twitter Data
– Understanding how communication spreads on socio-technical networks.
– Detecting potentially harmful information spread at an early stage (e.g., deceiving messages, orchestrated campaigns, untrustworthy information, etc.).
• Current Approach:
– (1) Acquisition and storage of a large volume (30 TB a year compressed) of continuous streaming data from Twitter (~100 million messages/day, ~500 GB data/day, increasing);
– (2) near real-time analysis of such data, for anomaly detection, stream clustering, signal classification and online learning;
– (3) data retrieval, big data visualization, data-interactive Web interfaces, public API for data querying.
– Uses Python/SciPy/NumPy/MPI for data analysis. Information diffusion, clustering, and dynamic network visualization capabilities already exist.
• Futures:
– Truthy plans to expand by incorporating Google+ and Facebook.
– Need to move towards Hadoop/IndexedHBase & HDFS distributed storage.
– Use Redis as an in-memory database to be a buffer for real-time analysis.
– Need streaming clustering, anomaly detection and online learning.
Deep Learning Social Networking
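As an illustration of what streaming anomaly detection can look like, here is a minimal sketch that flags values far from a running mean, using Welford's online algorithm so that no history needs to be stored. It is a generic z-score detector, not Truthy's actual method; the class and function names, the threshold, and the warm-up length are all assumptions.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Welford's online algorithm for mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_anomalies(
    counts: Iterable[float], threshold: float = 3.0, warmup: int = 5
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for values more than `threshold` standard
    deviations from the mean of all values seen before them."""
    stats = RunningStats()
    for i, x in enumerate(counts):
        if stats.n >= warmup and stats.std > 0 and abs(x - stats.mean) > threshold * stats.std:
            yield i, x
        # Every value, flagged or not, is folded into the running statistics.
        stats.update(x)


if __name__ == "__main__":
    per_minute = [100, 102, 98, 101, 99, 100, 500, 101]  # e.g. messages per minute for one hashtag
    print(list(detect_anomalies(per_minute)))
```

Because the detector keeps only three numbers per stream, it could in principle track many hashtags or users in memory at once. A production system would likely also need to handle trends and daily cycles, which this sketch ignores.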
Part of Property Summary Table

Requirements Gathering
• Data sources
– data size, file formats, rate of growth, at rest or in motion, etc.
• Data lifecycle management
– curation, conversion, quality check, pre-analytic processing, etc.
• Data transformation
– data fusion/mashup, analytics
• Capability infrastructure
– software tools, platform tools, hardware resources such as storage
and networking
• Security & Privacy; and data usage
– processed results in text, table, visual, and other formats
A total of 437 specific requirements under 35
high-level generalized requirement summaries.
Interaction Between Subgroups
[Diagram: interactions among the Technology Roadmap, Requirements & Use Cases, Definitions & Taxonomies, Reference Architecture, and Security & Privacy subgroups]
Due to time constraints, activities were carried out in parallel.
Reference Architecture
• Multiple stacks of technologies
– Open and Proprietary
• Provide example stacks for different applications
• Come up with usage patterns and best practices
Next Steps
• Approach for RDA to implement use cases and NBD to identify abstract interface
• Planning for implementation of use cases
– Resource availability
– Application-specific support
– Computation and storage leverage
• Multiple potential directions
– Prioritization is one of the goals for this meeting.

Key Links
• Use cases listing: http://bigdatawg.nist.gov/usecases.php
• Latest version of the document (dated Oct 12, 2013): http://bigdatawg.nist.gov/_uploadfiles/M0245_v5_6066621242.docx

More Related Content

What's hot

The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
Larry Smarr
 
Long Term Ecological Research Network
Long Term Ecological Research NetworkLong Term Ecological Research Network
Long Term Ecological Research Network
TERN Australia
 
Exascale Computing Project (ECP) Update
Exascale Computing Project (ECP) UpdateExascale Computing Project (ECP) Update
Exascale Computing Project (ECP) Update
inside-BigData.com
 
GlobusWorld 2021: Arecibo Observatory Data Movement
GlobusWorld 2021: Arecibo Observatory Data MovementGlobusWorld 2021: Arecibo Observatory Data Movement
GlobusWorld 2021: Arecibo Observatory Data Movement
Globus
 
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
CRS4 Research Center in Sardinia
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
Putchong Uthayopas
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
Larry Smarr
 
An Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchAn Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive Research
Larry Smarr
 
Hattrick Simpers TMS Machine Learning Workshop Slides
big_data_casestudies_2.ppt

  • 1. Big Data Use Cases and Requirements
    Ilkay ALTINTAS and Geoffrey FOX - March, 2014
  • 2. Requirements and Use Case Subgroup
    The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains.
    Tasks
    • Gather use case input from all stakeholders
    • Derive Big Data requirements from each use case
    • Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment
    • Work with Reference Architecture to validate requirements and reference architecture
    • Develop a set of general patterns capturing the “essence” of use cases (to do)
  • 3. Use Case Template
    • 26 fields completed for 51 use cases
      – Government Operation: 4
      – Commercial: 8
      – Defense: 3
      – Healthcare and Life Sciences: 10
      – Deep Learning and Social Media: 6
      – The Ecosystem for Research: 4
      – Astronomy and Physics: 5
      – Earth, Environmental and Polar Science: 10
      – Energy: 1
  • 4. 51 Detailed Use Cases: Many TBs to Many PBs
    • Government Operation: National Archives and Records Administration, Census Bureau
    • Commercial: Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo shipping (as in UPS)
    • Defense: Sensors, Image surveillance, Situation Assessment
    • Healthcare and Life Sciences: Medical records, Graph and Probabilistic analysis, Pathology, Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity
    • Deep Learning and Social Media: Driving Car, Geolocate images/cameras, Twitter, Crowd Sourcing, Network Science, NIST benchmark datasets
    • The Ecosystem for Research: Metadata, Collaboration, Language Translation, Light source experiments
    • Astronomy and Physics: Sky Surveys compared to simulation, Large Hadron Collider at CERN, Belle Accelerator II in Japan
    • Earth, Environmental and Polar Science: Radar Scattering in Atmosphere, Earthquake, Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to watersheds), AmeriFlux and FLUXNET gas sensors
    • Energy: Smart grid
    The next step involves matching extracted requirements and reference architecture. Alternatively, develop a set of general patterns capturing the “essence” of use cases.
  • 5. Some Trends
    • Practitioners consider themselves Data Scientists
    • Images are a major source of Big Data
      – Radar
      – Light Synchrotrons
      – Phones
      – Bioimaging
    • Hadoop and HDFS dominant
    • Business (the main emphasis at NIST) is interested in analytics and assumes HDFS
    • Academia also extremely interested in data management
    • Clouds vs. Grids
  • 6. Example Use Case I: Summary of Genomics (Healthcare/Life Sciences)
    • Application (use case 19): NIST Genome in a Bottle Consortium
      – Integrates data from multiple sequencing technologies and methods
      – Develops highly confident characterization of whole human genomes as reference materials
      – Develops methods to use these Reference Materials to assess the performance of any genome sequencing run
    • Current Approach:
      – The ~40TB NFS storage at NIST is full; there are also PBs of genomics data at NIH/NCBI.
      – Uses open-source sequencing bioinformatics software from academic groups on a 72-core cluster at NIST, supplemented by larger systems at collaborators.
    • Futures:
      – DNA sequencers can generate ~300GB of compressed data per day, a volume that has grown much faster than Moore’s Law.
      – Future data could include other ‘omics’ measurements, which will be even larger than DNA sequencing. Clouds have been explored.
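    The two storage figures on this slide can be put side by side with a quick calculation. The sketch below is only a back-of-envelope illustration: the function name, the decimal unit conversion (1 TB = 1000 GB), and the idea of a single sequencer writing into an empty store are assumptions, not part of the use case.

```python
# Back-of-envelope: how long one sequencer would take to fill a store
# the size of the NIST NFS volume. Figures are the slide's approximations.
STORE_TB = 40            # ~40 TB NFS store (from the slide)
SEQ_GB_PER_DAY = 300     # ~300 GB compressed data per day (from the slide)
GB_PER_TB = 1000         # assumption: decimal units


def days_to_fill(store_tb, gb_per_day, sequencers=1):
    """Days until `sequencers` machines fill an empty store of `store_tb` TB."""
    return store_tb * GB_PER_TB / (gb_per_day * sequencers)


if __name__ == "__main__":
    print(f"{days_to_fill(STORE_TB, SEQ_GB_PER_DAY):.0f} days for one sequencer")
    print(f"{days_to_fill(STORE_TB, SEQ_GB_PER_DAY, sequencers=10):.0f} days for ten")
```

    Under these assumptions a single sequencer fills a 40TB store in roughly four and a half months, which is consistent with the slide's remark that the existing store is already full.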
  • 7. Example Use Case II: Census Bureau Statistical Survey Response Improvement (Adaptive Design) (Government Operation)
    • Application: Survey costs are increasing as survey response declines.
      – Uses advanced “recommendation system techniques” that are open and scientifically objective
      – Data is mashed up from several sources and historical survey para-data (administrative data about the survey) to drive operational processes
      – The end goal is to increase quality and reduce the cost of field surveys
    • Current Approach:
      – ~1PB of data coming from surveys and other government administrative sources.
      – Data can be streamed: approximately 150 million records are transmitted continuously as field data during the decennial census.
      – All data must be both confidential and secure.
      – All processes must be auditable for security and confidentiality as required by various legal statutes.
      – Data quality should be high and statistically checked for accuracy and reliability throughout the collection process.
      – Uses Hadoop, Spark, Hive, R, SAS, Mahout, AllegroGraph, MySQL, Oracle, Storm, BigMemory, Cassandra, and Pig software.
    • Futures:
      – Analytics need to be developed that give statistical estimates in more detail, closer to real time, and at lower cost.
      – The reliability of estimated statistics from such “mashed up” sources still must be evaluated.
  • 8. Example Use Case III: Large-scale Deep Learning (Deep Learning and Social Media)
    • Application (use case 26): Large-scale Deep Learning
      – Large models (e.g., neural networks with more neurons and connections) combined with large datasets are increasingly the top performers in benchmark tasks for vision, speech, and Natural Language Processing.
      – One needs to train a deep neural network from a large (>>1TB) corpus of data (typically imagery, video, audio, or text).
      – Such training procedures often require customization of the neural network architecture, learning criteria, and dataset pre-processing.
      – In addition to the computational expense demanded by the learning algorithms, the need for rapid prototyping and ease of development is extremely high.
    • Current Approach:
      – The largest applications so far are image recognition and scientific studies of unsupervised learning, with 10 million images and up to 11 billion parameters on a 64-GPU HPC InfiniBand cluster.
      – Both supervised (using existing classified images) and unsupervised applications have been investigated.
    • Futures:
      – Large datasets of 100TB or more may be necessary in order to exploit the representational power of the larger models.
      – Training a self-driving car could take 100 million images at megapixel resolution.
      – Deep Learning shares many characteristics with the broader field of machine learning. The paramount requirements are high computational throughput for mostly dense linear algebra operations, and extremely high productivity for researcher exploration.
      – One needs integration of high performance libraries with high-level (Python) prototyping environments.
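    The self-driving figure can be related to the "100TB or more" estimate with simple arithmetic. The sketch below is illustrative only: the slide gives the image count and resolution, while the 3 bytes per pixel (uncompressed 8-bit RGB), the decimal TB, and the function name are assumptions introduced here.

```python
# Rough raw size of an image corpus, ignoring compression and labels.
N_IMAGES = 100_000_000       # 100 million images (from the slide)
PIXELS_PER_IMAGE = 1_000_000 # "megapixel resolution" taken as 1 MP (from the slide)
BYTES_PER_PIXEL = 3          # assumption: uncompressed 8-bit RGB
BYTES_PER_TB = 10**12        # assumption: decimal terabytes


def raw_dataset_tb(n_images, pixels_per_image, bytes_per_pixel):
    """Uncompressed corpus size in TB."""
    return n_images * pixels_per_image * bytes_per_pixel / BYTES_PER_TB


if __name__ == "__main__":
    size = raw_dataset_tb(N_IMAGES, PIXELS_PER_IMAGE, BYTES_PER_PIXEL)
    print(f"about {size:.0f} TB uncompressed")
```

    With these assumptions the corpus comes to about 300 TB uncompressed, the same order of magnitude as the 100TB figure quoted above; compressed storage would be smaller.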
  • 9. Example Use Case IV: EISCAT 3D incoherent scatter radar system
    • Category: Astronomy and Physics
    • Application: EISCAT 3D incoherent scatter radar system
      – EISCAT: European Incoherent Scatter Scientific Association.
      – Research on the lower, middle, and upper atmosphere and ionosphere using the incoherent scatter radar technique.
      – This technique is the most powerful ground-based tool for these research applications.
      – EISCAT studies instabilities in the ionosphere and investigates the structure and dynamics of the middle atmosphere. It is also a diagnostic instrument in ionospheric modification experiments, with the addition of a separate Heating facility.
      – EISCAT currently operates 3 of the 10 major incoherent scatter radar instruments worldwide, with its facilities in the Scandinavian sector, north of the Arctic Circle.
    • Current Approach:
      – The current EISCAT radar generates data at terabytes-per-year rates and presents no special challenges.
    • Futures:
      – The next generation radar, EISCAT_3D, will consist of a core site with transmitting and receiving radar arrays and four sites with receiving antenna arrays some 100 km from the core.
      – The fully operational 5-site system will generate several thousand times the data of the current EISCAT system, with 40 PB/year in 2022, and is expected to operate for 30 years.
      – The EISCAT 3D data e-Infrastructure plans to use high-performance computers for central site data processing and high-throughput computers for mirror site data processing.
      – Downloading the full data is not time critical, but operations require real-time information about certain pre-defined events to be sent from the sites to the operation center, and a real-time link from the operation center to the sites to set the radar operating mode with immediate effect.
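To put the 40 PB/year figure in networking and storage terms, the following sketch converts it to an average sustained rate and a lifetime total. It assumes decimal units (1 PB = 1e15 bytes), a 365-day year, and a uniform data rate; none of these assumptions come from the slide.

```python
# Convert a yearly data volume into an average sustained rate.
# Assumptions (not from the slide): 1 PB = 1e15 bytes, a 365-day year,
# and data produced at a uniform rate throughout the year.
PB = 1e15
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_bytes_per_s(pb_per_year: float) -> float:
    """Average bytes per second needed to produce pb_per_year petabytes."""
    return pb_per_year * PB / SECONDS_PER_YEAR


rate = sustained_rate_bytes_per_s(40)   # EISCAT_3D: 40 PB/year
rate_gb_s = rate / 1e9                  # gigabytes per second
rate_gbit_s = rate * 8 / 1e9            # gigabits per second
lifetime_pb = 40 * 30                   # 30 years of operation at 40 PB/year

print(f"{rate_gb_s:.2f} GB/s, {rate_gbit_s:.1f} Gbit/s, {lifetime_pb} PB over 30 years")
```

Under these assumptions the system averages about 1.27 GB/s (roughly 10 Gbit/s) and accumulates on the order of an exabyte over its planned lifetime, if the 40 PB/year rate holds throughout.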
  • 10. Example Use Case V: Consumption forecasting in Smart Grids
    • Category: Energy
    • Application: 51: Consumption forecasting in Smart Grids
      – Predict energy consumption for customers, transformers, sub-stations, and the electrical grid service area, using smart meters that provide measurements every 15 minutes at the granularity of individual consumers within the service area of smart power utilities.
      – Combine head-end of smart meters (distributed), utility databases (customer information, network topology; centralized), US Census data (distributed), NOAA weather data (distributed), micro-grid building information system (centralized), and micro-grid sensor network (distributed).
      – This generalizes to real-time data-driven analytics for time series from cyber-physical systems.
    • Current Approach:
      – GIS-based visualization.
      – Data is around 4 TB a year for a city with 1.4M sensors (Los Angeles).
      – Uses R/Matlab, Weka, and Hadoop software.
      – Significant privacy issues requiring anonymization by aggregation.
      – Combine real-time and historic data with machine learning for predicting consumption.
    • Futures:
      – Widespread deployment of Smart Grids with new analytics integrating diverse data and supporting curtailment requests.
      – Mobile applications for client interactions.
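The "anonymization by aggregation" mentioned on this slide can be illustrated with a minimal sketch: per-meter readings are summed per transformer and 15-minute interval, and groups with too few meters are suppressed. The record layout, the choice of the transformer as the grouping level, and the `min_meters` threshold are all hypothetical choices made for illustration; the slide does not describe how the aggregation is actually done.

```python
from collections import defaultdict


def aggregate_by_transformer(readings, min_meters=5):
    """Sum consumption per (transformer, interval), hiding small groups.

    readings: iterable of (meter_id, transformer_id, interval_start, kwh).
    min_meters: hypothetical threshold; groups with fewer distinct meters
    are dropped so that no individual consumer can be singled out.
    """
    totals = defaultdict(float)
    meters = defaultdict(set)
    for meter_id, transformer_id, interval_start, kwh in readings:
        key = (transformer_id, interval_start)
        totals[key] += kwh
        meters[key].add(meter_id)
    # Keep only groups that contain enough distinct meters.
    return {key: totals[key] for key in totals if len(meters[key]) >= min_meters}


# Illustrative data only: two meters on transformer T1, one on T2.
sample = [
    ("m1", "T1", "00:00", 0.5),
    ("m2", "T1", "00:00", 0.25),
    ("m3", "T2", "00:00", 1.0),
]
aggregated = aggregate_by_transformer(sample, min_meters=2)
print(aggregated)
```

With `min_meters=2`, only the T1 group is released; the single-meter T2 group is withheld because its total would equal one consumer's reading.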
  • 11. Example Use Case VI: Pathology Imaging
    • Category: Healthcare/Life Sciences
    • Application: 17: Pathology Imaging/Digital Pathology II
    • Current Approach:
      – 1 GB raw image data + 1.5 GB analytical results per 2D image.
      – MPI for image analysis; MapReduce + Hive with spatial extension on supercomputers and clouds.
      – GPUs used effectively.
      – [Figure: Architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging]
    • Futures:
      – 3D pathology imaging has recently been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images.
      – Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next generation diagnosis.
      – 1 TB raw image data + 1 TB analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.
  • 12. Example Use Case VII: Metagenomics
    • Category: Healthcare/Life Sciences
    • Application: 20: Comparative analysis for metagenomes and genomes
      – Given a metagenomic sample, (1) determine the community composition in terms of other reference isolate genomes, (2) characterize the function of its genes, (3) begin to infer possible functional pathways, (4) characterize similarity or dissimilarity with other metagenomic samples, (5) begin to characterize changes in community composition and function due to changes in environmental pressures, (6) isolate sub-sections of data based on quality measures and community composition.
    • Current Approach:
      – Integrated comparative analysis system for metagenomes and genomes, front-ended by an interactive Web UI with core data, backend precomputations, and batch job computation submission from the UI.
      – Provides an interface to standard bioinformatics tools (BLAST, HMMER, multiple alignment and phylogenetic tools, gene callers, sequence feature predictors…).
    • Futures:
      – Management of the heterogeneity of biological data is currently performed by an RDBMS (Oracle). Unfortunately, it does not scale even for the current volume of 50 TB of data.
      – NoSQL solutions aim at providing an alternative, but unfortunately they do not always lend themselves to real-time interactive use or to rapid and parallel bulk loading, and they sometimes have robustness issues.
  • 13. Example Use Case VIII: Consumer photography
    • Category: Deep Learning and Social Networking
    • Application: 27: Organizing large-scale, unstructured collections of consumer photos
      – Produce 3D reconstructions of scenes using collections of millions to billions of consumer images, where neither the scene structure nor the camera positions are known a priori.
      – Use the resulting 3D models to allow efficient browsing of large-scale photo collections by geographic position.
      – Geolocate new images by matching them to 3D models.
      – Perform object recognition on each image.
      – 3D reconstruction is posed as a robust non-linear least squares optimization problem, where observed relations between images are the constraints and the unknowns are the 6D camera pose of each image and the 3D position of each point in the scene.
    • Current Approach:
      – Hadoop cluster with 480 cores processing data of initial applications.
      – Note that there are over 500 billion images on Facebook and over 5 billion on Flickr, with over 500 million images added to social media sites each day.
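The robust non-linear least squares formulation described on this slide is commonly written as a bundle-adjustment objective. The form below is a standard textbook version offered as a sketch, not necessarily the exact formulation used in this use case; the symbols (the projection function, the robust loss, and the observation set) are assumptions introduced here for illustration.

```latex
% Unknowns: camera pose (R_i, t_i) for each image i (3 rotation + 3 translation
% parameters, i.e. the 6D pose) and 3D position X_j for each scene point j.
% x_{ij}: observed 2D location of point j in image i.
% \pi: camera projection function (assumed known intrinsics).
% \rho: robust loss (e.g. Huber) that down-weights outlier matches.
% \mathcal{O}: set of (image, point) pairs for which an observation exists.
\min_{\{R_i,\, t_i\},\ \{X_j\}} \;
  \sum_{(i,j) \in \mathcal{O}}
  \rho\!\left( \left\lVert \pi\!\left( R_i X_j + t_i \right) - x_{ij} \right\rVert^{2} \right)
```

With millions to billions of images, the number of unknowns grows with both the image count (six per camera) and the point count (three per point), which is one reason the problem needs cluster-scale computation.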
  • 14. 27: Organizing large-scale, unstructured collections of consumer photos II
    • Category: Deep Learning and Social Networking
    • Futures:
      – Need many analytics, including feature extraction, feature matching, and large-scale probabilistic inference, which appear in many or most computer vision and image processing problems, including recognition, stereo resolution, and image denoising.
      – Need to visualize large-scale 3D reconstructions and navigate large-scale collections of images that have been aligned to maps.
  • 15. Example Use Case IX: Twitter Data
    • Category: Deep Learning and Social Networking
    • Application: 28: Truthy: Information diffusion research from Twitter Data
      – Understanding how communication spreads on socio-technical networks.
      – Detecting potentially harmful information spread at an early stage (e.g., deceiving messages, orchestrated campaigns, untrustworthy information).
    • Current Approach:
      – (1) Acquisition and storage of a large volume (30 TB a year compressed) of continuous streaming data from Twitter (~100 million messages/day, ~500 GB data/day, increasing).
      – (2) Near real-time analysis of such data for anomaly detection, stream clustering, signal classification, and online learning.
      – (3) Data retrieval, big data visualization, data-interactive Web interfaces, and a public API for data querying.
      – Uses Python/SciPy/NumPy/MPI for data analysis.
      – Information diffusion, clustering, and dynamic network visualization capabilities already exist.
    • Futures:
      – Truthy plans to expand by incorporating Google+ and Facebook.
      – Need to move towards Hadoop/IndexedHBase and HDFS distributed storage.
      – Use Redis as an in-memory database to act as a buffer for real-time analysis.
      – Need streaming clustering, anomaly detection, and online learning.
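As a rough illustration of what near real-time anomaly detection on a stream can look like, the sketch below flags values that deviate strongly from a running mean, using Welford's online algorithm to update the mean and variance one value at a time. This is a deliberately simple stand-in, not Truthy's method; the class name, the `threshold` of 3 standard deviations, and the `warmup` length are all hypothetical.

```python
import math


class RunningZScore:
    """Online anomaly flagging with Welford's running mean and variance."""

    def __init__(self, threshold=3.0, warmup=10):
        self.n = 0            # number of values absorbed so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup  # values to observe before flagging anything

    def update(self, x):
        """Return True if x is anomalous versus past values, then absorb x."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's update: constant memory, one pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


# Illustrative data only: per-minute message counts for some hashtag.
detector = RunningZScore()
counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 500]
flags = [detector.update(c) for c in counts]
print(flags)
```

Because the detector keeps only three numbers of state per monitored signal, it fits a streaming setting where the raw messages cannot all be held in memory.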
  • 16. Part of Property Summary Table
  • 17. Requirements Gathering
    • Data sources – data size, file formats, rate of growth, at rest or in motion, etc.
    • Data lifecycle management – curation, conversion, quality check, pre-analytic processing, etc.
    • Data transformation – data fusion/mashup, analytics
    • Capability infrastructure – software tools, platform tools, hardware resources such as storage and networking
    • Security & Privacy
    • Data usage – processed results in text, table, visual, and other formats
    • A total of 437 specific requirements under 35 high-level generalized requirement summaries.
  • 18. Interaction Between Subgroups
    • [Diagram: interactions among the Requirements & Use Cases, Definitions & Taxonomies, Reference Architecture, Security & Privacy, and Technology Roadmap subgroups]
    • Due to time constraints, activities were carried out in parallel.
  • 19. Reference Architecture
    • Multiple stacks of technologies
      – Open and Proprietary
    • Provide example stacks for different applications
    • Come up with usage patterns and best practices
  • 20. Next Steps
    • Approach for RDA to implement use cases and NBD to identify abstract interface
    • Planning for implementation of use cases
      – Resource availability
      – Application-specific support
      – Computation and storage leverage
    • Multiple potential directions
      – Prioritization is one of the goals for this meeting.
  • 21. Key Links
    • Use cases listing: http://bigdatawg.nist.gov/usecases.php
    • Latest version of the document (dated Oct 12, 2013): http://bigdatawg.nist.gov/_uploadfiles/M0245_v5_6066621242.docx