By: Kiran Buriro
Assigned by: Sir Fida Chandio
What is KNIME?
• KNIME stands for Konstanz Information Miner.
• Developed at the University of Konstanz, Germany, in 2004-2006, with an initial focus on
pharmaceutical research.
• KNIME is an open-source platform for analytical data modelling and processing.
• KNIME lets users visually create data flows (or pipelines).
• Written in Java and based on the Eclipse SDK platform.
• Modular platform for building and executing workflows using predefined
components, called nodes.
• Core functionality is available for tasks such as standard data mining, analysis,
and manipulation.
• GUI-based, with scripting integration.
• An especially powerful aspect of KNIME is its ability to integrate data from multiple
sources.
• KNIME also offers extensions that allow it to interface with R, Python, Java, and SQL.
KNIME DATA ANALYTICS LIFECYCLE
Read data (from several sources) → Extract, Transform, Load (ETL) → Data analytics or predictive analysis → Reporting and/or injection
KNIME GUI/WORK BENCH
Node Status and Operations
01 A node is the smallest programming unit in KNIME.
02 Each node serves a dedicated task.
03 After being created, a node needs settings to execute its task; this phase is called configuration.
04 After configuration, a node needs to be executed to actually carry out the assigned task.
• A node can be in one of three states:
Idle: The node is not yet configured and cannot be executed
with its current settings.
Configured: The node has been set up correctly and may be
executed at any time.
Executed: The node has been successfully executed. Results
may be viewed and used in downstream nodes.
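The configure-then-execute lifecycle described on these slides can be sketched as a small state machine. This is a minimal illustration in plain Python, not KNIME's own API: the `Node` and `NodeState` classes and the `partition` task are hypothetical names invented for the example.

```python
from enum import Enum


class NodeState(Enum):
    IDLE = "idle"              # not yet configured, cannot be executed
    CONFIGURED = "configured"  # settings are in place, ready to run
    EXECUTED = "executed"      # ran successfully, results available


class Node:
    """Hypothetical stand-in for a workflow node; not a KNIME class."""

    def __init__(self, name, task):
        self.name = name
        self.task = task          # callable(settings, data) -> result
        self.settings = None
        self.result = None
        self.state = NodeState.IDLE

    def configure(self, **settings):
        self.settings = settings
        self.result = None        # new settings invalidate earlier results
        self.state = NodeState.CONFIGURED

    def execute(self, data):
        if self.state is NodeState.IDLE:
            raise RuntimeError(f"{self.name} must be configured before execution")
        self.result = self.task(self.settings, data)
        self.state = NodeState.EXECUTED
        return self.result


def partition(settings, rows):
    # Split the rows into two parts at the configured fraction.
    cut = round(len(rows) * settings["fraction"])
    return rows[:cut], rows[cut:]


node = Node("Partitioning", partition)
node.configure(fraction=0.8)
first, second = node.execute(list(range(10)))
```

Calling `execute` on a node that was never configured raises an error, which mirrors the rule that an idle node cannot be executed with its current settings.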
[Figure: a Partitioning node with its input port, output port, and status indicator. Status labels shown: Not Configured, Idle, Executed, Error.]
Workflow
KNIME WORKFLOW
• KNIME provides a large repository of easy-to-use, modular nodes for:
Data preprocessing
Data fusion
Data transformation
Data Access
DATABASE: MySQL, any JDBC (Oracle, DB2, MySQL Server)
FILES: CSV, TXT, Excel, Word, PDF, images, texts
WEB, CLOUD: Web services, Twitter, Google
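To make the idea of combining file and database sources concrete, here is a rough sketch using only the Python standard library. It does not use KNIME reader or connector nodes; the in-memory CSV text, the `orders` table, and all column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical file source: CSV text held in memory instead of a real file.
csv_text = "customer_id,city\n1,Konstanz\n2,Zurich\n"
file_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical database source: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.5), (1, 5.0), (2, 42.0)],
)
db_rows = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
).fetchall()
conn.close()

# Fuse both sources on the shared key (customer_id).
totals = {customer_id: total for customer_id, total in db_rows}
combined = [
    {
        "customer_id": int(row["customer_id"]),
        "city": row["city"],
        "total": totals.get(int(row["customer_id"]), 0.0),
    }
    for row in file_rows
]
```

In a KNIME workflow the same pattern would be a file reader node and a database reader node feeding a joiner node; the code only shows the data movement.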
KNIME ETL FEATURES
• Logical joins
• Support for regex-style replacements
• Rule-based filtering and transformation
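The three ETL features listed here (joins, regex-style replacement, rule-based filtering and transformation) can be sketched in plain Python as follows. The sample tables, the whitespace-cleanup pattern, and the tier thresholds are assumptions chosen for the example; KNIME itself provides these operations as configurable nodes rather than code.

```python
import re

customers = [
    {"id": 1, "name": "Ada  Lovelace", "country": "UK"},
    {"id": 2, "name": "Konrad   Zuse", "country": "DE"},
    {"id": 3, "name": "Grace Hopper", "country": "US"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 30.0},
    {"customer_id": 2, "amount": 75.0},
]

# Logical (inner) join on customers.id == orders.customer_id.
by_id = {c["id"]: c for c in customers}
joined = [
    {**by_id[o["customer_id"]], "amount": o["amount"]}
    for o in orders
    if o["customer_id"] in by_id
]

# Regex-style replacement: collapse repeated whitespace in names.
for row in joined:
    row["name"] = re.sub(r"\s+", " ", row["name"])


def apply_rules(row):
    """Rule-based filter and transform: the first matching rule wins."""
    if row["amount"] >= 100:
        return {**row, "tier": "high"}
    if row["amount"] >= 50:
        return {**row, "tier": "medium"}
    return None  # rows below 50 are filtered out


result = [r for r in map(apply_rules, joined) if r is not None]
```

Customer 3 has no orders and so drops out of the inner join, and the 30.0 order is removed by the rules, leaving two rows in `result`.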
KNIME STATISTICS
• Linear correlation and dependency measures
• Many nodes also support standard statistics such as count, sum, mean, etc.
• The “Statistics” node provides basic measures of distribution
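As a rough illustration of the statistics named on this slide, the sketch below computes count, sum, mean, and a linear (Pearson) correlation in plain Python. The `pearson` helper and the sample `hours` and `score` values are invented for the example and are not taken from KNIME or the slides.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson linear correlation; assumes neither series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample data.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

summary = {"count": len(score), "sum": sum(score), "mean": mean(score)}
r = pearson(hours, score)
```

Here `r` comes out close to 1, indicating a strong positive linear relationship between the two made-up columns.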
KNIME MACHINE LEARNING
• Base KNIME supports many common machine learning algorithms
• These are extended through partner implementations and scripting
languages (R, Python, Weka, etc.)
• Data partitioning and multiple folds (cross-validation)
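Data partitioning and multiple folds can be sketched as below. This is a minimal plain-Python illustration of a train/test split and k-fold splitting; the function names, the fixed seed, and the interleaved fold assignment are assumptions for the example and do not describe how KNIME's own nodes are implemented.

```python
import random


def partition(rows, fraction, seed=0):
    """Shuffle a copy of the rows and split it at the given fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def k_folds(rows, k):
    """Yield (train, test) pairs; each row lands in exactly one test fold."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    for i in range(k):
        test = rows[i::k]  # every k-th row, starting at offset i
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test


data = list(range(10))
train, test = partition(data, 0.7)
folds = list(k_folds(data, 5))
```

Each of the five folds holds out two rows for testing and trains on the remaining eight, so every row is used for testing exactly once.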
KNIME REPORTING
• Generates reports in office document formats, PDF, and
HTML
• BIRT Tool as part of the Eclipse framework
• Native part of the KNIME workbench
• Extends data visualization capabilities
• Auto-distribute by email, or publish to websites
DATA ANALYTICS USE CASES
IDEAS
• Process mapping
• Process analysis
DATA AGGREGATION
• Combine data from different
sources, local or remote
• ETL data into a single repository for
querying/analytics
BUSINESS INTELLIGENCE
• Data intelligence and reporting over large
aggregated datasets
• Automated reusable workflows for
standardized reporting
PREDICTIVE ANALYTICS
• Ability for insight across very large
datasets
KNIME ANALYTICS
• Advantage of being a data-agnostic
aggregator
• Ability to work through very large
datasets with little hardware
• Access to complex algorithms with
easy tools
KNIME ADVANTAGES
• KNIME's core architecture allows processing of large data volumes, limited only by the
available hard disk space (not by the available RAM). For example, KNIME allows analysis of 300
million customer addresses, 20 million cell images, and 10 million molecular structures.
• Additional plugins allow the integration of methods for text mining, image mining, and
time series analysis.
• KNIME integrates various other open-source projects, e.g. machine learning algorithms from
Weka, the statistics package R, ImageJ, and the Chemistry Development Kit.
• KNIME is implemented in Java but also allows for wrappers that call other code, in addition to
providing nodes that run Java, Python, Perl, and other code fragments.
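The point about data volumes being bounded by disk space rather than RAM can be illustrated with a generic streaming pattern. The sketch below is not KNIME's mechanism, only an analogy in plain Python: it writes a hypothetical sample CSV to a temporary directory and computes a mean by reading one row at a time, so memory use stays flat as the file grows.

```python
import csv
import os
import tempfile


def write_sample(path, n_rows):
    """Write a made-up two-column CSV file to disk."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i % 7])


def streaming_mean(path, column):
    """Read one row at a time instead of loading the whole file."""
    count, total = 0, 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            count += 1
            total += float(row[column])
    return count, (total / count if count else float("nan"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    write_sample(path, 100_000)
    n, avg = streaming_mean(path, "value")
```

Only the running count and total are kept in memory, so the same function works unchanged on files far larger than the available RAM.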
Knime (Konstanz Information Miner)
