Software Analytics
with Jupyter, Pandas,
jQAssistant and Neo4j
Identifying Problems in Software Development
with Data Analysis
Markus Harrer
@feststelltaste
Neo4j Online Meetup
23rd November 2017
Markus Harrer
Software Development Analyst
Key Activities
Java Development, Data Analysis in Software
Development
Areas of Interest
Clean Code, Agile, Software Archeology, Software
Revival, Epistemology, Cognitive Psychology
@feststelltaste feststelltaste.de meetup@markusharrer.de
About me
Agenda
1. Motivation
2. Software Analytics
3. My impl of Software Analytics
4. Examples & Demos
5. Summary
6. Q&A
Motivation
Everything wrong with Software Development

Meanwhile in the pub…
Symptom Fixing

Lack of
Communication
Politics
Why is software development
still so crazy?

WALL OF IGNORANCE
Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub
WALL OF IGNORANCE
RISK
VISIBILITY
Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub
RISK
DATA ANALYSIS
VISIBILITY
My wife
RISK
DATA ANALYSIS
VISIBILITY
Me

Software Analytics
Sober Problem Solving with Data Analysis based on Software Data
Software Analytics is...
“... analytics on software data
for managers and software engineers
with the aim of empowering software
development individuals and teams
to gain and share insight from their data
to make better decisions.”
Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
Frequency
Questions
Use standard tools
for everyday questions
Use Software Analytics to
tackle high-risk problems
Risk/Value
Right Insights for better Decisions
Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
Types of Software Data
Community
chronological
Runtime
static
=> Problems are interconnected, so should be the data sources!

Tackling problems –
automated,
data-driven and
reproducible.
My Guideline
Software Analytics
= Data Science on Software Data
Why does it work now?
• Domain-Driven Design brings business language into code
• Data Science enables problem analysis for developers
• New Tools can create high-level concepts
Code Problems
Business Language
abstract
detailed
Problems can be connected to concepts in business terms!
My impl of Software Analytics
How can Developers use the Power of Data Analysis in their Daily Work?
What can you do today?
• Visualize developer contributions over time
• Identify unused, error-prone or abandoned code
• Create a code and problem inventory for legacy systems
• Find performance bottlenecks by analyzing call trees
• Visualize unwanted dependencies between modules
Make specific problems in your software system visible!
e.g. Race Conditions, Architecture Smells, Build Breakers, Programming Errors
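The first bullet (developer contributions over time) can be sketched with Pandas. This is only a minimal illustration under assumptions: the `commits` table with its `author` and `timestamp` columns is made-up sample data, and counting commits per month is just one possible measure of contribution.

```python
import pandas as pd

# Made-up sample data; in practice this would come from the Git log.
commits = pd.DataFrame({
    "author": ["Alice", "Bob", "Alice", "Alice", "Bob"],
    "timestamp": pd.to_datetime([
        "2017-09-03", "2017-09-17", "2017-10-02", "2017-10-20", "2017-11-05",
    ]),
})


def commits_per_month(log: pd.DataFrame) -> pd.DataFrame:
    """Count commits per calendar month (rows) and author (columns)."""
    month = log["timestamp"].dt.to_period("M").rename("month")
    return log.groupby([month, "author"]).size().unstack(fill_value=0)


timeline = commits_per_month(commits)
print(timeline)
```

With matplotlib installed, `timeline.plot()` would turn the table into a line chart with one line per author.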

Choose known tools
or tools for plan B*
Python
Neo4j, Pandas, Spark
* want to learn / profit from in near future
on a suitable platform.
Jupyter, Zeppelin
=> Tools shouldn't stand in the way!
Notebook
an open dialog with data
Context
Idea
Analysis
Conclusion
Problem
Context documented
Ideas, assumptions and
heuristics communicated
Preprocessing justified
Calculations understandable
Summaries conclusive
Everything automated
Notebook-Driven Data Analysis
Python
Data Scientist's Best Friend: Easy, effective, fast programming
language
Pandas
Pragmatic Data Analysis Framework: Great data structures &
integrations with machine learning libraries
D3
Visualization Library for Data-Driven Documents: Just beautiful,
interactive graphics!
Jupyter
Interactive Notebook: Central hub for data analysis and
documentation
Basic Tooling

Advanced Tooling: jQAssistant & Neo4j
scan, document, validate
https://jqassistant.org/
Advanced Tooling: jQAssistant & Neo4j
Main Ideas
• Scan software structures
• Store data in Neo4j database
• Execute queries
• Examine relationships
• Add high-level concepts
• Validate rules via constraints
• Generate reports
jQAssistant – Use Cases
Living,
self-validating
architecture
documentation
jQAssistant – Use Cases
Java Class
Business Subdomain
Living,
self-validating
architecture
documentation
+
Find design &
code smells
+
Add business
perspectives

Neo4j Schema for Software Data
Node Labels
File
Class
Method
Commit
Relationship Types
CONTAINS
DEPENDS_ON
INVOKES
CONTAINS_CHANGE
Properties
name
fqn
signature
message
File Java
key value
name “Pet”
fileName “Pet.java”
fqn “foo.bar.Pet”
Type File
Cypher Query
Example
Spring PetClinic
“Give me all database objects”
MATCH
(t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type)
WHERE
a.fqn="javax.persistence.Entity"
RETURN t AS JpaEntity
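A query like this can also be sent from a notebook. The following is a minimal sketch, not the setup used in the talk: it assumes the official `neo4j` Python driver, a hypothetical local database at `bolt://localhost:7687` with made-up credentials, and it returns `t.fqn` instead of the whole node so that every row is a plain string.

```python
# Cypher from the slide; returning t.fqn instead of the node is an assumption.
JPA_ENTITY_QUERY = """
MATCH (t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type)
WHERE a.fqn = "javax.persistence.Entity"
RETURN t.fqn AS JpaEntity
"""


def fetch_rows(session, query):
    """Run a Cypher query and return every record as a plain dict."""
    return [record.data() for record in session.run(query)]


def main():
    # Imported lazily so fetch_rows can be used without the driver installed.
    from neo4j import GraphDatabase

    # Hypothetical connection details for a local jQAssistant database.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "secret"))
    try:
        with driver.session() as session:
            for row in fetch_rows(session, JPA_ENTITY_QUERY):
                print(row["JpaEntity"])
    finally:
        driver.close()
```

Calling `main()` against a scanned Spring PetClinic database would print one fully qualified class name per JPA entity. The list of dicts returned by `fetch_rows` can be passed directly to `pandas.DataFrame` for further analysis.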
Toolchain (Python, Jupyter)
Data: XML/Graph, Tables, Text
Input: Pandas, jQAssistant
Analysis: Pandas, Neo4j
Output: matplotlib, D3, xlsx, pptx
Examples
The complete Toolchain in Action

Example: JaCoCo → Pandas → D3
Production Coverage
1. Measure code coverage in
production
2. Calculate ratio of covered
lines to all lines
3. Visualize “usage hotspots”
with hierarchical bubble chart
https://www.feststelltaste.de/visualizing-production-coverage-with-jacoco-pandas-and-d3/
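Step 2 can be sketched with Pandas as follows. The sample rows are made up, and the report is reduced to the four columns used here; `LINE_MISSED` and `LINE_COVERED` follow the column names of JaCoCo's CSV report. The D3 visualization from step 3 is not covered.

```python
import io

import pandas as pd

# Made-up excerpt of a JaCoCo CSV report, reduced to the columns used here.
JACOCO_CSV = """PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
org.example.owner,OwnerController,10,30
org.example.owner,Owner,0,20
org.example.vet,VetController,15,5
"""


def coverage_ratio(report: pd.DataFrame) -> pd.DataFrame:
    """Add the total line count and the ratio of covered lines per class."""
    report = report.copy()
    report["lines"] = report["LINE_MISSED"] + report["LINE_COVERED"]
    report["ratio"] = report["LINE_COVERED"] / report["lines"]
    return report


coverage = coverage_ratio(pd.read_csv(io.StringIO(JACOCO_CSV)))

# Aggregate to package level: sum the line counts first, then divide.
per_package = coverage.groupby("PACKAGE")[["LINE_COVERED", "lines"]].sum()
per_package["ratio"] = per_package["LINE_COVERED"] / per_package["lines"]
print(per_package)
```

Summing the line counts before dividing keeps large classes from being outweighed by small ones, which averaging the per-class ratios would not do.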
Example: Git → Pandas → D3
Knowledge Island*
1. Take Git log with numstats
2. Calculate proportional
contributions for each
source code file per author
3. Visualize “ownership” with
hierarchical bubble chart
* heavily inspired by Adam Tornhill
https://www.feststelltaste.de/knowledge-islands/
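Steps 1 and 2 can be sketched like this. Two assumptions are made for the illustration: the log was produced with `git log --numstat --pretty=format:"--%an"` (so every commit starts with a `--<author>` line), and "contribution" is measured by added lines only. The sample log is made up.

```python
import pandas as pd

# Made-up sample output of: git log --numstat --pretty=format:"--%an"
GIT_LOG = """--Alice
10\t2\tsrc/Pet.java
30\t0\tsrc/Owner.java

--Bob
10\t0\tsrc/Pet.java

--Alice
20\t5\tsrc/Pet.java
"""


def parse_numstat(log: str) -> pd.DataFrame:
    """Turn the numstat log into one row per (author, file) change."""
    rows, author = [], None
    for line in log.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif line.strip():
            added, _deleted, path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of a number
                rows.append({"author": author, "file": path,
                             "added": int(added)})
    return pd.DataFrame(rows)


def ownership(changes: pd.DataFrame) -> pd.DataFrame:
    """Share of added lines per file and author."""
    per_author = changes.groupby(["file", "author"])["added"].sum()
    per_file = per_author.groupby(level="file").transform("sum")
    return (per_author / per_file).rename("share").reset_index()


shares = ownership(parse_numstat(GIT_LOG))
print(shares)
```

A file whose largest share is close to 1.0 is a candidate "knowledge island": almost everything in it was written by a single author.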
Example: jQAssistant → Neo4j → Pandas → D3
Dependency Analysis between Bounded Contexts
https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/
MATCH
(s1:Subdomain)<-[:BELONGS_TO]-
(type:Type)-[r:DEPENDS_ON*0..1]->
(dependency:Type)-[:BELONGS_TO]->(s2:Subdomain)
RETURN s1.name as type, s2.name as dep, COUNT(r) as number
Subdomains => Bounded Contexts that have meaning to business!
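Once the query result is available as a table, Pandas can turn it into a dependency matrix. The sketch below uses made-up rows in the shape the query returns (`type`, `dep`, `number`); the subdomain names are hypothetical.

```python
import pandas as pd

# Made-up rows in the shape returned by the Cypher query (type, dep, number).
rows = [
    {"type": "owner", "dep": "owner", "number": 12},
    {"type": "owner", "dep": "pet", "number": 3},
    {"type": "pet", "dep": "pet", "number": 8},
    {"type": "vet", "dep": "owner", "number": 1},
]


def dependency_matrix(result: pd.DataFrame) -> pd.DataFrame:
    """Rows: depending subdomain, columns: subdomain that is depended on."""
    return result.pivot_table(index="type", columns="dep", values="number",
                              aggfunc="sum", fill_value=0)


result = pd.DataFrame(rows)
matrix = dependency_matrix(result)

# Dependencies that cross a subdomain boundary are the interesting ones.
crossing = result[result["type"] != result["dep"]]
print(matrix)
print(crossing)
```

The matrix form is a convenient input for chord or matrix-style visualizations; the `crossing` table lists only the dependencies between different subdomains.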

Example: JProfiler → jQAssistant → Neo4j → Pandas
Mining performance hotspots
1. Record Call Trees
2. Identify which parts of
the application code
is responsible for most
of the DB operations
3. Trace problems back
to the root causes
https://www.feststelltaste.de/mining-performance-hotspots-with-jprofiler-jqassistant-neo4j-and-pandas-part-1-the-call-graph/
Requests
Incoming
Outgoing
SQL Calls
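The idea behind step 2 can be illustrated without any tooling. This is a simplified sketch, not the JProfiler/Neo4j pipeline from the talk: the `Call` class, the method names and the numbers are hypothetical. Each node of a recorded call tree knows how many SQL statements it issued itself, and the totals are summed upward.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One node of a recorded call tree (hypothetical structure)."""
    method: str
    sql_calls: int = 0  # SQL statements issued directly by this call
    children: list = field(default_factory=list)


def total_sql(call: Call) -> int:
    """SQL statements issued by this call and everything below it."""
    return call.sql_calls + sum(total_sql(child) for child in call.children)


def hotspots(root: Call) -> list:
    """All calls with their total SQL count, most expensive first."""
    result, stack = [], [root]
    while stack:
        call = stack.pop()
        result.append((call.method, total_sql(call)))
        stack.extend(call.children)
    return sorted(result, key=lambda item: item[1], reverse=True)


tree = Call("GET /owners", children=[
    Call("OwnerController.list", children=[
        Call("OwnerRepository.findAll", sql_calls=1),
        Call("PetRepository.findByOwner", sql_calls=5),
    ]),
])

for method, count in hotspots(tree):
    print(f"{count:3d}  {method}")
```

Reading the list from the top leads from the incoming request down to the method that actually causes most of the database traffic. For large trees the totals should be cached rather than recomputed for every node.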
Example: jQAssistant → Neo4j → Pandas
Recursive Method Calls
MATCH (m:Method)-[:INVOKES*]->(m)
RETURN m
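What this pattern matches can be shown on a tiny in-memory call graph: a method is reported if it can reach itself over one or more INVOKES edges. The sketch below is plain Python for illustration only; the graph and its method names are made up, and Neo4j evaluates the pattern in its own way.

```python
def recursive_methods(invokes: dict) -> set:
    """Methods that can reach themselves via one or more INVOKES edges."""

    def reachable_from(start):
        # Depth-first search over everything callable from `start`.
        seen, stack = set(), list(invokes.get(start, ()))
        while stack:
            method = stack.pop()
            if method not in seen:
                seen.add(method)
                stack.extend(invokes.get(method, ()))
        return seen

    return {method for method in invokes if method in reachable_from(method)}


# Made-up call graph: caller -> list of invoked methods.
call_graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],  # a -> b -> c -> a: indirect recursion
    "d": ["d"],  # direct recursion
    "e": ["a"],  # calls into a cycle but is not part of one
}

print(sorted(recursive_methods(call_graph)))
```

Method `e` is not reported: it calls into a cycle, but no path leads back to `e` itself.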
Example: jQAssistant → Neo4j → Pandas
Recursive Method Calls to Database
MATCH (m:Method)-[:INVOKES*]->(m)
-[:INVOKES]->(dbMethod:Method)
<-[:DECLARES]-(dbClass:Class)
WHERE dbClass.name = "Database"
RETURN m, dbMethod, dbClass
Example: jQAssistant → Neo4j → Pandas
Identify possible Race Conditions
public class OwnerController {
...
private static int ownersIndexes;
MATCH
(c:Class)-[:DECLARES]->(f:Field)<-[w:WRITES]-(m:Method)
WHERE
EXISTS(f.static) AND NOT EXISTS(f.final)
RETURN c.name, f.name, w.lineNumber, m.name
static = same field for
all instances of that class

Summary
• Tooling for data analysis in software development is here!
• First analyses are easy to do using tools you already know
• Specific in-depth analyses are powerful and worthwhile
• Connection between business and developers is possible!
• Problems can be attached to code that is business-related
• Making the impact of risk-taking visible is a must-have to improve!
• Jupyter/Pandas & jQAssistant/Neo4j are my favorites
• Provide many ways for identifying problems
• Help to figure out solutions as well!
Links
Markus Harrer
• Blog: https://feststelltaste.de
• Twitter: https://twitter.com/feststelltaste
• SlideShare: https://www.slideshare.net/feststelltaste
• Consulting: http://markusharrer.de
jQAssistant/Neo4j
• Demos: https://jqassistant.org/get-started/
• Guide: http://buschmais.github.io/jqassistant/doc/1.3.0/
• Talk by Dirk Mahler: https://vimeo.com/170797227
Q&A
Questions and Answers

Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online Meetup]

  • 1. Software Analytics with Jupyter, Pandas, jQAssistant and Neo4j Identifying Problems in Software Development with Data Analysis Markus Harrer @feststelltaste Neo4j Online Meetup 23rd November 2017
  • 2. Markus Harrer Software Development Analyst Key Activities Java Development, Data Analysis in Software Development Areas of Interest Clean Code, Agile, Software Archeology, Software Revival, Epistemology, Cognitive Psychology @feststelltaste feststelltaste.de meetup@markusharrer.de About me
  • 3. Agenda 1. Motivation 2. Software Analytics 3. My impl of Software Analytics 4. Examples & Demos 5. Summary 6. Q&A
  • 4. Motivation Everything wrong with Software Development
  • 12. Why is software development still so crazy?
  • 13. WALL OF IGNORANCE Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub
  • 14. WALL OF IGNORANCE RISK VISIBILITY Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub
  • 17. Software Analytics Sober Problem Solving with Data Analysis based on Software Data
  • 18. Software Analytics is... “... analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.” Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
  • 19. Right Insights for Better Decisions (chart: questions plotted by frequency against risk/value) Use standard tools for everyday questions. Use Software Analytics to tackle high-risk problems. Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
  • 20. Types of Software Data: community, chronological, runtime, static => Problems are interconnected, so the data sources should be, too!
  • 21. Tackling problems – automated, data-driven and reproducible. My Guideline: Software Analytics = Data Science on Software Data
  • 22. Why does it work now? • Domain-Driven Design brings business language into code • Data Science enables problem analysis for developers • New tools can create high-level concepts (diagram: code, problems and business language on a scale from detailed to abstract) Problems can be connected to concepts in business terms!
  • 23. My impl of Software Analytics How can Developers use the Power of Data Analysis in their Daily Work?
  • 24. What can you do today? • Visualize developer contributions over time • Identify unused, error-prone or abandoned code • Create a code and problem inventory for legacy systems • Find performance bottlenecks by analyzing call trees • Visualize unwanted dependencies between modules Make specific problems in your software system visible, e.g. race conditions, architecture smells, build breakers, programming errors
  • 25. Choose known tools or tools for plan B*: Python, Neo4j, Pandas, Spark, on a suitable platform: Jupyter, Zeppelin. (* tools you want to learn or profit from in the near future) => Tools shouldn't stand in the way!
  • 26. Notebook: an open dialog with data. Context, Idea, Analysis, Conclusion, Problem. Context documented; ideas, assumptions and heuristics communicated; preprocessing justified; calculations understandable; summaries conclusive; everything automated
  • 28. Basic Tooling. Python, the Data Scientist's Best Friend: easy, effective, fast programming language. Pandas, a Pragmatic Data Analysis Framework: great data structures & integrations with machine learning libraries. D3, a Visualization Library for Data-Driven Documents: just beautiful, interactive graphics! Jupyter, an Interactive Notebook: central hub for data analysis and documentation
  • 29. Advanced Tooling: jQAssistant & Neo4j: scan, document, validate. https://jqassistant.org/
  • 30. Advanced Tooling: jQAssistant & Neo4j Main Ideas • Scan software structures • Store data in Neo4j database • Execute queries • Examine relationships • Add high-level concepts • Validate rules via constraints • Generate reports
  • 31. jQAssistant – Use Cases Living, self-validating architecture documentation
  • 32. jQAssistant – Use Cases Living, self-validating architecture documentation + Find design & code smells + Add business perspectives (diagram: Java Class, Business Subdomain)
  • 33. Neo4j Schema for Software Data. Node Labels: File, Class, Method, Commit. Relationship Types: CONTAINS, DEPENDS_ON, INVOKES, CONTAINS_CHANGE. Properties: name, fqn, signature, message. Example node (labels File, Java, Type) as key/value pairs: name “Pet”, fileName “Pet.java”, fqn “foo.bar.Pet”
  • 34. Cypher Query Example Spring PetClinic “Give me all database objects” MATCH (t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type) WHERE a.fqn="javax.persistence.Entity" RETURN t AS JpaEntity
  • 37. Example JaCoCo → Pandas → D3 Production Coverage 1. Measure code coverage in production 2. Calculate ratio of covered lines to all lines 3. Visualize “usage hotspots” with hierarchical bubble chart https://www.feststelltaste.de/visualizing-production-coverage-with-jacoco-pandas-and-d3/
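
Step 2 of the slide above (ratio of covered lines to all lines) can be sketched roughly as follows. This is a minimal illustration and not the speaker's notebook: it uses Python's standard `csv` module instead of Pandas to stay dependency-free, and the report excerpt, package names and numbers are made up. Only the `LINE_MISSED` and `LINE_COVERED` column names are taken from JaCoCo's CSV report format.

```python
import csv
import io

# Hypothetical excerpt of a JaCoCo CSV report (made-up classes and numbers,
# reduced to the columns needed here).
JACOCO_CSV = """GROUP,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
petclinic,org.example.owner,Owner,10,30
petclinic,org.example.owner,OwnerController,40,0
petclinic,org.example.vet,Vet,5,15
"""


def coverage_ratios(csv_text):
    """Return {fully.qualified.Class: covered_lines / all_lines}."""
    ratios = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        missed = int(row["LINE_MISSED"])
        covered = int(row["LINE_COVERED"])
        total = missed + covered
        fqn = f'{row["PACKAGE"]}.{row["CLASS"]}'
        # Guard against classes without any executable lines.
        ratios[fqn] = covered / total if total else 0.0
    return ratios


ratios = coverage_ratios(JACOCO_CSV)
# List the least-used classes first.
for fqn, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{fqn}: {ratio:.0%}")
```

Classes with a ratio of zero in production are candidates for the "unused code" discussion; the ratio itself would feed the size or color of a bubble in the chart.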
  • 38. Example Git → Pandas → D3 Knowledge Island* 1. Take Git log with numstats 2. Calculate proportional contributions for each source code file per author 3. Visualize “ownership” with hierarchical bubble chart * heavily inspired by Adam Tornhill https://www.feststelltaste.de/knowledge-islands/
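
The proportional contributions from step 2 above could be computed along these lines. This is a sketch under assumptions: the input is taken to look like the output of `git log --numstat --pretty=format:--%an` (an author line prefixed with `--`, followed by tab-separated added/deleted/path lines), the sample log is invented, and plain Python stands in for Pandas. Measuring "contribution" by added lines is one possible heuristic, not necessarily the one used in the talk.

```python
from collections import defaultdict

# Assumed shape of `git log --numstat --pretty=format:--%an` output (made-up data).
GIT_LOG = """--Alice
10\t2\tsrc/Owner.java
5\t0\tsrc/Vet.java

--Bob
30\t0\tsrc/Owner.java

--Alice
0\t5\tsrc/Vet.java
"""


def ownership(log_text):
    """Return {file: {author: share of all added lines in that file}}."""
    added = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif line.strip():
            additions, _deletions, path = line.split("\t", 2)
            if additions != "-":  # binary files report "-" instead of counts
                added[path][author] += int(additions)
    shares = {}
    for path, per_author in added.items():
        total = sum(per_author.values())
        if total:
            shares[path] = {name: n / total for name, n in per_author.items()}
    return shares


shares = ownership(GIT_LOG)
for path, per_author in shares.items():
    print(path, per_author)
```

A file where a single author holds nearly all of the share is a potential "knowledge island".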
  • 39. Example jQAssistant → Neo4j → Pandas → D3 Dependency Analysis between Bounded Contexts https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/
  • 40. Example jQAssistant → Neo4j → Pandas → D3 Dependency Analysis between Bounded Contexts MATCH (s1:Subdomain)<-[:BELONGS_TO]-(type:Type)-[r:DEPENDS_ON*0..1]->(dependency:Type)-[:BELONGS_TO]->(s2:Subdomain) RETURN s1.name as type, s2.name as dep, COUNT(r) as number https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/ Subdomains => Bounded Contexts that have meaning to business!
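
To make the aggregation in the query above more tangible, here is a rough in-memory analogue in plain Python. It is a simplification: only direct `DEPENDS_ON` edges are counted (the zero-length case of `*0..1` is left out), and the type names, subdomain names and edges are hypothetical toy data rather than the real Spring PetClinic graph.

```python
from collections import Counter

# Hypothetical toy data: type -> subdomain, and direct DEPENDS_ON edges.
BELONGS_TO = {"Owner": "owner", "Pet": "owner", "Visit": "visit", "Vet": "vet"}
DEPENDS_ON = [("Pet", "Owner"), ("Visit", "Pet"), ("Visit", "Vet"), ("Owner", "Pet")]


def subdomain_dependencies(belongs_to, depends_on):
    """Count direct type dependencies per (source subdomain, target subdomain)."""
    counts = Counter()
    for source, target in depends_on:
        # Skip types that were not assigned to any subdomain.
        if source in belongs_to and target in belongs_to:
            counts[(belongs_to[source], belongs_to[target])] += 1
    return counts


dependencies = subdomain_dependencies(BELONGS_TO, DEPENDS_ON)
for (source, target), number in sorted(dependencies.items()):
    print(f"{source} -> {target}: {number}")
```

Pairs with different source and target subdomains are the cross-context dependencies worth reviewing; same-subdomain pairs indicate cohesion.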
  • 41. Example JProfiler → jQAssistant → Neo4j → Pandas Mining performance hotspots 1. Record call trees 2. Identify which parts of the application code are responsible for most of the DB operations 3. Trace problems back to the root causes https://www.feststelltaste.de/mining-performance-hotspots-with-jprofiler-jqassistant-neo4j-and-pandas-part-1-the-call-graph/ (diagram labels: Requests, Incoming, Outgoing, SQL Calls)
  • 42. Example jQAssistant → Neo4j → Pandas Recursive Method Calls MATCH (m:Method)-[:INVOKES*]->(m) RETURN m
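
What the pattern `(m:Method)-[:INVOKES*]->(m)` matches, namely every method that can reach itself through one or more calls, can be illustrated without a database. The sketch below assumes a call graph given as a plain dictionary; the method names are placeholders.

```python
def recursive_methods(invokes):
    """Return all methods that can reach themselves via one or more calls."""

    def reachable(start):
        # Iterative depth-first search over everything callable from `start`.
        seen, stack = set(), list(invokes.get(start, ()))
        while stack:
            method = stack.pop()
            if method not in seen:
                seen.add(method)
                stack.extend(invokes.get(method, ()))
        return seen

    return {method for method in invokes if method in reachable(method)}


# Hypothetical call graph: a -> b -> c -> a forms a cycle, d calls itself,
# e only calls into the cycle, f calls nothing.
INVOKES = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["d"], "e": ["a"], "f": []}

found = recursive_methods(INVOKES)
print(sorted(found))
```

Note that `e` is not reported: it calls into a cycle but is not part of one, which matches the Cypher pattern requiring the path to end at its own start node.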
  • 43. Example jQAssistant → Neo4j → Pandas Recursive Method Calls to Database MATCH (m:Method)-[:INVOKES*]->(m)-[:INVOKES]->(dbMethod:Method)<-[:DECLARES]-(dbClass:Class) WHERE dbClass.name = "Database" RETURN m, dbMethod, dbClass
  • 44. Example jQAssistant → Neo4j → Pandas Identify possible Race Conditions Java: public class OwnerController { ... private static int ownersIndexes; Cypher: MATCH (c:Class)-[:DECLARES]->(f:Field)<-[w:WRITES]-(m:Method) WHERE EXISTS(f.static) AND NOT EXISTS(f.final) RETURN c.name, f.name, w.lineNumber, m.name Note: static = same field for all instances of that class
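
The filter expressed by the query above (fields that are static, not final, and written by some method) can be mirrored on plain records. This is only an illustrative sketch: the `FieldWrite` record, its attribute names, and the sample rows (including the method names and line numbers) are hypothetical and do not reflect how jQAssistant stores these properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldWrite:
    """Hypothetical flat record of one WRITES relationship."""
    class_name: str
    field_name: str
    is_static: bool
    is_final: bool
    method_name: str
    line_number: int


def race_condition_candidates(writes):
    """Keep writes to static, non-final fields (state shared by all instances)."""
    return [w for w in writes if w.is_static and not w.is_final]


WRITES = [
    FieldWrite("OwnerController", "ownersIndexes", True, False, "processFindForm", 42),
    FieldWrite("OwnerController", "owners", False, False, "initOwners", 17),
    FieldWrite("Constants", "VERSION", True, True, "<clinit>", 3),
]

candidates = race_condition_candidates(WRITES)
for w in candidates:
    print(w.class_name, w.field_name, w.line_number, w.method_name)
```

Each hit is only a candidate: whether it is an actual race condition depends on whether the writing method can run concurrently.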
  • 46. Summary • Tooling for data analysis in software development is here! • First analyses are easy to do using tools you already know • Specific in-depth analyses are powerful and worthwhile • Connection between business and developers is possible! • Problems can be attached to code that is business-related • Making the impact of risk-taking visible is a must-have to improve! • Jupyter/Pandas & jQAssistant/Neo4j are my favorites • Provide many ways for identifying problems • Help to figure out solutions as well!
  • 47. Links Markus Harrer • Blog: https://feststelltaste.de • Twitter: https://twitter.com/feststelltaste • SlideShare: https://www.slideshare.net/feststelltaste • Consulting: http://markusharrer.de jQAssistant/Neo4j • Demos: https://jqassistant.org/get-started/ • Guide: http://buschmais.github.io/jqassistant/doc/1.3.0/ • Talk by Dirk Mahler: https://vimeo.com/170797227