Software Analytics
with Jupyter, Pandas,
jQAssistant and Neo4j
Identifying Problems in Software Development
with Data Analysis
Markus Harrer
Neo4j Online Meetup
23rd November 2017
About me
Markus Harrer
Software Development Analyst
Key Activities
Java Development, Data Analysis in Software Development
Areas of Interest
Clean Code, Agile, Software Archeology, Software
Revival, Epistemology, Cognitive Psychology
1. Motivation
2. Software Analytics
3. My impl of Software Analytics
4. Examples & Demos
5. Summary
6. Q&A
Everything wrong with Software Development

Lack of
Why is software development
still so crazy?

Software Analytics
Sober Problem Solving with Data Analysis based on Software Data
Software Analytics is...
“... analytics on software data
for managers and software engineers
with the aim of empowering software
development individuals and teams
to gain and share insight from their data
to make better decisions.”
Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
Use standard tools
for everyday questions
Use Software Analytics to
tackle high-risk problems
Right Insights for better Decisions
Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
Types of Software Data
=> Problems are interconnected, so should be the data sources!

Questions and Answers

  Software Analytics with Jupyter, Pandas, jQAssistant and Neo4j Identifying Problems in Software Development with Data Analysis Markus Harrer Neo4j Online Meetup 23rd November 2017
  Markus Harrer Software Development Analyst Key Activities Java Development, Data Analysis in Software Development Areas of Interest Clean Code, Agile, Software Archeology, Software Revival, Epistemology, Cognitive Psychology About me
  Agenda 1. Motivation 2. Software Analytics 3. My impl of Software Analytics 4. Examples & Demos 5. Summary 6. Q&A
  Motivation Everything wrong with Software Development
  Why is software development still so crazy?
  WALL OF IGNORANCE RISK VISIBILITY Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub
  Software Analytics Sober Problem Solving with Data Analysis based on Software Data
  Software Analytics is... "... analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions." Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
  Use standard tools for everyday questions. Use Software Analytics to tackle high-risk problems. Right Insights for better Decisions. (Chart labels: Frequency, Questions, Risk/Value.) Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine
  Types of Software Data: community, chronological, runtime, static. => Problems are interconnected, so should be the data sources!
  Tackling problems – automated, data-driven and reproducible. Software Analytics = Data Science on Software Data
  Why does it work now?
    • Domain-Driven Design brings business language into code
    • Data Science enables problem analysis for developers
    • New Tools can create high-level concepts
    Diagram labels: Code, Problems, Business Language (from detailed to abstract)
    Problems can be connected to concepts in business terms!
  My impl of Software Analytics How can Developers use the Power of Data Analysis in their Daily Work?
  What can you do today?
    • Visualize developer contributions over time
    • Identify unused, error-prone or abandoned code
    • Create a code and problem inventory for legacy systems
    • Find performance bottlenecks by analyzing call trees
    • Visualize unwanted dependencies between modules
    Make specific problems in your software system visible! E.g. Race Conditions, Architecture Smells, Build Breakers, Programming Errors
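As a minimal sketch of the first item (developer contributions over time), the following counts commits per author and month. The log lines are made-up sample data in the shape printed by `git log --pretty=format:"%ad|%an" --date=short`, and the function name `commits_per_author_and_month` is hypothetical. It uses only the Python standard library so that it runs anywhere; in a notebook the same grouping would more likely be done on a Pandas DataFrame.

```python
from collections import Counter

# Made-up sample in the shape of:
#   git log --pretty=format:"%ad|%an" --date=short
SAMPLE_LOG = """\
2017-09-03|Alice
2017-09-17|Bob
2017-09-21|Alice
2017-10-02|Alice
2017-10-05|Bob
2017-10-30|Bob
"""


def commits_per_author_and_month(log_text):
    """Count commits per (month, author) from 'date|author' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        date, author = line.split("|", 1)
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        counts[(month, author)] += 1
    return counts


contributions = commits_per_author_and_month(SAMPLE_LOG)
for (month, author), number in sorted(contributions.items()):
    print(month, author, number)
```

The resulting counts can be fed into any plotting library to show how contributions shift between developers over time.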
  Choose known tools or tools for plan B* (Python, Neo4j, Pandas, Spark) on a suitable platform (Jupyter, Zeppelin). * Tools you want to learn or profit from in the near future. => Tools shouldn't stand in the way!
  Notebook: an open dialog with data
    Context, Idea, Analysis, Conclusion
    • Problem context documented
    • Ideas, assumptions and heuristics communicated
    • Preprocessing justified
    • Calculations understandable
    • Summaries conclusive
    • Everything automated
  Basic Tooling
    • Python, the Data Scientist's Best Friend: Easy, effective, fast programming language
    • Pandas, a Pragmatic Data Analysis Framework: Great data structures & integrations with machine learning libraries
    • D3, a Visualization Library for Data-Driven Documents: Just beautiful, interactive graphics!
    • Jupyter, an Interactive Notebook: Central hub for data analysis and documentation
  Advanced Tooling: jQAssistant & Neo4j (scan, document, validate)
  jQAssistant Use Cases
    Living, self-validating architecture documentation
    + Find design & code smells
    + Add business perspectives
    Diagram labels: Java Class, Business Subdomain
  Neo4j Schema for Software Data
    Node Labels: File, Class, Method, Commit
    Relationship Types: CONTAINS, DEPENDS_ON, INVOKES, CONTAINS_CHANGE
    Properties: name, fqn, signature, message
    Example node (labels File, Java, Type), as key/value pairs: name "Pet", fileName "", fqn ""
  Cypher Query Example (Spring PetClinic): "Give me all database objects"
    MATCH (t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type)
    WHERE a.fqn="javax.persistence.Entity"
    RETURN t AS JpaEntity
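To illustrate what this pattern matches, here is a small in-memory emulation in plain Python. It is only a sketch: the triples are a made-up miniature of a scanned graph, the annotation node ids (`ann1`, ...) and the function name `types_annotated_with` are hypothetical, and a real analysis would run the Cypher query against Neo4j instead.

```python
# Hypothetical miniature graph as (start, relationship, end) triples.
EDGES = [
    ("Owner", "ANNOTATED_BY", "ann1"),
    ("ann1", "OF_TYPE", "javax.persistence.Entity"),
    ("Pet", "ANNOTATED_BY", "ann2"),
    ("ann2", "OF_TYPE", "javax.persistence.Entity"),
    ("OwnerController", "ANNOTATED_BY", "ann3"),
    ("ann3", "OF_TYPE", "org.springframework.stereotype.Controller"),
]


def types_annotated_with(edges, annotation_fqn):
    """Emulate (t)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a) with a.fqn = annotation_fqn."""
    # Step 1: annotation nodes whose type is the requested annotation.
    annotation_nodes = {
        start for start, rel, end in edges
        if rel == "OF_TYPE" and end == annotation_fqn
    }
    # Step 2: types that carry one of those annotation nodes.
    return sorted(
        start for start, rel, end in edges
        if rel == "ANNOTATED_BY" and end in annotation_nodes
    )


jpa_entities = types_annotated_with(EDGES, "javax.persistence.Entity")
print(jpa_entities)
```

The two-step lookup mirrors the two hops of the pattern: from the annotation type back to the annotation node, and from there back to the annotated type.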
  Example (JaCoCo → Pandas → D3): Production Coverage
    1. Measure code coverage in production
    2. Calculate ratio of covered lines to all lines
    3. Visualize "usage hotspots" with hierarchical bubble chart
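Step 2 can be sketched in a few lines. The rows below are made up and keep only the columns used here (`PACKAGE`, `CLASS`, `LINE_MISSED`, `LINE_COVERED`, as named in JaCoCo's CSV report); the function name `line_coverage_ratios` is hypothetical. The sketch uses the standard library's `csv` module; in the notebook setting of this talk, Pandas would be the natural choice.

```python
import csv
import io

# Made-up rows; a real JaCoCo CSV report has more columns than shown here.
SAMPLE_CSV = """\
PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
org.example.owner,OwnerController,10,30
org.example.owner,OwnerRepository,0,8
org.example.vet,VetController,12,0
"""


def line_coverage_ratios(csv_text):
    """Return {fully qualified class name: covered lines / all lines}."""
    ratios = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        covered = int(row["LINE_COVERED"])
        total = covered + int(row["LINE_MISSED"])
        fqn = row["PACKAGE"] + "." + row["CLASS"]
        # Classes without any executable line get a ratio of 0.0.
        ratios[fqn] = covered / total if total else 0.0
    return ratios


ratios = line_coverage_ratios(SAMPLE_CSV)
# List the least used classes first.
for fqn, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{fqn}: {ratio:.0%}")
```

Classes with a ratio near zero are candidates for unused code, which is what the bubble chart in step 3 makes visible at a glance.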
  Example (jQAssistant → Neo4j → Pandas → D3): Dependency Analysis between Bounded Contexts
    MATCH (s1:Subdomain)<-[:BELONGS_TO]-
          (type:Type)-[r:DEPENDS_ON*0..1]->
          (dependency:Type)-[:BELONGS_TO]->(s2:Subdomain)
    RETURN as type, as dep, COUNT(r) as number
    Subdomains => Bounded Contexts that have meaning to business!
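The aggregation behind this query can be sketched in plain Python. All type names, subdomain names and the function name `count_subdomain_dependencies` are made up for illustration. As a simplification, the sketch counts only direct `DEPENDS_ON` relationships, whereas the query's `*0..1` also matches the zero-length path from a type to itself.

```python
from collections import Counter

# Hypothetical sample data: the subdomain each type belongs to ...
BELONGS_TO = {
    "OwnerController": "Owner",
    "Owner": "Owner",
    "Pet": "Pet",
    "PetValidator": "Pet",
    "Visit": "Visit",
}
# ... and direct dependencies as (type, dependency) pairs.
DEPENDS_ON = [
    ("OwnerController", "Owner"),
    ("Owner", "Pet"),
    ("PetValidator", "Pet"),
    ("Pet", "Visit"),
    ("OwnerController", "Pet"),
]


def count_subdomain_dependencies(belongs_to, depends_on):
    """Count type dependencies per (source subdomain, target subdomain)."""
    counts = Counter()
    for source_type, target_type in depends_on:
        pair = (belongs_to[source_type], belongs_to[target_type])
        counts[pair] += 1
    return counts


dependency_counts = count_subdomain_dependencies(BELONGS_TO, DEPENDS_ON)
for (source, target), number in sorted(dependency_counts.items()):
    print(source, "->", target, number)
```

The resulting source/target/count triples are the kind of table that a dependency visualization between subdomains would be built from.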
  Example (jQAssistant → Neo4j → Pandas): Recursive Method Calls
    MATCH (m:Method)-[:INVOKES*]->(m)
    RETURN m
  Example (jQAssistant → Neo4j → Pandas): Recursive Method Calls to Database
    MATCH (m:Method)-[:INVOKES*]->(m)
          -[:INVOKES]->(dbMethod:Method)
          <-[:DECLARES]-(dbClass:Class)
    WHERE = "Database"
    RETURN m, dbMethod, dbClass
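The pattern `(m:Method)-[:INVOKES*]->(m)` finds every method that can reach itself through one or more calls. A minimal sketch of the same idea on an in-memory call graph follows; the method names and the function name `recursive_methods` are hypothetical, and the graph database performs this traversal itself when the query runs.

```python
# Hypothetical call graph: method -> methods it invokes.
INVOKES = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
    "e": ["e"],
    "f": ["d"],
}


def recursive_methods(invokes):
    """Return all methods that can reach themselves via one or more calls."""
    result = set()
    for start in invokes:
        stack = list(invokes.get(start, []))  # methods called directly
        seen = set()
        while stack:
            current = stack.pop()
            if current == start:
                result.add(start)  # found a call path back to the start
                break
            if current in seen:
                continue
            seen.add(current)
            stack.extend(invokes.get(current, []))
    return result


recursive = recursive_methods(INVOKES)
print(sorted(recursive))
```

Direct recursion (`e` calls `e`) and indirect recursion (`a`, `b` and `c` call each other in a cycle) are both found, while `d` and `f` are not reported.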
  Example (jQAssistant → Neo4j → Pandas): Identify possible Race Conditions
    public class OwnerController { ... private static int ownersIndexes;
    MATCH (c:Class)-[:DECLARES]->(f:Field)<-[w:WRITES]-(m:Method)
    WHERE EXISTS(f.static) AND NOT EXISTS(
    RETURN,, w.lineNumber,
    static = same field for all instances of that class
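The core of this check, finding writes to static fields, can be sketched on plain records. The record type `FieldWrite`, the method names and the line numbers are made up for illustration. The second condition of the query is truncated in the source and is deliberately not reproduced here, so this sketch filters on `static` only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldWrite:
    """One write access of a method to a field (hypothetical record type)."""
    class_name: str
    field_name: str
    is_static: bool
    method_name: str
    line_number: int


# Made-up sample data; only the class and field name come from the slide.
WRITES = [
    FieldWrite("OwnerController", "ownersIndexes", True, "processFindForm", 87),
    FieldWrite("OwnerController", "owners", False, "initCreationForm", 61),
    FieldWrite("VetController", "vets", False, "showVetList", 45),
]


def writes_to_static_fields(writes):
    """Keep only writes that target a static field."""
    return [write for write in writes if write.is_static]


suspicious = writes_to_static_fields(WRITES)
for write in suspicious:
    print(write.class_name, write.field_name, write.method_name, write.line_number)
```

Because a static field is shared by all instances of a class, every write found this way is a place where concurrent requests could interfere with each other and deserves a closer look.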
  Links
    Markus Harrer
    • Blog:
    • Twitter:
    • SlideShare:
    • Consulting:
    jQAssistant/Neo4j
    • Demos:
    • Guide:
    • Talk by Dirk Mahler: