Paco Nathan

Sebastopol, California, United States

10K followers 500+ connections

Senzing

Stanford University

Services

Public Speaking

About

Check out the "Graph Data Science" group: https://www.linkedin.com/groups/6725785/

Activity

To quote Hamel H., "Its the least sexiest but most important topic" - cleaning, curating and looking at your data. Daniel van Strien and I spent…

Liked by Paco Nathan
The growing utilization of finite resources is causing environmental and societal challenges to escalate worldwide. Join us 7/18 for an exciting…

Liked by Paco Nathan
I love this presentation by my colleague Kathe Todd-Brown. Such a great description of how collaboration and communities go hand in hand…

Liked by Paco Nathan

Experience & Education

Senzing

******, ***.

**** *** *********
******** **********

**

1983 - 1986
******** **********

**

1982 - 1986

Publications

Entity Resolved Knowledge Graphs: A Tutorial

Neo4j April 22, 2024

Using the Python API for Senzing to run entity resolution on three datasets about businesses in the Las Vegas metro area: SafeGraph, WHISARD wage compliance data from the US Dept of Labor, and PPP loans from the US Chamber of Commerce. We build a knowledge graph in Neo4j from the results, then use Jupyter, Pandas, Seaborn, and PyVis to compare the data before and after resolving duplicate records.

See publication
Latent Space

Derwen May 7, 2023

A f*ck-around-and-find-out whodunit tale of neo-noir gore and messy flip-the-script cli-fi about artificial intelligence, animism, national security liberals, insurrection, climate guilt, weaponized media, advanced mathematics, conspiracism, global cyberwar, overlapping polycrisis, and the strangest of bedfellows.

See publication
NLP Entity Linking for Medical Transcripts

Manning May 2, 2022

In this liveProject, you’re a data scientist at a healthcare provider that deals with large volumes of incoming text. Your task is to analyze a large dataset containing medical transcriptions. Leveraging technologies including pandas, the IBM Project Debater API, and Seaborn, you’ll explore a Kaggle dataset, segment text data into known categories, and extract key points.

You’ll finish by building an interactive data visualization dashboard for analysis in the open-source framework Streamlit. When you’re done, you’ll have leveled up your NLP toolbox with skills that are highly sought not only in healthcare but in law, customer support, market intelligence, media, and many other fields.

See publication
2022 AI in Healthcare Survey Report

Gradient Flow March 28, 2022
Applications of AI in Healthcare pose a number of challenges and considerations which differ substantially from other business verticals. We conducted an industry survey specifically about AI in healthcare, to understand more about current trends and issues. A total of 321 respondents from 41 countries participated in the survey. A quarter of all respondents (27%) held Technical Leadership roles. This survey was conducted in collaboration with John Snow Labs.

Other authors
See publication
Recommender Systems Best Practices

NVIDIA September 28, 2021

Building, deploying, and optimizing recommender systems that effectively engage users and impact business value, including revenue, is hard. Data scientists, machine learning engineers, and leads within global e-commerce, media, and on-demand domains have successfully designed, built, and deployed recommendation systems that impact business value. Download this paper to get insights, best practices, and advice from expert interviews, and uncover how recommender systems teams handle preprocessing, feature engineering, training models, evaluating models, selecting appropriate technologies to integrate, interoperability with open source, and more. Gain insights from leaders and technical experts at global companies such as The New York Times, Tencent, Meituan, NVIDIA, and more.

See publication
2021 NLP Survey Report

Gradient Flow September 20, 2021
Our 2021 NLP Industry Survey report is informed by several important contrasts: organizations with years of history deploying NLP applications in production compared to those which are exploring NLP, responses from Technical Leaders versus general practitioners, and company size. We draw insights and indicate trends based on those contrasts. This survey was conducted in collaboration with John Snow Labs.

Other authors
See publication
Graph Thinking

Knowledge Graph Conference August 5, 2021

Introduces Graph Thinking as a cognitive framework for approaching complex analytics problems that can be solved with graph technologies, drawing analogies from learning theory about how people organize knowledge in graph-like cognitive structures as they progress from novice to expert in a given field.

See publication
Model Monitoring Enables Robust Machine Learning Applications

Gradient Flow May 26, 2021
Key features of ML monitoring solutions, why companies need a holistic MLOps platform that includes model monitoring, and challenges companies face in making that happen.

Other authors
See publication
Hardware > Software > Process: Data Science in a Post-Moore's Law World

Manning May 25, 2021
Learn why hardware innovations demand rethinking how data teams build analytics and ML applications.

Other authors
See publication
2021 AI in Healthcare Survey Report

Gradient Flow March 21, 2021
Applications of AI in Healthcare pose a number of challenges and considerations which differ substantially from other business verticals. We conducted an industry survey specifically about AI in healthcare, to understand more about current trends and issues. A total of 373 respondents from 49 countries participated in the survey. A quarter of all respondents (27%) held Technical Leadership roles. This survey was conducted in collaboration with John Snow Labs.

Other authors
See publication
Operationalizing AI

O'Reilly Media March 15, 2021
Across industry sectors, both management and leaders see a yawning gap between the promised and delivered impact of data science projects and wonder why the discrepancy exists. It's simple, really. Companies rely on highly skilled and expensive data scientists to help them build predictive capabilities into their products and workflows, but they often think the data science team alone can lead the change.

This report examines issues from several conversations the authors held with data science teams across industries, as well as those issues they've witnessed in their own experience as builders and leaders. Among their findings, the authors agreed that to shorten the production process, lower overhead, and reduce risk, organizations need a comprehensive understanding of how to build AI in a repeatable fashion.

Other authors
See publication
2020 NLP Survey Report

Gradient Flow September 21, 2020
The Natural Language Processing (NLP) Industry Survey was an online survey which ran for 41 days (July 5 to August 14, 2020). A total of 571 respondents from more than 50 countries completed the survey. A quarter of all respondents hold technical leadership roles. Respondents were recruited via social media, online advertising, the Gradient Flow Newsletter, and through industry partners and contacts. This survey was sponsored by John Snow Labs.

Other authors
See publication
Intro to RLlib: Example Environments

Anyscale July 9, 2020

RLlib is an open-source library in Python, based on Ray, which is used for reinforcement learning (RL). This article provides a hands-on introduction to RLlib and reinforcement learning by working step-by-step through sample code. The material in this article, which comes from Anyscale Academy, provides a complement to the RLlib documentation.

See publication
Visualizing Geospatial Data in Python

Towards Data Science June 2, 2020
Open source tools and techniques for visualizing data on custom maps in Python.

Other authors
See publication
Rich Search and Discovery for Research Datasets

SAGE Publishing May 18, 2020
This ground-breaking book explores how automating the search for and discovery of datasets can help tackle irreproducibility in social science.

Other authors
See publication
Agile AI

O'Reilly Media October 10, 2019
As more companies work to adopt AI for business processes, project costs and failure rates are on the rise. Why? No standard practice exists for implementing AI in business applications, and many organizations don’t have the skills, processes, and tools to mitigate risk.

Other authors
See publication
Fifty Years of Data Management and Beyond

O'Reilly Media April 29, 2019

Every decade since the 1960s, researchers at companies like IBM, Amazon, and many others have introduced major new frameworks and techniques to handle rising data management problems. This concise ebook explains how these new systems helped data science evolve quickly—from hierarchical and relational databases to big data and cloud computing to streaming and graph data.

See publication
A landscape diagram for Python data

IBM Data Science Community March 13, 2019
What are the open source libraries in Python which are popularly used in data science work, and how do they fit together?

Other authors
See publication
AI Adoption in the Enterprise

O'Reilly Media February 20, 2019
While O’Reilly has identified several trends among enterprise companies for adopting artificial intelligence, we decided to drill down further to learn just how businesses worldwide are planning and prioritizing this work. In a recent survey, we asked respondents about revenue-bearing AI projects their organizations have in production. How might their AI adoption patterns change over the course of the next year?

Other authors
See publication
Evolving Data Infrastructure

O'Reilly Media January 25, 2019
How are companies using or exploring AI, big data, and the cloud for advanced analytics and automation? In an O’Reilly survey conducted in October 2018, more than 3,200 companies throughout the world—located primarily in North America, Europe, and Asia—revealed their choices of tools, technologies, and practices for pursuing sophisticated cloud-based data solutions.

Other authors
See publication
The State of Machine Learning Adoption in the Enterprise

O'Reilly Media August 7, 2018
While the use of machine learning (ML) in production started near the turn of the century, it’s taken roughly 20 years for the practice to become mainstream throughout industry. With this report, you’ll learn how more than 11,000 data specialists responded to a recent O’Reilly survey about their organization’s approach—or intended approach—to machine learning.

Other authors
See publication
Building Data Science Teams

O'Reilly Media November 19, 2015

Imagine cooking a stew with a single ingredient or growing a country garden with a single type of flower. One-dimensional efforts like these yield bland and boring results. Now imagine staffing a data science team with only PhDs in machine learning. In spite of the impressive pedigree, the result would be similar: bland, boring, and, possibly worse, ineffective.

But if not just data people, then who?

See publication
Introduction to Apache Spark

O'Reilly Media March 9, 2015

With its ability to perform fast, in-memory cluster computing, Apache Spark is emerging as a favorite technology for analytics on large datasets. This video workshop from Paco Nathan (host of the Just Enough Math workshop) provides developers with an introduction to Spark and its core APIs. By working with hands-on technical exercises, you’ll get up to speed on how to use Spark for data exploration, analysis, and building big data applications in Python, Java, or Scala.

See publication
Just Enough Math

O'Reilly Media June 4, 2014

The webcast introduces business people to "just enough" advanced math to take advantage of open source frameworks, including graph theory, abstract algebra, optimization, Bayesian statistics, and more advanced areas of linear algebra. These topics are needed for supply chain optimization, pricing models, and anti-fraud work, especially given the increased data rates coming from the Internet of Things.

See publication
Intro to Apache Spark workshop

Databricks April 23, 2014

Authored a full-day, hands-on workshop introducing Apache Spark, and led the team and partners who delivered the instruction worldwide.

See publication
Whitepaper: Agricultural Systems + Data Outlook

The Data Guild February 20, 2014
How can data be leveraged to make food production and distribution systems more responsive, resilient, and efficient? An ecosystem of agricultural data has been quietly evolving, and is rapidly becoming a vital component of global food security. The data rates and variety are vast: remote sensing via small satellites, sensor networks in the fields, tractors-as-drones, and more. Many issues implied by this category of data, however, are quite subtle and in some cases counterintuitive. Given that this field is relatively new and not particularly organized yet, key learnings may be adapted from other sectors where large-scale data and analytics have already played a transformational role: finance, intelligence, e-commerce, telecom, energy, etc.

Other authors
See publication
Enterprise Data Workflows with Cascading

O'Reilly Media July 24, 2013

Despite its growing use in the enterprise, building applications for Hadoop is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months learning the intricacies of MapReduce.

Whether you’re a developer, data scientist, or system/IT administrator, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization, using sample apps based on Java, Scala, and Clojure. Companies such as Etsy, Razorfish, TeleNav, and Twitter already use Cascading for mission-critical applications. This book shows you how this framework can help your organization extract meaningful information from large amounts of distributed data.

See publication
Three Laws of Avatarics

Virtual Worlds Conference December 20, 2006

See publication
What "Countermeasures" Really Means

O'Reilly Media August 3, 2004

Building a case for use of risk metrics in determining reasonable countermeasures to network security attacks. Introduction to "OpenSIMS" open source project.

See publication
The Corporate Body: Liber 118 U.S. 394

Signum Press August 1, 2001

Review of "corporate metabolism" metaphor.

See publication
Corporate Metabolism

Tripzine October 22, 2000

An extensive analysis of the structure and function of the "corporate organism".

See publication
Jackson Wins, Feds Lose

Wired May 1, 1993

Coverage of the federal court case Steve Jackson Games v. US Secret Service.

See publication
Pattern: PMML for Cascading and Hadoop

KDD Workshop 2013
Other authors
See publication

Projects

ERKG

Mar 2024
Hands-on tutorial in Python that demonstrates integrating Senzing and Neo4j to construct an Entity Resolved Knowledge Graph.

Other creators
SofLiM4KG

Feb 2024
The Software Lifecycle Management for KG workshop (SofLiM4KG) aims to collect experiences from successful and abandoned knowledge graph projects, viewed from a software lifecycle perspective, in order to (a) carve out the specifics of knowledge graph engineering that pose challenges beyond software engineering practices, (b) establish best practices and anti-patterns for the community, and (c) build the foundations for systematically investigating the connection to software engineering, as well as for qualitative and quantitative studies in the project management of knowledge graphs.

This project originated at Dagstuhl 24061 in February 2024.

Other creators
TextGraphs

Nov 2023

Using LLMs to boost the performance of NLP tasks in KG construction, introducing a "lemma graph" (linguistic provenance) for graph levels of detail, and exploring topological transforms to enhance graph ML capabilities. This research surveys and evaluates the capabilities of open source models for named entity recognition, entity linking, relation extraction, and graph of relations.
MkRefs

May 2021

MkDocs plugin to generate "semantic reference" materials as Markdown pages, from a knowledge graph.

See project
kglab

Oct 2020
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

Other creators
See project
disparity_filter

Nov 2018 - Present

Implements a disparity filter in Python, based on graphs in NetworkX, to extract the multiscale backbone of a complex weighted network (Serrano et al., 2009).

See project
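
The disparity filter scores each edge against a null model in which a node's total weight is split uniformly at random among its k edges: with normalized weight p = w / s at a node of degree k and strength s, the edge's significance is alpha = (1 - p)^(k - 1), and the edge stays in the backbone when alpha falls below a chosen threshold at either endpoint. The following is a minimal plain-Python sketch of that rule under stated assumptions, not the disparity_filter project's own code; the function names, the dict-of-edges input format, the 0.05 default threshold, and treating degree-1 nodes as never significant from their own side are all hypothetical choices for illustration.

```python
from collections import defaultdict


def disparity_alphas(weights):
    """Return a disparity-filter alpha for each undirected edge.

    weights: dict mapping (u, v) edge tuples to positive weights (hypothetical format).
    Each edge gets the smaller of the alphas computed at its two endpoints.
    """
    strength = defaultdict(float)  # s_i: sum of edge weights at node i
    degree = defaultdict(int)      # k_i: number of edges at node i
    for (u, v), w in weights.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def alpha(node, w):
        k = degree[node]
        if k <= 1:
            # Assumption: a degree-1 node has nothing to compare against, so never significant.
            return 1.0
        p = w / strength[node]       # normalized weight of this edge at `node`
        return (1.0 - p) ** (k - 1)  # probability of a share this large under the null model

    return {(u, v): min(alpha(u, w), alpha(v, w)) for (u, v), w in weights.items()}


def backbone(weights, threshold=0.05):
    """Keep only edges whose alpha is below `threshold` at either endpoint."""
    alphas = disparity_alphas(weights)
    return {edge: w for edge, w in weights.items() if alphas[edge] < threshold}


# A star whose hub sends most of its weight along one edge.
star = {("a", "b"): 10.0, ("a", "c"): 1.0, ("a", "d"): 1.0, ("a", "e"): 1.0}
print(backbone(star))  # {('a', 'b'): 10.0}
```

Here the hub "a" has degree 4 and strength 13, so the heavy edge gets alpha = (3/13)^3, roughly 0.012, and survives, while each light edge gets (12/13)^3, roughly 0.79, and is pruned.
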
PyTextRank

Oct 2016 - Present
Python implementation of TextRank for text document NLP parsing and extractive summarization, built atop spaCy, datasketch, and NetworkX. Uses graph algorithms for advanced NLP and for preparing text data for use in deep learning, etc.

Other creators
See project
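
TextRank treats words as vertices in a co-occurrence graph and ranks them with PageRank. The sketch below illustrates that idea in plain Python, assuming a simple regex tokenizer, a fixed co-occurrence window, and no part-of-speech filtering; it does not use spaCy and is not PyTextRank's API or implementation. The function name `textrank_keywords` and its parameters are hypothetical.

```python
import re


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5,
                      stopwords=frozenset()):
    """Rank words by PageRank over a word co-occurrence graph (hypothetical helper)."""
    # Tokenize: lowercase alphabetic runs, minus any stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]

    # Undirected graph: link distinct words that appear within `window` tokens of each other.
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    n = len(neighbors)
    if n == 0:
        return []

    # PageRank by power iteration, starting from a uniform score.
    score = {w: 1.0 / n for w in neighbors}
    for _ in range(iterations):
        score = {
            w: (1.0 - damping) / n
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score, key=lambda w: (-score[w], w))[:top_n]


print(textrank_keywords("graph data graph model graph rank graph"))
# ['graph', 'data', 'model', 'rank']
```

In the example, "graph" is adjacent to every other word, so it ends up as the hub of a star and ranks first. PyTextRank itself works from spaCy's linguistic annotations, so its output on real text will differ from this bare-word sketch.
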
Ray tutorial

Jan 2021 - 2021

An introductory tutorial about leveraging Ray core features for distributed patterns.

See project
richcontext.scholapi

Jul 2019 - Jul 2020

Rich Context API integrations for federating metadata discovery and exchange across multiple scholarly infrastructure providers.

See project
Apache Spark Developer Certification

Jun 2018

Authored the exam, assisted with the Databricks + O'Reilly Media partnership and publicity, and led the team executing on proctoring, evaluations, analysis, exam iteration, etc.

See project
Exelixi

Nov 2013 - Mar 2014
Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jobs, which include some messaging) in pure Python. By default, it runs genetic algorithms at scale.

Other creators
See project
Cascading Pattern

Dec 2012 - Aug 2013
Pattern sub-project for http://Cascading.org/ which uses flows as containers for machine learning models, importing PMML model descriptions from R, SAS, Weka, RapidMiner, KNIME, SQL Server, etc.

Other creators
See project
Cascading for the Impatient

Jun 2012 - Aug 2012
An introduction to programming with the Cascading API for MapReduce workflow orchestration. We start with the simplest possible Cascading app, a file copy, and progress to a full implementation of TF-IDF in Cascading. The series also shows best practices and test-driven development features for working with data at scale.

Other creators
See project
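
The tutorial series builds up to TF-IDF on Cascading. As an illustration of the arithmetic only, here is a plain-Python sketch using one common weighting (relative term frequency times the natural log of N over document frequency); the tutorial's exact formula, tokenization, and pipeline structure may differ, and the function name `tf_idf` and its dict input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for a small in-memory corpus (hypothetical helper).

    docs: dict mapping doc_id -> text.
    Returns dict mapping doc_id -> {term: score}.
    """
    # Lowercased whitespace tokenization (a deliberately simple assumption).
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[doc_id] = {
            # tf = count / doc length; idf = ln(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf({"d1": "rain shadow", "d2": "rain cloud"})
print(scores["d1"])  # 'rain' appears in every doc, so its score is 0.0
```

In a distributed pipeline, the document-frequency count and the per-document term counts would typically be separate aggregation steps joined before scoring; here both fit in memory.
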
Cascading + City of Palo Alto open data

Jul 2012
An example of a "Big Data" application, based on Cascading, which leverages City of Palo Alto open data to find a shady spot on a hot day to walk and take a phone call.

Other creators
See project

Honors & Awards

Top 30 People in Big Data and Analytics

Innovation Enterprise

Feb 2015

http://www.kdnuggets.com/2015/02/top-30-people-big-data-analytics.html
NISOD Excellence Award

Austin Community College

May 2003

As an adjunct professor at ACC, developed a network security program for the Continuing Education department. https://www.nisod.org/forms/past_ea_recipients/

More activity by Paco

Happy second birthday to Open Source Science Initiative (OSSci)! We’ve been busy laying the groundwork, looking forward the next phase.

Liked by Paco Nathan
Ok, this is kind of mind-blowing. I've been telling people to keep an eye on WASM and the reason why may not be as obvious to some of you, but a few…

Liked by Paco Nathan
🖥️ Nvidia reigns supreme in AI chips, but the game is changing. From environmental impacts to on-device AI, the industry faces new challenges. As…

Liked by Paco Nathan
We just pushed an updated version of our website: https://kuzudb.com/. The updates include some quotes from members of the community or people who…

Liked by Paco Nathan
Excellent article. Moreover, Google Maps has been getting especially aggressive with rerouting while in transit. One could almost call the results…

Shared by Paco Nathan
