Paco Nathan

Sebastopol, California, United States
10K followers 500+ connections

About

Check out the "Graph Data Science" group: https://www.linkedin.com/groups/6725785/

Experience & Education

  • Senzing

Publications

  • Entity Resolved Knowledge Graphs: A Tutorial

    Neo4j

    Using the Python API for Senzing to run entity resolution on three datasets about businesses in the Las Vegas metro area: SafeGraph, WHISARD wage compliance data from the US Dept of Labor, and PPP loans from the US Chamber of Commerce. We build a knowledge graph in Neo4j from the results, then use Jupyter, Pandas, Seaborn, and PyVis to compare the data before and after resolving duplicate records.

    See publication
  • Latent Space

    Derwen

    A f*ck-around-and-find-out whodunit tale of neo-noir gore and messy flip-the-script cli-fi about artificial intelligence, animism, national security liberals, insurrection, climate guilt, weaponized media, advanced mathematics, conspiracism, global cyberwar, overlapping polycrisis, and the strangest of bedfellows.

    See publication
  • NLP Entity Linking for Medical Transcripts

    Manning

    In this liveProject, you’re a data scientist at a healthcare provider that deals with large volumes of incoming text. Your task is to analyze a large dataset containing medical transcriptions. Leveraging technologies including pandas, the IBM Project Debater API, and Seaborn, you’ll explore a Kaggle dataset, segment text data into known categories, and extract key points.

    You’ll finish by building an interactive data visualization dashboard for analysis in the open-source framework Streamlit. When you’re done, you’ll have leveled up your NLP toolbox with skills that are highly sought not only in healthcare but in law, customer support, market intelligence, media, and many other fields.

    See publication
  • 2022 AI in Healthcare Survey Report

    Gradient Flow

    Applications of AI in healthcare pose a number of challenges and considerations that differ substantially from other business verticals. We conducted an industry survey specifically about AI in healthcare to understand current trends and issues. A total of 321 respondents from 41 countries participated in the survey. About a quarter of all respondents (27%) held Technical Leadership roles. This survey was conducted in collaboration with John Snow Labs.

    Other authors
    See publication
  • Recommender Systems Best Practices

    NVIDIA

    Building, deploying, and optimizing recommender systems that effectively engage users and impact business value, including revenue, is hard. Data scientists, machine learning engineers, and leads within global e-commerce, media, and on-demand domains have successfully designed, built, and deployed recommendation systems that impact business value. Download this paper to get insights, best practices, and advice from expert interviews, and to uncover how recommender systems teams handle preprocessing, feature engineering, model training, model evaluation, selection of appropriate technologies to integrate, interoperability with open source, and more. Learn from leaders and technical experts at global companies such as The New York Times, Tencent, Meituan, NVIDIA, and more.

    See publication
  • 2021 NLP Survey Report

    Gradient Flow

    Our 2021 NLP Industry Survey report is informed by several important contrasts: organizations with years of history deploying NLP applications in production compared to those which are exploring NLP, responses from Technical Leaders versus general practitioners, and company size. We draw insights and indicate trends based on those contrasts. This survey was conducted in collaboration with John Snow Labs.

    Other authors
    See publication
  • Graph Thinking

    Knowledge Graph Conference

    Presents Graph Thinking as a cognitive framework for approaching complex analytics problems that can be solved with graph technologies, using analogies from learning theory about how people organize knowledge in graph-like cognitive structures as they progress from novice to expert in a given field.

    See publication
  • Model Monitoring Enables Robust Machine Learning Applications

    Gradient Flow

    Key features of ML monitoring solutions, why companies need a holistic MLOps platform that includes model monitoring, and challenges companies face in making that happen.

    Other authors
    See publication
  • Hardware > Software > Process: Data Science in a Post-Moore's Law World

    Manning

    Learn why hardware innovations demand rethinking how data teams build analytics and ML applications.

    Other authors
    See publication
  • 2021 AI in Healthcare Survey Report

    Gradient Flow

    Applications of AI in healthcare pose a number of challenges and considerations that differ substantially from other business verticals. We conducted an industry survey specifically about AI in healthcare to understand current trends and issues. A total of 373 respondents from 49 countries participated in the survey. About a quarter of all respondents (27%) held Technical Leadership roles. This survey was conducted in collaboration with John Snow Labs.

    Other authors
    See publication
  • Operationalizing AI

    O'Reilly Media

    Across industry sectors, both management and leaders see a yawning gap between the promised and delivered impact of data science projects and wonder why the discrepancy exists. It's simple, really. Companies rely on highly skilled and expensive data scientists to help them build predictive capabilities into their products and workflows, but they often think the data science team alone can lead the change.

    This report examines issues from several conversations the authors held with data science teams across industries, as well as those issues they've witnessed in their own experience as builders and leaders. Among their findings, the authors agreed that to shorten the production process, lower overhead, and reduce risk, organizations need a comprehensive understanding of how to build AI in a repeatable fashion.

    Other authors
    See publication
  • 2020 NLP Survey Report

    Gradient Flow

    The Natural Language Processing (NLP) Industry Survey was an online survey which ran for 41 days (July 5 to August 14, 2020). A total of 571 respondents from more than 50 countries completed the survey. A quarter of all respondents hold technical leadership roles. Respondents were recruited via social media, online advertising, the Gradient Flow Newsletter, and through industry partners and contacts. This survey was sponsored by John Snow Labs.

    Other authors
    See publication
  • Intro to RLlib: Example Environments

    Anyscale

    RLlib is an open-source library in Python, based on Ray, which is used for reinforcement learning (RL). This article provides a hands-on introduction to RLlib and reinforcement learning by working step-by-step through sample code. The material in this article, which comes from Anyscale Academy, provides a complement to the RLlib documentation.

    See publication
  • Visualizing Geospatial Data in Python

    Towards Data Science

    Open source tools and techniques for visualizing data on custom maps in Python.

    Other authors
    See publication
  • Rich Search and Discovery for Research Datasets

    SAGE Publishing

    This ground-breaking book explores how automating the search for and discovery of datasets can help tackle irreproducibility in social science.

    Other authors
    See publication
  • Agile AI

    O'Reilly Media

    As more companies work to adopt AI for business processes, project costs and failure rates are on the rise. Why? No standard practice exists for implementing AI in business applications, and many organizations don’t have the skills, processes, and tools to mitigate risk.

    Other authors
    See publication
  • Fifty Years of Data Management and Beyond

    O'Reilly Media

    Every decade since the 1960s, researchers at companies like IBM, Amazon, and many others have introduced major new frameworks and techniques to handle rising data management problems. This concise ebook explains how these new systems helped data science evolve quickly—from hierarchical and relational databases to big data and cloud computing to streaming and graph data.

    See publication
  • A landscape diagram for Python data

    IBM Data Science Community

    What are the open source libraries in Python which are popularly used in data science work, and how do they fit together?

    Other authors
    See publication
  • AI Adoption in the Enterprise

    O'Reilly Media

    While O’Reilly has identified several trends among enterprise companies for adopting artificial intelligence, we decided to drill down further to learn just how businesses worldwide are planning and prioritizing this work. In a recent survey, we asked respondents about revenue-bearing AI projects their organizations have in production. How might their AI adoption patterns change over the course of the next year?

    Other authors
    See publication
  • Evolving Data Infrastructure

    O'Reilly Media

    How are companies using or exploring AI, big data, and the cloud for advanced analytics and automation? In an O’Reilly survey conducted in October 2018, more than 3,200 companies throughout the world—located primarily in North America, Europe, and Asia—revealed their choices of tools, technologies, and practices for pursuing sophisticated cloud-based data solutions.

    Other authors
    See publication
  • The State of Machine Learning Adoption in the Enterprise

    O'Reilly Media

    While the use of machine learning (ML) in production started near the turn of the century, it’s taken roughly 20 years for the practice to become mainstream throughout industry. With this report, you’ll learn how more than 11,000 data specialists responded to a recent O’Reilly survey about their organization’s approach—or intended approach—to machine learning.

    Other authors
    See publication
  • Building Data Science Teams

    O'Reilly Media

    Imagine cooking a stew with a single ingredient or growing a country garden with a single type of flower. One-dimensional efforts like these yield bland and boring results. Now imagine staffing a data science team with only PhDs in machine learning. In spite of the impressive pedigree, the result would be similar: bland, boring, and, possibly worse, ineffective.

    But if not just data people, then who?

    See publication
  • Introduction to Apache Spark

    O'Reilly Media

    With its ability to perform fast, in-memory cluster computing, Apache Spark is emerging as a favorite technology for analytics on large datasets. This video workshop from Paco Nathan (host of the Just Enough Math workshop) provides developers with an introduction to Spark and its core APIs. By working with hands-on technical exercises, you’ll get up to speed on how to use Spark for data exploration, analysis, and building big data applications in Python, Java, or Scala.

    See publication
  • Just Enough Math

    O'Reilly Media

    The webcast introduces advanced math for business people, "just enough" to take advantage of open source frameworks, including graph theory, abstract algebra, optimization, Bayesian statistics, and more advanced areas of linear algebra. These are needed for supply chain optimization, pricing models, and anti-fraud work, especially given the increased data rates coming from the Internet of Things.

    See publication
  • Intro to Apache Spark workshop

    Databricks

    Authored a full-day, hands-on workshop introducing Apache Spark, and led the team and partners who delivered the instruction worldwide.

    See publication
  • Whitepaper: Agricultural Systems + Data Outlook

    The Data Guild

    How can data be leveraged to make food production and distribution systems more responsive, resilient, and efficient? An ecosystem of agricultural data has been quietly evolving, and is rapidly becoming a vital component of global food security. The data rates and variety are vast: remote sensing via small satellites, sensor networks in the fields, tractors-as-drones, and more. Many issues implied by this category of data, however, are quite subtle and in some cases counterintuitive. Given that this field is relatively new and not particularly organized yet, key learnings may be adapted from other sectors where large-scale data and analytics have already played a transformational role: finance, intelligence, e-commerce, telecom, energy, etc.

    Other authors
    See publication
  • Enterprise Data Workflows with Cascading

    O'Reilly Media

    Despite its growing use in the enterprise, building applications for Hadoop is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months learning the intricacies of MapReduce.

    Whether you’re a developer, data scientist, or system/IT administrator, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization, using sample apps based on Java, Scala, and Clojure. Companies such as Etsy, Razorfish, TeleNav, and Twitter already use Cascading for mission-critical applications. This book shows you how this framework can help your organization extract meaningful information from large amounts of distributed data.

    See publication
  • Three Laws of Avatarics

    Virtual Worlds Conference

  • What "Countermeasures" Really Means

    O'Reilly Media

    Builds a case for using risk metrics to determine reasonable countermeasures to network security attacks. Introduces the "OpenSIMS" open source project.

    See publication
  • The Corporate Body: Liber 118 U.S. 394

    Signum Press

    Review of "corporate metabolism" metaphor.

    See publication
  • Corporate Metabolism

    Tripzine

    An extensive analysis of the structure and function of the "corporate organism".

    See publication
  • Jackson Wins, Feds Lose

    Wired

    Coverage of the federal court case Steve Jackson Games v. US Secret Service.

    See publication
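
The "before/after of resolving duplicate records" in the Entity Resolved Knowledge Graphs tutorial can be illustrated with a deliberately simplified sketch. This is NOT how Senzing resolves entities and does not use the Senzing API: it assumes a hypothetical match key (normalized business name plus ZIP code), and the record IDs, names, and the `normalize`/`resolve` helpers are all invented for illustration. It only shows how several source records collapse into fewer resolved entities.

```python
import re
from collections import defaultdict

# Hypothetical suffixes ignored when building the toy match key.
STOP_TOKENS = {"llc", "inc", "co", "corp", "the"}


def normalize(name: str) -> str:
    """Toy match key: lowercase, strip punctuation, drop corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in STOP_TOKENS)


def resolve(records):
    """Group records sharing (normalized name, zip) into one 'entity'."""
    entities = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), rec["zip"])
        entities[key].append(rec["id"])
    return dict(entities)


# Illustrative records standing in for three different source datasets.
records = [
    {"id": "sg-1", "name": "Desert Rose Cafe LLC", "zip": "89101"},
    {"id": "whd-7", "name": "DESERT ROSE CAFE", "zip": "89101"},
    {"id": "ppp-3", "name": "Desert Rose Cafe, Inc.", "zip": "89101"},
    {"id": "sg-2", "name": "Neon Tire Co.", "zip": "89102"},
]
entities = resolve(records)
print(f"{len(records)} records -> {len(entities)} entities")  # 4 records -> 2 entities
```

Each resolved entity, with the list of source records it absorbed, is the kind of unit that could then become a node in a knowledge graph; a real entity resolution engine uses far richer matching than a single exact key.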

Projects

  • ERKG

    Hands-on tutorial in Python demonstrating the integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph.

    Other creators
  • SofLiM4KG

    The Software Lifecycle Management for KG workshop (SofLiM4KG) aims to collect experiences from successful and abandoned knowledge graph projects from a software lifecycle perspective in order to (a) carve out the specifics of knowledge graph engineering that pose challenges beyond software engineering practices, (b) establish best practices and anti-patterns for the community, and (c) build the foundations for systematically investigating the connection to software engineering, as well as for qualitative and quantitative studies of project management for knowledge graphs.

    This project originated at Dagstuhl 24061 in February 2024.

    Other creators
  • TextGraphs

    Using LLMs to boost the performance of NLP tasks in KG construction, introducing the use of a "lemma graph" (linguistic provenance) for graph levels of detail, and exploring topological transforms to enhance graph ML capabilities. This research surveys and evaluates the capabilities of open source models for named entity recognition, entity linking, relation extraction, and graph of relations.

  • MkRefs

    MkDocs plugin to generate "semantic reference" materials as Markdown pages, from a knowledge graph.

    See project
  • kglab

    Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

    Other creators
    See project
  • disparity_filter

    - Present

    Implements a disparity filter in Python, based on NetworkX graphs, to extract the multiscale backbone of a complex weighted network (Serrano et al., 2009).

    See project
  • PyTextRank

    - Present

    Python implementation of TextRank for NLP parsing and extractive summarization of text documents, built atop spaCy, datasketch, and NetworkX. Uses graph algorithms for advanced NLP and for preparing text data for use in deep learning, etc.

    Other creators
    See project
  • Ray tutorial

    An introductory tutorial about leveraging Ray core features for distributed patterns.

    See project
  • richcontext.scholapi

    Rich Context API integrations for federating metadata discovery and exchange across multiple scholarly infrastructure providers.

    See project
  • Apache Spark Developer Certification

    Authored the exam, assisted with the Databricks + O'Reilly Media partnership and publicity, and led the team responsible for proctoring, evaluations, analysis, exam iteration, etc.

    See project
  • Exelixi

    Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jobs, which include some messaging) in pure Python. By default, it runs genetic algorithms at scale.

    Other creators
    See project
  • Cascading Pattern

    Pattern sub-project for http://Cascading.org/ which uses flows as containers for machine learning models, importing PMML model descriptions from R, SAS, Weka, RapidMiner, KNIME, SQL Server, etc.

    Other creators
    See project
  • Cascading for the Impatient

    An introduction to programming with the Cascading API for MapReduce workflow orchestration. We start with the simplest possible Cascading app, a file copy, and progress up to a full implementation of TF-IDF in Cascading. Also showing best practices and test-driven development features for working with data at scale.

    Other creators
    See project
  • Cascading + City of Palo Alto open data

    An example of a "Big Data" application, based on Cascading, which leverages City of Palo Alto open data... find a shady spot on a hot day, to walk and take a phone call.

    Other creators
    See project
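
The disparity_filter entry above refers to extracting the multiscale backbone of a weighted network (Serrano et al., 2009). As a rough illustration of that technique, here is a standalone sketch in plain Python; it is not the project's code and does not use NetworkX. For a node with degree k > 1 and strength s (the sum of its edge weights), an edge of weight w has normalized weight p = w / s and significance alpha = (1 - p)^(k - 1); an edge is kept when alpha falls below a threshold for at least one endpoint. The function names, the default threshold of 0.05, the assumption of a simple undirected graph (no self-loops or duplicate edges), and the choice to skip degree-1 endpoints are all assumptions of this sketch.

```python
from collections import defaultdict


def disparity_alpha(edges):
    """Map each undirected edge (u, v) to its disparity-filter significance.

    For an endpoint with degree k > 1 and strength s, an edge of weight w
    has p = w / s and alpha = (1 - p) ** (k - 1). The edge keeps the
    smaller alpha of its two endpoints; degree-1 endpoints are skipped.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for u, v, w in edges:
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    alphas = {}
    for u, v, w in edges:
        candidates = [
            (1.0 - w / strength[node]) ** (degree[node] - 1)
            for node in (u, v)
            if degree[node] > 1
        ]
        # An isolated edge (both endpoints of degree 1) is never significant here.
        alphas[(u, v)] = min(candidates) if candidates else 1.0
    return alphas


def backbone(edges, alpha=0.05):
    """Keep edges that are significant at level `alpha` for at least one endpoint."""
    alphas = disparity_alpha(edges)
    return [(u, v, w) for u, v, w in edges if alphas[(u, v)] < alpha]


# Hypothetical example: a hub whose weight is concentrated on one neighbor.
edges = [("h", "a", 10), ("h", "b", 1), ("h", "c", 1), ("h", "d", 1)]
print(backbone(edges))  # [('h', 'a', 10)]
```

In the example, the hub's edge to "a" carries 10 of its 13 units of strength, so its alpha is (3/13)^3 ≈ 0.012 and it survives, while each weight-1 edge has alpha ≈ 0.79 and is filtered out. Raising the threshold keeps more edges, which is how the filter exposes the backbone at different scales.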

Honors & Awards

  • Top 30 People in Big Data and Analytics

    Innovation Enterprise

    http://www.kdnuggets.com/2015/02/top-30-people-big-data-analytics.html

  • NISOD Excellence Award

    Austin Community College

    Recognized as an adjunct professor at ACC for developing a network security program for the Continuing Education department. https://www.nisod.org/forms/past_ea_recipients/
