SlideShare a Scribd company logo
Indexing and searching NuGet.org
with Azure Functions and Search
Maarten Balliauw
@maartenballiauw
“Find this type on NuGet.org”
“Find this type on NuGet.org”
In ReSharper and Rider
Search for namespaces
& types that are not yet referenced
“Find this type on NuGet.org”
Idea in 2013, introduced in ReSharper 9
(2015 - https://www.jetbrains.com/resharper/whatsnew/whatsnew_9.html)
Consists of
ReSharper functionality
A service that indexes packages and powers search
Azure Cloud Service (Web and Worker role)
Indexer uses NuGet OData feed
https://www.nuget.org/api/v2/Packages?$select=Id,Version,NormalizedVersion,LastEdited,Published&$
orderby=LastEdited%20desc&$filter=LastEdited%20gt%20datetime%272012-01-01%27

Recommended for you

Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?

Speaker: Christoph Petrausch Event: inovex Brownbag 06. November 2015 weiter Vorträge von inovex: https://www.inovex.de/de/content-pool/vortraege

monitoringgraylog
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...

Warren Strange, Principal Systems Engineer, ForgeRock, presents a Breakout Session on the ELK Stack at the 2014 IRM Summit in Phoenix, Arizona.

identity relationship managementelk stackelasticsearch
Logstash
LogstashLogstash
Logstash

Logstash is a tool for managing logs that allows for input, filter, and output plugins to collect, parse, and deliver logs and log data. It works by treating logs as events that are passed through the input, filter, and output phases, with popular plugins including file, redis, grok, elasticsearch and more. The document also provides guidance on using Logstash in a clustered configuration with an agent and server model to optimize log collection, processing, and storage.

message-passingnginxperl
NuGet over time...
https://twitter.com/controlflow/status/1067724815958777856
NuGet over time...
Repo-signing announced August 10, 2018
Big chunk of packages signed
over holidays 2018/2019
Re-download all metadata & binaries
Very slow over OData
Is there a better way?
https://blog.nuget.org/20180810/Introducing-Repository-Signatures.html
NuGet server-side API
NuGet talks to a repository
Can be on disk/network share or remote over HTTP(S)
HTTP(S) API’s
V2 – OData based (used by pretty much all NuGet servers out there)
V3 – JSON based (NuGet.org, TeamCity, MyGet, Azure DevOps, GitHub repos)

Recommended for you

Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup

1. Logstash is an open source tool for collecting, processing, and storing logs and other event data. It allows centralized collection and parsing of logs from various sources before sending them to Elasticsearch for storage and indexing. 2. Kibana provides visualization and search capabilities on top of the logs stored in Elasticsearch, allowing users to easily explore and analyze log data. 3. The combination of Logstash, Elasticsearch, and Kibana provides a replacement for commercial log management tools like Splunk, with the ability to collect, parse, store, search, and visualize logs from many different sources in a centralized way.

startitlogstashelastic search
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...

This talk covers the basics of centralizing logs in Elasticsearch and all the strategies that make it scale with billions of documents in production. Topics include: - Time-based indices and index templates to efficiently slice your data - Different node tiers to de-couple reading from writing, heavy traffic from low traffic - Tuning various Elasticsearch and OS settings to maximize throughput and search performance - Configuring tools such as logstash and rsyslog to maximize throughput and minimize overhead

rafal kucradu gheorgheelasticsearch
Reactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android NReactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android N

The document discusses reactive functional programming with Java 8 on Android N. It introduces reactive programming concepts like Observables and Subscribers. It provides an example of using RxJava to find PNG images in a folder and load them into a gallery, as compared to the vanilla Java approach. It also demonstrates creating Observables, Subscribers, transforming streams, handling REST responses, and subscribing to streams. Specifically, it shows an example of clicking a button to get a user's followers from GitHub, get details on each follower, filter by company, and update the UI with results.

androidreactivefunctional
V2 Protocol
Started as “OData-to-LINQ-to-Entities” (V1 protocol)
Optimizations added to reduce # of random DB queries (VS2013+ & NuGet 2.x)
Search – Package manager list/search
FindPackagesById – Package restore (Does it exist? Where to download?)
GetUpdates – Package manager updates
https://www.nuget.org/api/v2 (code in https://github.com/NuGet/NuGetGallery)
V3 Protocol
JSON based
A “resource provider” of various endpoints per purpose
Catalog (NuGet.org only) – append-only event log
Registrations – materialization of newest state of a package
Flat container – .NET Core package restore (and VS autocompletion)
Report abuse URL template
Statistics
…
https://api.nuget.org/v3/index.json (code in https://github.com/NuGet/NuGet.Services.Metadata)
How does NuGet.org work?
User uploads to NuGet.org
Data added to database
Data added to catalog (append-only data stream)
Various jobs run over catalog using a cursor
Registrations (last state of a package/version), reference catalog entry
Flatcontainer (fast restores)
Search index (search, autocomplete, NuGet Gallery search)
…
Catalog seems interesting!
Append-only stream of mutations on NuGet.org
Updates (add/update) and Deletes
Chronological
Can continue where left off (uses a timestamp cursor)
Can restore NuGet.org to a given point in time
Structure
Root https://api.nuget.org/v3/catalog0/index.json
+ Page https://api.nuget.org/v3/catalog0/page0.json
+ Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json

Recommended for you

Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK

This document introduces the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. It provides instructions on setting up each component and using them together. Elasticsearch is a search engine that stores and searches data in JSON format. Logstash is an agent that collects logs from various sources, applies filters, and outputs to Elasticsearch. Kibana visualizes and explores the logs stored in Elasticsearch. The document demonstrates setting up each component and running a proof of concept to analyze sample log data.

elk_stack_alexander_szalonnas
elk_stack_alexander_szalonnaselk_stack_alexander_szalonnas
elk_stack_alexander_szalonnas

This document provides an overview of the ELK stack, including Logstash for collecting and parsing logs, Elasticsearch for indexing logs, and Kibana for visualizing logs. It discusses using the open source ELK stack as an alternative to Splunk and provides instructions for getting started with a basic ELK implementation.

Retrofit
RetrofitRetrofit
Retrofit

This document provides an overview of Retrofit, an open source library for Android and Java that allows making REST API calls in a simple and efficient manner. It discusses how to initialize Retrofit with an endpoint URL and adapter, define API methods using annotations, handle requests and responses both synchronously and asynchronously, and convert JSON responses to Java objects using Gson. Code samples are provided throughout to demonstrate common Retrofit tasks like making GET requests, handling API parameters and headers, and subscribing to asynchronous Observable responses.

api rest retrofit java android droidsonroids micha
NuGet.org catalog
demo
“Find this type on NuGet.org”
Refactor from using OData to using V3?
Mostly done, one thing missing: download counts (using search now)
https://github.com/NuGet/NuGetGallery/issues/3532
Build a new version?
Welcome to this talk 
Building a new version
What do we need?
Watch the NuGet.org catalog for package changes
For every package change
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions
Bonus points!
Easy way to re-index later (copy .nupkg binaries + dump index to JSON blobs)

Recommended for you

Elk
Elk Elk
Elk

This document discusses the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. It provides an overview of each component, including that Elasticsearch is a search and analytics engine, Logstash is a data collection engine, and Kibana is a data visualization platform. The document then discusses setting up an ELK stack to index and visualize application logs.

LogStash in action
LogStash in actionLogStash in action
LogStash in action

LogStash is a tool for ingesting, processing, and storing data from various sources into Elasticsearch. It includes plugins for input, filter, and output functionality. Common uses of LogStash include parsing log files, enriching events, and loading data into Elasticsearch for search and analysis. The document provides an overview of LogStash and demonstrates how to install it, configure input and output plugins, and create simple and advanced processing pipelines.

logstashelkdevops
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and Visualization

This document discusses experiences using the ELK stack (Elasticsearch, Logstash, Kibana) and D3.js for large log analysis and visualization. It begins with an overview of network traffic logging at Kasetsart University, which generates over 30 terabytes of log data per day. It then demonstrates setting up an ELK testbed to index these logs in real-time for fast search and exploration in Kibana. Finally, it shows how D3.js can be used to create dynamic, real-time visualizations of the logged data.

d3.jselkelasticsearch
What do we need?
Watch the NuGet.org catalog for package changes periodic check
For every package change based on a queue
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions always up, flexible scale
Bonus points!
Easy way to re-index later (copy .nupkg binaries + dump index to JSON blobs)
Sounds like functions?
Sounds like functions!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Collecting from catalog
demo

Recommended for you

Mobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und KibanaMobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und Kibana

Speaker: Dominik Helleberg, inovex MobileTech Conference 2015 September 2015 Mehr Vorträge: https://www.inovex.de/de/content-pool/vortraege/

elasticsearchmobile analyticskibana
Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs

This document discusses Logstash, an open source tool for collecting, parsing, and storing log files. It can ingest logs from various sources using inputs, apply filters to parse and transform log events, and output the structured data to destinations like Elasticsearch for search and analysis. The document provides an overview of Logstash's core functionality and components, demonstrates simple usage examples, and discusses integrating it with Kibana for visualizing and exploring log data. It also shares some lessons learned in production usage and points to additional resources.

web application developmentmobile application developmentruby on rails
Retrofit
RetrofitRetrofit
Retrofit

Retrofit is a type-safe REST client library for Android and Java that allows defining REST APIs as Java interfaces. It simplifies HTTP communication by converting remote APIs into declarative interfaces. It supports synchronous, asynchronous, and observable API consumption. The Retrofit library was created by Square.

thjugjavaandroid
Functions best practices
@PaulDJohnston https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535
Each function should do only one thing
Easier error handling & scaling
Learn to use messages and queues
Asynchronous means of communicating, helps scale and avoid direct coupling
...
Collecting from catalog
(better version)
demo
Bindings
Help a function do only one thing
Trigger, provide input/output
Function code bridges those
Build your own!*
SQL Server binding
Dropbox binding
...
NuGet Catalog
*Custom triggers not officially supported (yet?)
Trigger Input Output
Timer ✔
HTTP ✔ ✔
Blob ✔ ✔ ✔
Queue ✔ ✔
Table ✔ ✔
Service Bus ✔ ✔
EventHub ✔ ✔
EventGrid ✔
CosmosDB ✔ ✔ ✔
IoT Hub ✔
SendGrid, Twilio ✔
... ✔
Creating a trigger
binding
demo

Recommended for you

The Gradle in Ratpack: Dissected
The Gradle in Ratpack: DissectedThe Gradle in Ratpack: Dissected
The Gradle in Ratpack: Dissected

A case study of the usage of Gradle in the Ratpack web framework. First, we'll examine the Ratpack Gradle plugins, including their functionality, implementation, and testing. Next, we'll examine the build script for the Ratpack project itself. Here, we'll discuss various details of the project's build, including handling multiple projects, multiple types of testing, support for multiple styles of target hardware (developer workstations, cloud CI), and more. For each, we'll go over the desired behavior, how it was achieved, and why it was necessary.

open sourcegradletesting
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana

ELK Stack workshop covers real-world use cases and works with the participants to - implement them. This includes Elastic overview, Logstash configuration, creation of dashboards in Kibana, guidelines and tips on processing custom log formats, designing a system to scale, choosing hardware, and managing the lifecycle of your logs.

elasticsearchkibanaelk
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...

Which NuGet package was that type in again? In this session, let's build a "reverse package search" that helps finding the correct NuGet package based on a public type. Together, we will create a highly-scalable serverless search engine using Azure Functions and Azure Search that performs 3 tasks: listening for new packages on NuGet.org (using a custom binding), indexing packages in a distributed way, and exposing an API that accepts queries and gives our clients the best result.

azure functionsazureserverless
We’re making progress!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Downloading packages
demo
Next up: indexing
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Indexing
Opening up the .nupkg and reflecting on assemblies
System.Reflection.Metadata
Does not load the assembly being reflected into application process
Provides access to Portable Executable (PE) metadata in assembly
Store relation between package id+version and namespace+type
Azure Search? A database? Redis? Other?

Recommended for you

Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...

Which NuGet package was that type in again? In this session, let's build a "reverse package search" that helps to find the correct NuGet package based on a public type name. Together, we will create a highly-scalable serverless search engine using Azure Functions and Azure Search that performs 3 tasks: listening for new packages on NuGet.org (using a custom binding), indexing packages in a distributed way, and exposing an API that accepts queries and gives our clients the best result. Expect code, insights into the service design process, and more!

fwdaysconferenceonline
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se....NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...

Which NuGet package was that type in again? In this session, let's build a "reverse package search" that helps finding the correct NuGet package based on a public type. Together, we will create a highly-scalable serverless search engine using Azure Functions and Azure Search that performs 3 tasks: listening for new packages on NuGet.org (using a custom binding), indexing packages in a distributed way, and exposing an API that accepts queries and gives our clients the best result. https://blog.maartenballiauw.be/post/2019/07/30/indexing-searching-nuget-with-azure-functions-and-search.html

.net conf 2019.net conf.net
NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017

Everybody is consuming NuGet packages these days. It’s easy, right? But how can we create and share our own packages? What is .NET Standard? How should we version, create, publish and share our package? Once we have those things covered, we’ll look beyond what everyone is doing. How can we use the NuGet client API to fetch data from NuGet? Can we build an application plugin system based on NuGet? What hidden gems are there in the NuGet server API? Can we create a full copy of NuGet.org? Good questions! In this talk, we will get them answered.

versioningapiclient
System.Reflection.Metadata Free decompiler
www.jetbrains.com/dotpeek
System.Reflection.Metadata
using (var portableExecutableReader = new PEReader(assemblySeekableStream))
{
var metadataReader = portableExecutableReader.GetMetadataReader();
foreach (var typeDefinition in metadataReader.TypeDefinitions.Select(metadataReader
.GetTypeDefinition))
{
if (!typeDefinition.Attributes.HasFlag(TypeAttributes.Public)) continue;
var typeNamespace = metadataReader.GetString(typeDefinition.Namespace);
var typeName = metadataReader.GetString(typeDefinition.Name);
if (typeName.StartsWith("<") || typeName.StartsWith("__Static") ||
typeName.Contains("c__DisplayClass")) continue;
typeNames.Add($"{typeNamespace}.{typeName}");
}
}
Azure Search
“Search-as-a-Service”
Scales across partitions and replicas
Define an index that will hold documents consisting of fields
Fields can be searchable, facetable, filterable, sortable, retrievable
Can’t be changed easily, think upfront!
Have to define what we want to search, and what we want to display
My function will also write documents to a JSON blob
Can re-index using Azure Search importer in case needed
Indexing packages
demo

Recommended for you

ConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello WorldConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello World

Everybody is consuming or producing NuGet packages these days. It’s easy, right? We’ll look beyond what everyone is doing. How can we use the NuGet client API to fetch data from NuGet? Can we build an application plugin system based on NuGet? What hidden gems are there in the NuGet server API? Can we create a full copy of NuGet.org?

nugetdotnetoctopus
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage

- What are Internal Developer Portal (IDP) and Platform Engineering? - What is Backstage? - How Backstage can help dev to build developer portal to make their job easier Jirayut Nimsaeng Founder & CEO Opsta (Thailand) Co., Ltd. Youtube Record: https://youtu.be/u_nLbgWDwsA?t=850 Dev Mountain Tech Festival @ Chiang Mai November 12, 2022

backstagedevopsdevsecops
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris

This document summarizes Nuxeo's Release 8.1 including new tools for launching performance tests on Nuxeo clusters, an instant share feature for temporarily granting access without account creation, Live Connect integration for Box file sharing, and expanded Elasticsearch integration. It also discusses Nuxeo Docker images, a Nuxeo code generator, a Polymer sample app, updated REST and automation clients, and upcoming branch management features.

nuxeo fast track
“Do one thing well”
Our function shouldn’t care about creating a search index.
Better: return index operations, have something else handle those
Custom output binding?
Indexing packages
(better version)
demo
Almost there…
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
HTTP trigger binding
[HttpTrigger(AuthorizationLevel.Anonymous,
"get", Route = "v1/find-type")] HttpRequest request
Options for trigger
Authentication (anonymous, a function/host key, a user token)
HTTP method
What the route looks like

Recommended for you

Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS

This presentation was given at the Boston Django meetup on November 16, and surveyed several leading PaaS providers including Stackato, Dotcloud, OpenShift and Heroku. For each PaaS provider, I documented the steps necessary to deploy Mezzanine, a popular Django-based CMS and blogging platform. At the end of the presentation, I do a wrap-up of the different providers and provide a comparison matrix showing which providers have which features. This matrix is likely to go out-of-date quickly because these providers are adding new features all the time.

stackatocloudfoundryplatform as a service
OGCE Project Overview
OGCE Project OverviewOGCE Project Overview
OGCE Project Overview

The document provides an overview of OGCE (Open Grid Computing Environment), which develops and packages reusable software components for science portals. Key components described include services, gadgets, tags, and how they fit together. Installation and usage of the various OGCE components is discussed at a high level.

portalgatewayjava
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing

With distributed tracing, we can track requests as they pass through multiple services, emitting timing and other metadata throughout, and this information can then be reassembled to provide a complete picture of the application’s behavior at runtime - Read more in https://blog.buoyant.io/2016/05/17/distributed-tracing-for-polyglot-microservices/ and https://www.rookout.com/

distributed tracingwhat is distributed tracingapplication’s behavior at runtime
Making search work
with ReSharper and Rider
demo
One issue left...
Download counts - used for sorting and scoring search results
Change continuously on NuGet
Not part of V3 catalog
Could use search but that’s N(packages) queries
https://github.com/NuGet/NuGetGallery/issues/3532
If that data existed, how to update search?
Merge data! new PackageDocumentDownloads(key, downloadcount)
We’re done!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
We’re done!
Functions
Collect changes from NuGet catalog
Download binaries
Index binaries using PE Header
Make search index available in API
Trigger, input and output bindings
Each function should do only one thing
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command

Recommended for you

China Science Challenge
China Science ChallengeChina Science Challenge
China Science Challenge

This document discusses Elsevier's SciVerse platform and developer network. It introduces SciVerse as a social network for scientific search and content that uses OpenSocial standards. It describes how SciVerse extends Apache Shindig to make apps contextual. It also discusses SciVerse's framework and content APIs that allow apps to access scientific content and metadata. Finally, it provides examples of object-oriented JavaScript coding and using the APIs to build mashups with third-party services.

opensocial
SgCodeJam24 Workshop
SgCodeJam24 WorkshopSgCodeJam24 Workshop
SgCodeJam24 Workshop

This document discusses Elsevier's SciVerse platform and developer network. It introduces SciVerse as a social network for scientific search and content that uses OpenSocial standards. It describes how SciVerse extends Apache Shindig to make apps contextual. It also discusses SciVerse's framework and content APIs that allow apps to access scientific content and metadata. Finally, it provides examples of object-oriented JavaScript coding and using the APIs to build mashups with third-party services.

opensocial codejam hackathon workshop sciverse els
Harnessing the power of Nutch with Scala
Harnessing the power of Nutch with ScalaHarnessing the power of Nutch with Scala
Harnessing the power of Nutch with Scala

This document discusses using Nutch, an open source web crawler, with Scala. It provides an overview of Nutch's architecture and how plugins can be written in Scala to extend its functionality. As an example, it describes how Scala was used to build a plugin for an aggregator application that crawls multiple suppliers, parses content to extract details, and passes this data to an actor for processing. The solution was able to crawl 5 suppliers and collect over 500k records using Nutch and 823 lines of Scala code.

pluginscalanutch
We’re done!
All our functions can scale (and fail)
independently
Full index in May 2019 took ~12h on 2 B1 instances
Can be faster on more CPU’s
~ 1.7mio packages (NuGet.org homepage says)
~ 2.1mio packages (the catalog says )
~ 8 400 catalog pages
with ~ 4 200 000 catalog leaves
(hint: repo signing)
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Closing thoughts…
Would deploy in separate function apps for cost
Trigger binding collects all the time so needs dedicated capacity (and thus, cost)
Others can scale within bounds (think of $$$)
Would deploy in separate function apps for failure boundaries
Trigger, indexing, downloading should not affect health of API
Are bindings portable...?
Avoid them if (framework) lock-in matters to you
They áre nice in terms of programming model…
Thank you!
https://blog.maartenballiauw.be
@maartenballiauw

More Related Content

What's hot

Elk stack
Elk stackElk stack
Elk stack
Jilles van Gurp
 
Logstash-Elasticsearch-Kibana
Logstash-Elasticsearch-KibanaLogstash-Elasticsearch-Kibana
Logstash-Elasticsearch-Kibana
dknx01
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
hypto
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?
inovex GmbH
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
ForgeRock
 
Logstash
LogstashLogstash
Logstash
琛琳 饶
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Startit
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
Sematext Group, Inc.
 
Reactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android NReactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android N
Shipeng Xu
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
Harshakumar Ummerpillai
 
elk_stack_alexander_szalonnas
elk_stack_alexander_szalonnaselk_stack_alexander_szalonnas
elk_stack_alexander_szalonnas
Alexander Szalonnas
 
Retrofit
RetrofitRetrofit
Retrofit
bresiu
 
Elk
Elk Elk
LogStash in action
LogStash in actionLogStash in action
LogStash in action
Manuj Aggarwal
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Surasak Sanguanpong
 
Mobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und KibanaMobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und Kibana
inovex GmbH
 
Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
SmartLogic
 
Retrofit
RetrofitRetrofit
Retrofit
Amin Cheloh
 
The Gradle in Ratpack: Dissected
The Gradle in Ratpack: DissectedThe Gradle in Ratpack: Dissected
The Gradle in Ratpack: Dissected
David Carr
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
SpringPeople
 

What's hot (20)

Elk stack
Elk stackElk stack
Elk stack
 
Logstash-Elasticsearch-Kibana
Logstash-Elasticsearch-KibanaLogstash-Elasticsearch-Kibana
Logstash-Elasticsearch-Kibana
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?Monitoring with Graylog - a modern approach to monitoring?
Monitoring with Graylog - a modern approach to monitoring?
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Logstash
LogstashLogstash
Logstash
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Reactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android NReactive Functional Programming with Java 8 on Android N
Reactive Functional Programming with Java 8 on Android N
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
elk_stack_alexander_szalonnas
elk_stack_alexander_szalonnaselk_stack_alexander_szalonnas
elk_stack_alexander_szalonnas
 
Retrofit
RetrofitRetrofit
Retrofit
 
Elk
Elk Elk
Elk
 
LogStash in action
LogStash in actionLogStash in action
LogStash in action
 
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Experiences in ELK with D3.js  for Large Log Analysis  and VisualizationExperiences in ELK with D3.js  for Large Log Analysis  and Visualization
Experiences in ELK with D3.js for Large Log Analysis and Visualization
 
Mobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und KibanaMobile Analytics mit Elasticsearch und Kibana
Mobile Analytics mit Elasticsearch und Kibana
 
Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
 
Retrofit
RetrofitRetrofit
Retrofit
 
The Gradle in Ratpack: Dissected
The Gradle in Ratpack: DissectedThe Gradle in Ratpack: Dissected
The Gradle in Ratpack: Dissected
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 

Similar to CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and Search

Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Maarten Balliauw
 
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Fwdays
 
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se....NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
Maarten Balliauw
 
NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017
Maarten Balliauw
 
ConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello WorldConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello World
Maarten Balliauw
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
Opsta
 
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris
Nuxeo
 
Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS
Appsembler
 
OGCE Project Overview
OGCE Project OverviewOGCE Project Overview
OGCE Project Overview
marpierc
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
distributedtracing
 
China Science Challenge
China Science ChallengeChina Science Challenge
China Science Challenge
remko caprio
 
SgCodeJam24 Workshop
SgCodeJam24 WorkshopSgCodeJam24 Workshop
SgCodeJam24 Workshop
remko caprio
 
Harnessing the power of Nutch with Scala
Harnessing the power of Nutch with ScalaHarnessing the power of Nutch with Scala
Harnessing the power of Nutch with Scala
Knoldus Inc.
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Holden Karau
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
Julian Hyde
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
Itiel Shwartz
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
Sadayuki Furuhashi
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
Ismail Mayat
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
lucenerevolution
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
Julien Nioche
 

Similar to CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and Search (20)

Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
Indexing and searching NuGet.org with Azure Functions and Search - .NET fwday...
 
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and S...
 
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se....NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
.NET Conf 2019 - Indexing and searching NuGet.org with Azure Functions and Se...
 
NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017
 
ConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello WorldConFoo - NuGet beyond Hello World
ConFoo - NuGet beyond Hello World
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
 
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris
 
Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS
 
OGCE Project Overview
OGCE Project OverviewOGCE Project Overview
OGCE Project Overview
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
China Science Challenge
China Science ChallengeChina Science Challenge
China Science Challenge
 
SgCodeJam24 Workshop
SgCodeJam24 WorkshopSgCodeJam24 Workshop
SgCodeJam24 Workshop
 
Harnessing the power of Nutch with Scala
Harnessing the power of Nutch with ScalaHarnessing the power of Nutch with Scala
Harnessing the power of Nutch with Scala
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
 
Examiness hints and tips from the trenches
Examiness hints and tips from the trenchesExaminess hints and tips from the trenches
Examiness hints and tips from the trenches
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 

More from Maarten Balliauw

Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptx
Maarten Balliauw
 
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
Maarten Balliauw
 
Building a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to SpaceBuilding a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to Space
Maarten Balliauw
 
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
Maarten Balliauw
 
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
Maarten Balliauw
 
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
Maarten Balliauw
 
Approaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days PolandApproaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days Poland
Maarten Balliauw
 
Approaches for application request throttling - dotNetCologne
Approaches for application request throttling - dotNetCologneApproaches for application request throttling - dotNetCologne
Approaches for application request throttling - dotNetCologne
Maarten Balliauw
 
CodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory laneCodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory lane
Maarten Balliauw
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
Maarten Balliauw
 
ConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttlingConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttling
Maarten Balliauw
 
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
Maarten Balliauw
 
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
Maarten Balliauw
 
DotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETDotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NET
Maarten Balliauw
 
VISUG - Approaches for application request throttling
VISUG - Approaches for application request throttlingVISUG - Approaches for application request throttling
VISUG - Approaches for application request throttling
Maarten Balliauw
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
Maarten Balliauw
 
ConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory laneConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory lane
Maarten Balliauw
 
Approaches to application request throttling
Approaches to application request throttlingApproaches to application request throttling
Approaches to application request throttling
Maarten Balliauw
 
Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)
Maarten Balliauw
 
Exploring .NET memory management - JetBrains webinar
Exploring .NET memory management - JetBrains webinarExploring .NET memory management - JetBrains webinar
Exploring .NET memory management - JetBrains webinar
Maarten Balliauw
 

More from Maarten Balliauw (20)

Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptx
 
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
Nerd sniping myself into a rabbit hole... Streaming online audio to a Sonos s...
 
Building a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to SpaceBuilding a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to Space
 
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
Microservices for building an IDE - The innards of JetBrains Rider - NDC Oslo...
 
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
 
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
JetBrains Australia 2019 - Exploring .NET’s memory management – a trip down m...
 
Approaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days PolandApproaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days Poland
 
Approaches for application request throttling - dotNetCologne
Approaches for application request throttling - dotNetCologneApproaches for application request throttling - dotNetCologne
Approaches for application request throttling - dotNetCologne
 
CodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory laneCodeStock - Exploring .NET memory management - a trip down memory lane
CodeStock - Exploring .NET memory management - a trip down memory lane
 
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
 
ConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttlingConFoo Montreal - Approaches for application request throttling
ConFoo Montreal - Approaches for application request throttling
 
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
Microservices for building an IDE – The innards of JetBrains Rider - TechDays...
 
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
JetBrains Day Seoul - Exploring .NET’s memory management – a trip down memory...
 
DotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NETDotNetFest - Let’s refresh our memory! Memory management in .NET
DotNetFest - Let’s refresh our memory! Memory management in .NET
 
VISUG - Approaches for application request throttling
VISUG - Approaches for application request throttlingVISUG - Approaches for application request throttling
VISUG - Approaches for application request throttling
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
ConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory laneConFoo - Exploring .NET’s memory management – a trip down memory lane
ConFoo - Exploring .NET’s memory management – a trip down memory lane
 
Approaches to application request throttling
Approaches to application request throttlingApproaches to application request throttling
Approaches to application request throttling
 
Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)Exploring .NET memory management (iSense)
Exploring .NET memory management (iSense)
 
Exploring .NET memory management - JetBrains webinar
Exploring .NET memory management - JetBrains webinarExploring .NET memory management - JetBrains webinar
Exploring .NET memory management - JetBrains webinar
 

Recently uploaded

Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Bert Blevins
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
Larry Smarr
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Aurora Consulting
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
welrejdoall
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 

Recently uploaded (20)

Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 

CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and Search

  • 1. Indexing and searching NuGet.org with Azure Functions and Search Maarten Balliauw @maartenballiauw
  • 2. “Find this type on NuGet.org”
  • 3. “Find this type on NuGet.org” In ReSharper and Rider Search for namespaces & types that are not yet referenced
  • 4. “Find this type on NuGet.org” Idea in 2013, introduced in ReSharper 9 (2015 - https://www.jetbrains.com/resharper/whatsnew/whatsnew_9.html) Consists of ReSharper functionality A service that indexes packages and powers search Azure Cloud Service (Web and Worker role) Indexer uses NuGet OData feed https://www.nuget.org/api/v2/Packages?$select=Id,Version,NormalizedVersion,LastEdited,Published&$ orderby=LastEdited%20desc&$filter=LastEdited%20gt%20datetime%272012-01-01%27
  • 6. NuGet over time... Repo-signing announced August 10, 2018 Big chunk of packages signed over holidays 2018/2019 Re-download all metadata & binaries Very slow over OData Is there a better way? https://blog.nuget.org/20180810/Introducing-Repository-Signatures.html
  • 8. NuGet talks to a repository Can be on disk/network share or remote over HTTP(S) HTTP(S) API’s V2 – OData based (used by pretty much all NuGet servers out there) V3 – JSON based (NuGet.org, TeamCity, MyGet, Azure DevOps, GitHub repos)
  • 9. V2 Protocol Started as “OData-to-LINQ-to-Entities” (V1 protocol) Optimizations added to reduce # of random DB queries (VS2013+ & NuGet 2.x) Search – Package manager list/search FindPackagesById – Package restore (Does it exist? Where to download?) GetUpdates – Package manager updates https://www.nuget.org/api/v2 (code in https://github.com/NuGet/NuGetGallery)
  • 10. V3 Protocol JSON based A “resource provider” of various endpoints per purpose Catalog (NuGet.org only) – append-only event log Registrations – materialization of newest state of a package Flat container – .NET Core package restore (and VS autocompletion) Report abuse URL template Statistics … https://api.nuget.org/v3/index.json (code in https://github.com/NuGet/NuGet.Services.Metadata)
  • 11. How does NuGet.org work? User uploads to NuGet.org Data added to database Data added to catalog (append-only data stream) Various jobs run over catalog using a cursor Registrations (last state of a package/version), reference catalog entry Flatcontainer (fast restores) Search index (search, autocomplete, NuGet Gallery search) …
  • 12. Catalog seems interesting! Append-only stream of mutations on NuGet.org Updates (add/update) and Deletes Chronological Can continue where left off (uses a timestamp cursor) Can restore NuGet.org to a given point in time Structure Root https://api.nuget.org/v3/catalog0/index.json + Page https://api.nuget.org/v3/catalog0/page0.json + Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json
  • 14. “Find this type on NuGet.org” Refactor from using OData to using V3? Mostly done, one thing missing: download counts (using search now) https://github.com/NuGet/NuGetGallery/issues/3532 Build a new version? Welcome to this talk 
  • 15. Building a new version
  • 16. What do we need? Watch the NuGet.org catalog for package changes For every package change Scan all assemblies Store relation between package id+version and namespace+type API compatible with all ReSharper and Rider versions Bonus points! Easy way to re-index later (copy .nupkg binaries + dump index to JSON blobs)
  • 17. What do we need? Watch the NuGet.org catalog for package changes periodic check For every package change based on a queue Scan all assemblies Store relation between package id+version and namespace+type API compatible with all ReSharper and Rider versions always up, flexible scale Bonus points! Easy way to re-index later (copy .nupkg binaries + dump index to JSON blobs)
  • 19. Sounds like functions! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 21. Functions best practices @PaulDJohnston https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535 Each function should do only one thing Easier error handling & scaling Learn to use messages and queues Asynchronous means of communicating, helps scale and avoid direct coupling ...
  • 23. Bindings Help a function do only one thing Trigger, provide input/output Function code bridges those Build your own!* SQL Server binding Dropbox binding ... NuGet Catalog *Custom triggers not officially supported (yet?) Trigger Input Output Timer ✔ HTTP ✔ ✔ Blob ✔ ✔ ✔ Queue ✔ ✔ Table ✔ ✔ Service Bus ✔ ✔ EventHub ✔ ✔ EventGrid ✔ CosmosDB ✔ ✔ ✔ IoT Hub ✔ SendGrid, Twilio ✔ ... ✔
  • 25. We’re making progress! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 27. Next up: indexing NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 28. Indexing Opening up the .nupkg and reflecting on assemblies System.Reflection.Metadata Does not load the assembly being reflected into application process Provides access to Portable Executable (PE) metadata in assembly Store relation between package id+version and namespace+type Azure Search? A database? Redis? Other?
  • 30. System.Reflection.Metadata using (var portableExecutableReader = new PEReader(assemblySeekableStream)) { var metadataReader = portableExecutableReader.GetMetadataReader(); foreach (var typeDefinition in metadataReader.TypeDefinitions.Select(metadataReader .GetTypeDefinition)) { if (!typeDefinition.Attributes.HasFlag(TypeAttributes.Public)) continue; var typeNamespace = metadataReader.GetString(typeDefinition.Namespace); var typeName = metadataReader.GetString(typeDefinition.Name); if (typeName.StartsWith("<") || typeName.StartsWith("__Static") || typeName.Contains("c__DisplayClass")) continue; typeNames.Add($"{typeNamespace}.{typeName}"); } }
  • 31. Azure Search “Search-as-a-Service” Scales across partitions and replicas Define an index that will hold documents consisting of fields Fields can be searchable, facetable, filterable, sortable, retrievable Can’t be changed easily, think upfront! Have to define what we want to search, and what we want to display My function will also write documents to a JSON blob Can re-index using Azure Search importer in case needed
  • 33. “Do one thing well” Our function shouldn’t care about creating a search index. Better: return index operations, have something else handle those Custom output binding?
  • 35. Almost there… NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 36. HTTP trigger binding [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "v1/find-type")] HttpRequest request Options for trigger Authentication (anonymous, a function/host key, a user token) HTTP method What the route looks like
  • 37. Making search work with ReSharper and Rider demo
  • 38. One issue left... Download counts - used for sorting and scoring search results Change continuously on NuGet Not part of V3 catalog Could use search but that’s N(packages) queries https://github.com/NuGet/NuGetGallery/issues/3532 If that data existed, how to update search? Merge data! new PackageDocumentDownloads(key, downloadcount)
  • 39. We’re done! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 40. We’re done! Functions Collect changes from NuGet catalog Download binaries Index binaries using PE Header Make search index available in API Trigger, input and output bindings Each function should do only one thing NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 41. We’re done! All our functions can scale (and fail) independently Full index in May 2019 took ~12h on 2 B1 instances Can be faster on more CPU’s ~ 1.7mio packages (NuGet.org homepage says) ~ 2.1mio packages (the catalog says ) ~ 8 400 catalog pages with ~ 4 200 000 catalog leaves (hint: repo signing) NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 42. Closing thoughts… Would deploy in separate function apps for cost Trigger binding collects all the time so needs dedicated capacity (and thus, cost) Others can scale within bounds (think of $$$) Would deploy in separate function apps for failure boundaries Trigger, indexing, downloading should not affect health of API Are bindings portable...? Avoid them if (framework) lock-in matters to you They áre nice in terms of programming model…

Editor's Notes

  1. https://pixabay.com
  2. Show feature in action in Visual Studio (and show you can see basic metadata etc.)
  3. Copied in 2017 in VS - https://www.hanselman.com/blog/VisualStudio2017CanAutomaticallyRecommendNuGetPackagesForUnknownTypes.aspx Demo the feed quickly?
  4. Around 3 TB in May 2019
  5. Demo ODataDump quickly
  6. Demo: click around in the API to show some base things
  7. Raw API - click around in the API to show some base things, explain how a cursor could go over it Root https://api.nuget.org/v3/catalog0/index.json Page https://api.nuget.org/v3/catalog0/page0.json Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json Explain CatalogDump NuGet.Protocol.Catalog comes from GitHub CatalogProcessor feches all pages between min and max timestamp My implementation BatchCatalogProcessor fetches multiple pages at the same time and build a “latest state” – much faster! Fetches leaves, for every leaf calls into a simple method Much faster, easy to pause (keep track of min/max timestamp)
  8. LOL input, process, output More serious: events trigger code Periodic check for packages Queue message to index things API request runs a search No server management or capacity planning
  9. Will use storage queues n demo’s to be able to run things locally. Ideally use SB topics or event grid (transactional)
  10. Create a new TimerTrigger function We will need a function to index things from NuGet Timer will trigger every X amount of time Timer provides last timestamp and next timestamp, so we can run our collector for that period Snippet: demo-timertrigger Mention HttpClient not used correctly: not disposed, so will starve TCP connections at some point Go over code example and run it var httpClient = new HttpClient(); var cursor = new InMemoryCursor(timer.ScheduleStatus?.Last ?? DateTimeOffset.UtcNow); var processor = new CatalogProcessor( cursor, new CatalogClient(httpClient, new NullLogger<CatalogClient>()), new DelegatingCatalogLeafProcessor( added => { log.LogInformation("[ADDED] " + added.PackageId + "@" + added.PackageVersion); return Task.FromResult(true); }, deleted => { log.LogInformation("[DELETED] " + deleted.PackageId + "@" + deleted.PackageVersion); return Task.FromResult(true); }), new CatalogProcessorSettings { MinCommitTimestamp = timer.ScheduleStatus?.Last ?? DateTimeOffset.UtcNow, MaxCommitTimestamp = timer.ScheduleStatus?.Next ?? DateTimeOffset.UtcNow, ServiceIndexUrl = "https://api.nuget.org/v3/index.json" }, new NullLogger<CatalogProcessor>()); await processor.ProcessAsync(CancellationToken.None);
  11. Each function should only do one thing! We are violating this.
  12. Go over Approach1 code – Enqueuer class Mention we are using roughly the same code as before Differences are that our function is now no longer doing things itself, instead it’s adding messages to a queue for processing later on That Queue binding is interesting. This is where the input/output comes from. Instead of managing our own queue connection, we let the framework handle all plumbing so we can focus on adding messages. In Indexer, we use the Queue as an input binding, and read messages. We can now scale enqueuing and scaling separately! But are we there yet?
  13. https://github.com/thinktecture/azure-functions-extensibility
  14. Go over Approach2 code Show this is MUCH simpler – trigger binding that provides input, queue output bindign to write that input to a queue Let’s go over what it takes to build a trigger binding NuGetCatalogTriggerAttribute – the data needed for the trigger to work – go over properties and attributes Hooking it up requires a binding configuration – NuGetCatalogTriggerExtensionConfigProvider It says: if you see this specific binding, register it as a trigger that maps to some provider So we need that provider – NuGetCatalogTriggerAttributeBindingProvider Provider is there to create an object that provides data. In our case we need to store the NuGet catalog timestamp cursor, so we do that on storage, and then return the actual binding – NuGetCatalogTriggerBinding In NuGetCatalogTriggerBinding, we have to specify how data can be bound. What if I use a differnt type of object than PackageOperation? What if someone used a node.js or Python function instead of .NET. Need to define the shape of the data our trigger provides. PackageOperationValueProvider is also interesting, this provides data shown in the portal diagnostics CreateListenerAsync is where the actual triger code will be created – NuGetCatalogListener NuGetCatalogListener uses the BatchCatalogProcessor we had previously, and when a package is added or deleted it will call into the injected ITriggeredFunctionExecutor ITriggeredFunctionExecutor is Azure Functions framework specific, but it’s the glue that will clal into our function with the data we provide Note StartAsync/StopAsync where you can add startup/shutdown code ONE THING LEFT THAT IS NOT DOCUMENTED – Startup.cs to register the binding. And since we are in a different class library, also need Microsoft.Azure.WebJobs.Extensions referenced to generate \bin\Debug\netcoreapp2.1\bin\extensions.json As a result our code is now MUCH cleaner, show it again and maybe also show it in action Mention [Singleton(Mode = SingletonMode.Listener)] – we need to ensure this binding only runs single-instance (cursor clashes otherwise). This is due to ho the catalog works, parallel processing is harder to do. But we can fix that by scaling the Indexer later on. Show Approach3 PopulateQueueAndTable Same code, but a bit more production worthy Sending data to two queues (indexing and downloading) Storing data in a table (and yes, violating “do one thing” again but I call it architectural freedom)
  15. Next up will be downloading and indexing. Let’s start with downloading. Grab a copy of the .nupkg from NuGet and store it in a blob Redundancy - no need to re-download/stress NuGet on a re-index
  16. Go over Approach3 code DownloadToStorage uses a QueueTrigger to run whenever a message appears in queue Note no singleton: we can scale this across multiple instances/multiple servers Uses a Blob input binding that provides access to a blob Note the parameters, name of the blob is resolved based on data from other inputs which is prety nifty Our code checks whether it’s an add or a delete, and either downloads+uploads to the blob reference, or delets the blob reference
  17. Next up will be indexing itself. There are a couple of things here…
  18. Go over Approach3 code PackageIndexer uses a QueueTrigger to run whenever a message appears in queue Uses a Blob input binding that provides access to a blob where we can write our indexed entity – will show this later Based on package operation, we will add or delete from the index RunAddPackageAsync has some plumbing, probably too much, to dowload the .nupkg file and store it on disk Note: we store it on disk as we need a seekable stream. So why no memoy stream? Some NuGet packages are HUGE. Find PEReader usage and show how it will index a given package’s public types and namespaces All goes into a typeNames collection. Now: how do we add this info to the index? Show PackageDocument class, has MANY properties First important: the Identifier property has [Key] applied. Azure Search needs a key for teh document so we can retrieve by key, which could be useful when updating existing content or to find a specific document and delete it from the index. Second important: TypeNames is searchable. Also mention “simpleanalyzer”: “Divides text at non-letters and converts them to lower case.” Other analyzers remove stopwords and do other things, this one should be as searchable as possible. Other fields are sometimes searchable, sometimes facetable – a bit of leftover from me thinking about search use cases. The R# API ony searches on typename so could make everything else just retrievable as well. Of course, index is not there by default, so need to create it. We do this when our function is instantiated (static constructor, so only once per launch of our functions) Is this good? Yes, because only once per server instance our function runs on. No because we do it at one point, what if the index is deleted in between and needs to be recreated? Edge case, but a retry strategy could be a good idea... Next, we create our package document, and at one point we add it to a list of index actions, and to blob storage indexActions.Add(IndexAction.MergeOrUpload(packageToIndex)); JsonSerializer.Serialize(jsonWriter, packagesToIndex); Writing to index using batch - var indexBatch = IndexBatch.New(actions); Leftover code from earlier, batch makes no sense for one document, but in case you want to do multiple in one go this is the way. Do beware a batch can only be several MB in size, for this NuGet indexing I can only do ~25 in a batch before payload is too large. That’s… it! Run approach 3 (for last hour) and see functions being hit / packages added to index Go to Azure Search portal as well, show how importer would work in case of fire
  19. Go over Approach3 code PackageIndexerWithCustomBinding is mostly the same code One difference: it uses the [AzureSearchIndex] binding to write add/delete operations to the index instead Go over how it works. Again, an attribute with settings – AzureSearchIndexAttribute Also a configuration that registers the binding as an output binding using BindToCollector – AzureSearchExtensionConfigProvider Now, what’s this OpenType? It’s some sort of dynamic type. If we want to create an AzureSearch output binding, we better support more than just our PackageDocument use case! So we need a collector builder that can create the actual binding implementation based on the real type requested by our function parameter – AzureSearchAsyncCollectorBuilder In AzureSearchAsyncCollectorBuilder, we do that. Very simple bootstrap code in this case, but could be more complex depending on the type of binding you are creating. Our AzureSearchAsyncCollector uses the attribute to check for Azure Search connection details, as well as the type of operation we expect it to handle. Why not all? Well, IAsyncCollector only has Add and Flush. Note: add called manually, flush at function complete – could use flush to send things in a batch... Code itself pretty straightforward. On Add, we add an action to search. With a retry in case the index does not exist – we then create it. Creation code kind of interesting as we use some reflection in case we specify a given type of coument to index. Why? Cause when we do Upserts, we may want to update just one or two properties, and can use a different Dto in that case (but still have the index shaped to the full document shape) Run when time left, but nothing fancy here...
  20. Now we need to make ReSharper talk to our search. We have the index, so that should be a breeze, right?
  21. Go over Web code RunFindTypeApiAsync and RunFindNamespaceAsync Both use “name” as their query parameter to search for RunInternalAsync does the heavy lifting Grabs other parameters Runs search, and collects several pages of results Why is this ForEachAsync there? Search index has multiple versions for every package id, yet ReSharper expects only the latest matching all parameters Azure Search has no group by / distinct by, so need to do this in memory. Doing it here by fetching a maximum number of results and doing the grouping manually. Use the collected data to build result. Add matching type names etc. Example requests: http://localhost:7071/api/v1/find-type?name=JsonConvert http://localhost:7071/api/v1/find-type?name=CamoServer&allowPrerelease=true&latestVersion=false https://nugettypesearch.azurewebsites.net/api/v1/find-type?name=JsonConvert In ReSharper (devenv /ReSharper.Internal, go to NuGet tool window, set base URL to https://nugettypesearch.azurewebsites.net/api/v1/) Write some code that uses JsonConvert / JObject and try it out.