Automated product categorization
Chapters
1. Introduction
2. Service Architecture
3. Data
4. Training
5. Inference
6. Evaluation
#1 What is Skroutz.gr?
• Skroutz.gr is a marketplace & shopping assistant which
makes online shopping easier and more reliable
• It includes more than 11,000,000 products from 3,200
different e-shops
• Every month the website welcomes more than 8 million unique
visitors, ranking it among the top sites on the Greek Web
#1 Some Numbers
• 3,200 merchants
• 11 million products
• 270 mil. pageviews/mo
• 1.1 mil. searches/day
• 33 mil. sessions/mo

#1 The Problem
• Each day we collect thousands of new products by
downloading e-shop feeds (XML, CSV etc. - product
catalogs)
• We want to categorize incoming product payloads, as
provided by e-shops, into the most relevant categories of the
Skroutz category tree (taxonomy) with minimal human
intervention.
- Difficult
- Important
#1 Why Difficult?
• Many leaf categories in
Skroutz taxonomy (>2k)
• Sibling categories
(subjective categorization)
• Misleading product titles
and shop-categories from
shops
#1 Why Important?
• Robot MO collects products from shop feeds and stores them to DB
• Megatron category classifier categorizes products to the correct category
• Tron groups similar products to entities called SKUs to be ready for indexing
• Elasticsearch indexes products to be searchable from user interface
#1 Facts
•Merchants send ~15k new products to Skroutz every day!
•2.3k unique leaf categories in our category tree (taxonomy)
•Manual “move-to-category” action:
- Costs ~7.8s on average for content managers
- Subjective decisions may add extra overhead

#1 Old Solution - Overview
•Use Elasticsearch to match specific product attributes:
- PN (manufacturer part number)
- Name
- Shop-category
•Aggregate matches and group by categories
•Normalize results and use custom weights to calculate a score
•Take Top-K results
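The aggregation and scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the field weights, the per-field max normalization and the `(field, category_id, raw_score)` input shape are all assumptions, since the slides do not give the actual values or formulas.

```python
from collections import defaultdict

# Hypothetical per-field weights; the real custom weights are not given in the slides.
FIELD_WEIGHTS = {"pn": 3.0, "name": 2.0, "shop_category": 1.0}


def top_k_categories(matches, k=3, weights=FIELD_WEIGHTS):
    """Aggregate search hits per category and return the k best (category_id, score).

    `matches` is a list of (field, category_id, raw_score) tuples, one per hit.
    """
    # Normalize raw scores per field so fields with different score ranges are comparable.
    max_per_field = defaultdict(float)
    for field, _, raw in matches:
        max_per_field[field] = max(max_per_field[field], raw)

    # Group by category and sum the weighted, normalized scores.
    totals = defaultdict(float)
    for field, category_id, raw in matches:
        if max_per_field[field] > 0:
            totals[category_id] += weights[field] * raw / max_per_field[field]

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Every number in `FIELD_WEIGHTS` has to be chosen by hand, which is exactly the tuning problem the next slide lists as a limitation.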
#1 Old Solution - Limitations
•Plain cosine similarity on TF/IDF weights:
- No learning feedback loop
- No advanced statistics utilization (e.g. correlation between price
value and text features)
•No easy way to tune custom weights applied on final scoring
•Heuristics don’t take into account category specific context
•Heuristics don’t take into account word level context. E.g. the
word “samsung” is followed by the word “galaxy” most of the time,
and a model number probably follows after that.
#1 Old Solution - Good Parts
•Simple solution (except for custom scoring stuff)
•Easy to debug
•Easy to deploy
•Online
#1 New Solution - “Megatron”

#1 Overview
•Approach problem as a supervised learning task
•Rely on probabilities to obtain a meaningful score
•Use more features from multiple sources, assembled into datasets
•Learn new patterns and relations by training
•Measure performance on dataset splits
•Use a microservice to serve classification requests
•Apply threshold for low confidence results
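The last point, thresholding low-confidence results, might look like the sketch below. The function name and the 0.8 cut-off are hypothetical; the slides do not state the threshold that is actually used.

```python
def decide(probabilities, threshold=0.8):
    """Return (category_id, score) when the top score is confident enough.

    `probabilities` maps category_id -> predicted probability. When the best
    score is below `threshold` (a hypothetical value), the category is None so
    the product can be left for manual categorization.
    """
    category_id = max(probabilities, key=probabilities.get)
    score = probabilities[category_id]
    if score >= threshold:
        return category_id, score
    return None, score
```

Because the score is a probability, the threshold has a direct interpretation, unlike the hand-weighted score of the old solution.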
#2 Service Architecture
1.Training Phase
2.Inference Phase
3.APIs
#2.1 Training Phase
1. Export dataset (product features labeled with category_id) and upload to Swift
2. Download specific dataset version in “training VM”
3. Start a training session using a train/val split from dataset
4. Save best performing model params snapshot (based on validation set loss)
5. Compress and upload model params to Swift container
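Step 4 (keep the snapshot with the lowest validation loss) can be sketched in plain Python. The `(params, val_loss)` pairs stand in for what a real training loop would produce after each epoch; the names are illustrative only.

```python
import copy


def keep_best_snapshot(epoch_results):
    """Return (best_params, best_val_loss) over an iterable of (params, val_loss).

    A deep copy is taken so that later epochs, which usually mutate the same
    parameter objects in place, cannot overwrite the saved snapshot.
    """
    best_params, best_loss = None, float("inf")
    for params, val_loss in epoch_results:
        if val_loss < best_loss:
            best_params, best_loss = copy.deepcopy(params), val_loss
    return best_params, best_loss
```

For example, with validation losses 0.9, 0.4 and 0.6 over three epochs, the snapshot from the second epoch is the one that would be compressed and uploaded.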
#2.2 Inference Phase
1. Application Part: Send classification request
upon new product arrivals:
- Kafka producer (asynchronous request)
- Megatron Client synchronous HTTP
requests (2nd alternative)
2. Category Classifier Microservice Part:
- Pop messages from stream (Kafka
consumer)
- Dispatch messages to in-memory Neural
Network instance
- Fetch predictions (scores) and post-back
to Core Application API endpoint
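A rough sketch of the microservice part follows. To stay self-contained it uses an in-process `queue.Queue` as a stand-in for the Kafka consumer and plain callables for the model and the post-back; none of these names come from the original service.

```python
import queue


def serve(messages, predict, post_back):
    """Drain `messages`, classify each product and post the scores back.

    messages  -- queue.Queue of product dicts (stand-in for the Kafka stream)
    predict   -- callable(product) -> scores (stand-in for the in-memory NN)
    post_back -- callable(product_id, scores) (stand-in for the HTTP callback)
    """
    handled = 0
    while True:
        try:
            product = messages.get_nowait()
        except queue.Empty:
            break  # a real consumer would block and keep polling instead
        scores = predict(product)
        post_back(product["id"], scores)
        handled += 1
    return handled
```

Keeping the model behind a plain `predict` callable means the same loop can serve the asynchronous (stream) path and the synchronous HTTP path.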
#2.3 APIs
1. Megatron microservice internal API
- Common API (wraps Keras API)
- Basic methods:
✓ build
✓ train
✓ save
✓ load
✓ predict
- CLI commands
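One way to picture a common API with these five methods is an abstract base class. The class names below are hypothetical, and the toy majority-vote model only stands in for the Keras-backed network so that the sketch runs without any ML dependency.

```python
import json
from abc import ABC, abstractmethod
from collections import Counter


class BaseClassifier(ABC):
    """Hypothetical common interface exposing the five basic methods."""

    @abstractmethod
    def build(self): ...

    @abstractmethod
    def train(self, features, labels): ...

    @abstractmethod
    def save(self, path): ...

    @abstractmethod
    def load(self, path): ...

    @abstractmethod
    def predict(self, features): ...


class MajorityClassifier(BaseClassifier):
    """Toy stand-in for the real model: always predicts the most common label."""

    def build(self):
        self.label = None
        return self

    def train(self, features, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def save(self, path):
        with open(path, "w") as fh:
            json.dump({"label": self.label}, fh)

    def load(self, path):
        with open(path) as fh:
            self.label = json.load(fh)["label"]
        return self

    def predict(self, features):
        return [self.label for _ in features]
```

CLI commands and the serving code can then depend only on `BaseClassifier`, so swapping model architectures does not change their code.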
#2.3 APIs(2)
1. Skroutz Application Ecosystem (Ruby client)
- Megatron::Client
✓ Issues requests to microservice
- Megatron DB model
✓ Stores prediction results
- ApiController endpoint
✓ Receives callbacks from microservice

#3 Data
•Product attribute values (potential features)
Product
- Name: Samsung TV 32'' DF324 (PNDFD22) Full HD Black NEW
- Shop manufacturer
- Part number: PNDFD22
- EAN
- Price: 300 €
- Shop category: Αρχική > Ηλεκτρονικά > Τηλεοράσεις (Home > Electronics > TVs)
- ...
#3 Data(2)
•Training Dataset - Raw Features
Image
Numerical
Categorical
Label
Text
#3 Data(3)
•Preprocessing
- Text
- Numerical
- Categorical
- Labels
(Diagram: preprocessing turns the raw features into model inputs X and labels y)

#3 Preprocessing - Text
• Our best solution involves “Word Vectors”
• Steps to prepare for word vectors:
- Learn a word Vocabulary (mapping of words to numeric ids)
- Transform text sentences to Sequences of ids based on Vocabulary
- Decide a representative sequence length (E.g. 60 words)
- Apply zero padding (pre or post) and truncation to maintain a fixed length
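The four steps above can be sketched in a few lines of plain Python. Whitespace tokenization, lowercasing and the reserved ids 0 (padding) and 1 (unknown word) are assumptions made for the example; the slides do not describe the tokenizer.

```python
PAD_ID = 0  # reserved for zero padding
UNK_ID = 1  # reserved for words not seen when the vocabulary was learned


def learn_vocabulary(sentences):
    """Map every distinct word to a numeric id (ids start after PAD and UNK)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def to_padded_sequence(sentence, vocab, length=60, padding="post"):
    """Turn a sentence into exactly `length` ids, truncating or zero padding."""
    ids = [vocab.get(word, UNK_ID) for word in sentence.lower().split()][:length]
    pad = [PAD_ID] * (length - len(ids))
    return ids + pad if padding == "post" else pad + ids
```

With a vocabulary learned from "Samsung TV 32" and "Samsung Galaxy", the title "Samsung Galaxy S9" becomes `[2, 5, 1, 0, 0]` at length 5: two known words, one unknown word and two padding zeros.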
#3 Preprocessing - Text(2)
• Use of Pretrained Embeddings (see Word2Vec, FastText, GloVe etc.)
• We use FastText library with skipgram algorithm (unsupervised)
- https://fasttext.cc/docs/en/unsupervised-tutorial.html
#3 Preprocessing - Text(3)
• Embeddings:
- Outputs 100 dim Vector
- Total 1,500,000 rows (vocab)
• 2 versions (Name, Shop-category)
#3 Preprocessing - Numerical
• “Pricevat” and “Name Length” values
• Apply Standard Scaling
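Standard scaling subtracts the mean and divides by the standard deviation, both learned on the training data. A minimal stdlib sketch (function names are illustrative; a library scaler would normally be used in practice):

```python
from statistics import mean, pstdev


def fit_standard_scaler(values):
    """Learn mean and (population) standard deviation from training values."""
    mu, sigma = mean(values), pstdev(values)
    return mu, sigma or 1.0  # avoid division by zero for constant features


def standard_scale(values, mu, sigma):
    """Apply z = (x - mu) / sigma with statistics learned on the training set."""
    return [(v - mu) / sigma for v in values]
```

The same `mu` and `sigma` must be reused at inference time, so they belong with the saved model artifacts.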

#3 Preprocessing - Categorical
• All discrete value attributes/features:
- shop_id
- category_id list of products with a matching PN
• One-Hot encoding:
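As a small illustration of one-hot encoding for a single value such as shop_id. The multi-hot variant for a list of category ids is an assumption about how the matching-PN list could be encoded; the slides do not spell this out.

```python
def one_hot(value, known_values):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    index = {v: i for i, v in enumerate(sorted(known_values))}
    vector = [0] * len(index)
    if value in index:  # unseen values map to the all-zero vector
        vector[index[value]] = 1
    return vector


def multi_hot(values, known_values):
    """Encode a list of values (e.g. several category ids) as one 0/1 vector."""
    index = {v: i for i, v in enumerate(sorted(known_values))}
    vector = [0] * len(index)
    for value in values:
        if value in index:
            vector[index[value]] = 1
    return vector
```

For known shop ids 7, 42 and 99, shop 42 becomes `[0, 1, 0]`.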
#3 Label Encoding
• “category_id” values are the “true” labels which should be learned by the NN
• One-Hot encoding
• OR just use IDs and rely on Keras conventions (e.g. use an internal sparse categorical
representation to save huge amounts of RAM)
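The two label representations give the same loss; only the storage differs. The sketch below checks this numerically in plain Python. In Keras the integer-label variant corresponds to the `sparse_categorical_crossentropy` loss; the helper names here are illustrative.

```python
import math


def one_hot_cross_entropy(one_hot_label, probabilities):
    """Cross-entropy with a one-hot label: one value stored per class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities) if t)


def sparse_cross_entropy(label_index, probabilities):
    """Same loss with the label stored as a single integer class index."""
    return -math.log(probabilities[label_index])
```

With ~2.3k leaf categories, a one-hot label needs ~2.3k values per training example, while the sparse form needs a single integer.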
#4 Training
1.Basic Concepts
2.Model Architecture
3.Training “In Action”

#4.1 Basic Concepts
•Objective:
- Find a combination of mathematical functions and a set of
corresponding params to maximize prediction accuracy (or minimize
error rate).
- Ensure that the above generalizes well for production.
- Learn params in an acceptable time window.
•Experiment with Neural Network architectures
•GPUs to the rescue (10x speedup)
#4.1 Basic Concepts(2)
•Loss function
- Categorical Crossentropy
•Optimizer
- Adam (Gradient Descent)
•Hyper-params
- Mini-Batch Size
- Learning Rate
- Epochs
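To make the optimizer concrete, here is a minimal single-parameter sketch of the Adam update rule, applied to a toy quadratic instead of the real network. The default hyper-parameter values below are common choices, not the ones used for Megatron.

```python
import math


def adam_minimize(grad, w=0.0, learning_rate=0.1, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the warm-up phase
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

For `f(w) = (w - 3)^2` the gradient is `2 * (w - 3)`, and `adam_minimize(lambda w: 2 * (w - 3))` moves `w` from 0 towards the minimum at 3. In real training the gradient comes from the loss on one mini-batch, and one pass over all mini-batches is one epoch.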
#4.1 Validation
•Why?
- Simulate unseen data
- Compare different:
✓ training methods
✓ hyper-params
- Avoid Overfitting
•Should be representative
•Validation Strategy
- 10% of whole Dataset
- Stratification on Categories
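A stratified 10% validation split can be sketched as follows: the sampling is done per category, so each category keeps roughly the same share in both splits. The function name and the fixed seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.1, seed=0):
    """Return (train_indices, val_indices), sampling val_fraction of every class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)
```

Note that with this rounding, categories with very few examples contribute nothing to the validation set, which is one reason rare categories are hard to evaluate.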
#4.2 Model Architecture
•Hybrid End-to-End architecture
•4 branches (4 input vectors):
A. Name Features Branch
B. Shop-Category Features Branch
C. Basic Features Branch (Numerics, Categorical)
D. Matching PNs Branch (Categorical)
#4.2 Text Branches
• Inspired by “Embed, Encode, Attend, Predict”
- https://explosion.ai/blog/deep-learning-formula-nlp
• Each of “name” and “shop-category” sequence flows through:
- 1 x Embeddings Layer
- 1 x Bi-LSTM Encoder
- 1 x Attention Module
- 1 x LSTM Encoder
#4.2 Text Branches - Why LSTM?
• LSTM stands for “Long Short Term Memory” Layer (Encoder):
- Memory Cells / Captures context
- Propagates signal from previous words to the next in a Sequence
- 2 Stacked Layers performed better in our experiments
- 128 dimension output vector
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
(Diagram: one LSTM cell per sequence position, #cells = sequence length, 128-dim output)
#4.2 Text Branches - Why pay Attention?
• Attention Mechanism:
- Control how much signal should be propagated to next layers
- https://distill.pub/2016/augmented-rnns/
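As a rough illustration of the idea, the sketch below scores each timestep against a query vector, normalizes the scores with softmax, and scales every hidden state by its weight. Dot-product scoring with a single learned query vector is only one common variant and is an assumption here; the slides do not say which attention formulation is used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Scale each timestep's hidden state by its attention weight.

    hidden_states -- list of vectors, one per word (encoder outputs)
    query         -- vector of the same size (learned during training)
    Returns (reweighted_states, weights); the weights sum to 1.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    reweighted = [[w * x for x in h] for w, h in zip(weights, hidden_states)]
    return reweighted, weights
```

Words whose hidden state aligns with the query get a weight close to 1 and pass through almost unchanged; the rest are damped before reaching the next encoder.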

#4.2 Other Branches
• Basic Features Branch
- Inputs a concatenation of basic feats
- 1 Dense layer with #classes output
- ReLU activation
• Matching PNs Branch
- Inputs a concatenation of PN feats
- Short-circuited to final layer
(Diagram: input vector → Dense layer with #classes outputs, ~2k for Skroutz)
#4.2 Final Layer
• Merging Layer
- Concatenates all 4 branches outputs
- softmax activation
- Output: probabilities for each class
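The merging step can be pictured with the small sketch below: the branch outputs are concatenated, mapped to one logit per class, and turned into probabilities with softmax. The explicit weight matrix between the concatenation and the softmax is an assumption made for the example; the slides only mention concatenation and softmax activation.

```python
import math


def final_layer(branch_outputs, weights, biases):
    """Concatenate branch outputs, apply a linear map, return class probabilities.

    branch_outputs -- list of vectors, one per branch
    weights        -- one row per class, each as long as the concatenated vector
    biases         -- one value per class
    """
    merged = [x for branch in branch_outputs for x in branch]  # concatenation
    logits = [sum(w * x for w, x in zip(row, merged)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)                                         # for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The output always sums to 1, which is what lets the service treat the top value as a confidence score.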
#4.2 Model Architecture
• Model Capacity/Complexity:

#4.3 Training In Action - Model Selection
•Conducted 100s of experiments with different combinations
of features, layers, modules (e.g. Embeddings, Bag of Words,
TF/IDF, LSTM, etc.)
•10s of ablation studies: remove specific features to see how
performance is affected
•Read many papers and applied some common tricks (Bi-LSTM,
AdaptivePooling etc.)
•It is alchemy!
#4.3 Training In Action - Tools
• Training scheduler process runs weekly
• CLI training commands
- CUDA_VISIBLE_DEVICES=1 python -m category_classifier.cli scrooge --model end2end --train --epochs 8 --batch_size 128
• Model versioning
- E.g. “skroutz_models_2018_09_01_v1.tar.gz”
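A hedged sketch of how a command line with these flags and a dated archive name could be handled with `argparse`. The internals of `category_classifier.cli` are not shown in the slides, so the parser below is an assumption that only mirrors the flags visible in the example command. The `skroutz` flavor choice, the defaults, and the `model_archive_name` helper are hypothetical.

```python
import argparse
from datetime import date


def build_parser():
    """Parser mirroring the flags seen in the example command (assumed layout)."""
    parser = argparse.ArgumentParser(prog="category_classifier.cli")
    parser.add_argument("flavor", choices=["skroutz", "scrooge"])  # assumed choices
    parser.add_argument("--model", default="end2end")
    parser.add_argument("--train", action="store_true")
    parser.add_argument("--epochs", type=int, default=8)
    parser.add_argument("--batch_size", type=int, default=128)
    return parser


def model_archive_name(flavor, day, version):
    """Build a name shaped like the versioning example on the slide."""
    return f"{flavor}_models_{day:%Y_%m_%d}_v{version}.tar.gz"


args = build_parser().parse_args(
    ["scrooge", "--model", "end2end", "--train", "--epochs", "8", "--batch_size", "128"]
)
archive = model_archive_name("skroutz", date(2018, 9, 1), 1)
```

Embedding the date and a version counter in the file name keeps weekly retrains distinguishable without a separate registry.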
#4.3 Training In Action
Training run output example:
GPU monitoring:
#4.3 Training In Action
Learning Curves (TensorBoard): Current best vs. Previous Arch

Recommended for you

4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_Nestl...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_Nestl...4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_Nestl...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_Nestl...

The document discusses trends in digital media consumption and online activities from 2008 to 2014. It shows significant increases in areas like internet usage, social media, smartphones, and e-commerce. For example, internet usage increased 71% and social media increased over 500% during this period. The document also discusses challenges with traditional paper coupons like low redemption rates, fraud, and high costs. It suggests digital coupons could address these issues by being more user-friendly, flexible, and personalized for consumers while also reducing fraud and costs for companies.

warply4th mobile marketing event by warplye-coupons
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_First...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_First...4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_First...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_First...

First Data is a large payment processing company operating in Greece. They propose an e-couponing solution to address issues with traditional paper coupons like high costs, fraud risk, and low redemption rates. Their mobile app and web portal solution allows suppliers to create and manage digital coupons that consumers can access and redeem at retailers' point-of-sale systems. This provides benefits like reduced costs, increased sales and customer data for all stakeholders. The solution is positioned to evolve further by integrating additional e-wallet capabilities over time.

e-couponswarply engage platform4th mobile marketing event by warply
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_iStor...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_iStor...4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_iStor...
4th Mobile Marketing Event by Warply: Mobile: In-store Retail’s New Era_iStor...

Elevating the in-store experience

e-couponswarply engage platform4th mobile marketing event by warply
#5 Inference
1. Inference Pipeline
2. Inference API
3. Production
#5.1 Inference Pipeline
• Online execution:
- Preprocessing
- Vectorization
- Prediction
• Used by the CategoryClassifier class
- Wrapper of external API
• Uses scikit-learn Pipelines
- http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
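The three online steps can be sketched as a scikit-learn `Pipeline`. This is a toy illustration, not the production pipeline: the `TitleNormalizer` step, the TF-IDF vectorizer, the `LogisticRegression` stand-in for the neural model, and the four training titles are all assumptions made for the example.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class TitleNormalizer(TransformerMixin, BaseEstimator):
    """Preprocessing step: lowercase titles and collapse whitespace."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return [" ".join(title.lower().split()) for title in X]


pipeline = Pipeline([
    ("preprocess", TitleNormalizer()),
    ("vectorize", TfidfVectorizer()),
    ("predict", LogisticRegression(max_iter=1000)),  # stand-in classifier
])

# Tiny made-up training set, only to make the sketch runnable.
train_titles = [
    "Apple iPhone 128GB",
    "Samsung Galaxy smartphone",
    "Bosch washing machine 8kg",
    "LG washing machine 9kg",
]
train_labels = ["Mobile Phones", "Mobile Phones", "Washing Machines", "Washing Machines"]

pipeline.fit(train_titles, train_labels)
predicted = pipeline.predict(["  APPLE   iPhone 64GB "])
```

Keeping all steps in one `Pipeline` object means the same preprocessing and vectorization are applied at training and at inference time, and a single `predict` call runs the whole chain.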
#5.2 Inference API
• REPL
• Kafka Worker
• Flask App
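As an illustration of the Flask front end, here is a minimal sketch of an HTTP endpoint wrapping a classifier. The `/predict` route, the JSON payload shape, and the `StubClassifier` class are assumptions for this example; the slides do not describe the real service's routes or the CategoryClassifier interface.

```python
from flask import Flask, jsonify, request


class StubClassifier:
    """Hypothetical stand-in for the CategoryClassifier wrapper."""

    def predict(self, title):
        category = "Mobile Phones" if "phone" in title.lower() else None
        return {"category": category}


def create_app(classifier):
    """Build a Flask app exposing one prediction endpoint (assumed route)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        title = payload.get("title")
        if not title:
            return jsonify(error="missing 'title'"), 400
        return jsonify(classifier.predict(title))

    return app


app = create_app(StubClassifier())
```

Passing the classifier into `create_app` keeps the web layer thin, so the same classifier object could also be driven from a REPL or a Kafka worker.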

#5.3 Production
• 2 inference VMs
- inference1.skroutz.gr, inference2.skroutz.gr (Kafka Workers)
• 2 flavors (Greece, UK)
• Grafana monitoring for the Kafka part
#6 Evaluation
• Overall error rate in Skroutz reduced by more than 6%!
• Currently, more than 2 content-editor hours saved per day in Skroutz (and this is scaling)!
• Move operations out of the “uncategorized” products list reduced significantly (by an order of magnitude)!
#6 Performance Summary
                      Success Rate         Failure Rate         No Prediction Rate
                      Megatron   Old       Megatron   Old       Megatron   Old
Skroutz (GR),         90.10%     82.6%     7.9%       13.8%     2%         3.5%
2.3k categories       91.85%     85.7%     8.14%      14.32%    N/A        N/A
Scrooge (UK),         87.56%     38.9%     2.5%       26.24%    9.9%       58.48%
350 categories        97.1%      93.67%    2.8%       6.32%     N/A        N/A
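A short worked calculation on the Skroutz (GR) figures. The slides do not explain the second row for each site. One possible reading, assumed here, is that it restates success and failure as shares of the products that actually received a prediction. The helper name `among_predicted` is illustrative.

```python
def among_predicted(success, failure):
    """Re-express success/failure rates as shares of products that got a prediction."""
    predicted = success + failure
    return 100 * success / predicted, 100 * failure / predicted


# Old system, Skroutz (GR), first row of the table.
old_success, old_failure = among_predicted(82.6, 13.8)

# Failure-rate change between Old and Megatron, using the second-row values.
point_reduction = round(14.32 - 8.14, 2)                      # percentage points
relative_reduction = round(100 * (14.32 - 8.14) / 14.32, 1)   # percent of the old rate
```

Under this reading the old Skroutz figures reproduce the second row after rounding (about 85.7% success and 14.32% failure), and the failure rate falls by a little over 6 percentage points, which may be what the "more than 6%" claim refers to.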

#6 Monitoring Dashboard
#Future Improvements
• Utilize Image Features (in End-To-End model)
• Utilize Entity Recognition to extract more features
• Find ways to utilize more features (color, sizes etc.)
• Categorical Self-Trained Embeddings
• Experiment with newer solutions like “Transformer”
#Contact Info
Andreas Loupasakis
• Email: alup@skroutz.gr
• Kaggle: https://www.kaggle.com/andreaslup
• Twitter: https://twitter.com/andy_lupo
• LinkedIn: https://www.linkedin.com/in/andreas-loupasakis-06399a47
Thank you!

The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
Sally Laouacheria
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
Vijayananda Mohire
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
BookNet Canada
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
shanthidl1
 

Recently uploaded (20)

Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
Manual | Product | Research Presentation
Manual | Product | Research PresentationManual | Product | Research Presentation
Manual | Product | Research Presentation
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 

Automated product categorization

  • 2. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 3. #1 What is Skroutz.gr? • Skroutz.gr is a marketplace & shopping assistant which makes online shopping easier and more reliable • It includes more than 11,000,000 products from 3,200 different e-shops • On a monthly basis the website welcomes more than 8 million unique visitors ranking in the top positions in the Greek Web
  • 4. #1 Some Numbers 3,200 merchants 11 million products 270 mil. pageviews / mo 1.1 mil. searches/day 33 mil. sessions/mo
  • 5. #1 The Problem • Each day we collect thousands of new products by downloading e-shop feeds (product catalogs in XML, CSV, etc.) • We want to assign incoming product payloads, as provided by e-shops, to the most relevant categories in the Skroutz category tree (taxonomy) with minimal human intervention. - Difficult - Important
  • 6. #1 Why Difficult? • Many leaf categories in Skroutz taxonomy (>2k) • Sibling categories (subjective categorization) • Misleading product titles and shop-categories from shops
  • 7. #1 Why Important? Robot MO collects products from shop feeds and stores them in the DB; the Megatron category classifier assigns products to the correct category; Tron groups similar products into entities called SKUs, ready for indexing; Elasticsearch indexes products so they are searchable from the user interface
  • 8. #1 Facts • Merchants send more than 15k new products to Skroutz every day! • 2.3k unique leaf categories in our category tree (taxonomy) • Manual “move-to-category” action: - Costs ~7.8s on average for content managers - Subjective decisions may add extra overhead
  • 9. #1 Old Solution - Overview • Use Elasticsearch to match specific product attributes: - PN (manufacturer part number) - Name - Shop-category • Aggregate matches and group by categories • Normalize results and use custom weights to calculate a score • Take Top-K results
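The aggregate, normalize, weight and Top-K steps of the old solution can be sketched roughly as follows. This is only an illustration: the weight values, the `top_k_categories` helper and the choice to normalize each attribute by its best hit are assumptions, not the actual Skroutz scoring code, and the search results are passed in as plain Python data instead of coming from Elasticsearch.

```python
from collections import defaultdict

# Hypothetical per-attribute weights; the slides do not give the real values.
WEIGHTS = {"pn": 3.0, "name": 1.0, "shop_category": 0.5}


def top_k_categories(matches, k=3):
    """matches: {attribute: [(category_id, raw_score), ...]} from the search step."""
    totals = defaultdict(float)
    for attribute, hits in matches.items():
        if not hits:
            continue
        # Normalize each attribute's scores to [0, 1] by its best hit (assumption).
        best = max(score for _, score in hits)
        if best <= 0:
            continue
        for category_id, score in hits:
            totals[category_id] += WEIGHTS[attribute] * score / best
    # Highest score first; ties are broken by category id to stay deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up search hits for one incoming product.
matches = {
    "pn": [(101, 8.0)],
    "name": [(101, 4.0), (202, 2.0), (303, 1.0)],
    "shop_category": [(202, 5.0), (303, 5.0)],
}
result = top_k_categories(matches, k=2)
print(result)
```

Every weight here is a hand-picked constant, which is exactly the tuning difficulty the next slide lists as a limitation.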
  • 10. #1 Old Solution - Limitations • Plain cosine similarity on TF/IDF weights: - No learning feedback loop - No advanced statistics utilization (e.g. correlation between price value and text features) • No easy way to tune the custom weights applied to the final score • Heuristics don’t take into account category-specific context • Heuristics don’t take into account word-level context, e.g. the word “samsung” is usually followed by the word “galaxy”, which is then probably followed by a model number.
  • 11. #1 Old Solution - Good Parts •Simple solution (except for custom scoring stuff) •Easy to debug •Easy to deploy •Online
  • 12. #1 New Solution - “Megatron”
  • 13. #1 Overview •Approach problem as a supervised learning task •Rely on probabilities to obtain a meaningful score •Use more features from multiple sources and use datasets •Learn new patterns and relations by training •Measure performance on dataset splits •Use a microservice to serve classification requests •Apply threshold for low confidence results
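As a minimal sketch of the last point, a threshold on the top class probability could look like the code below. The function name, the dictionary input and the 0.5 cut-off are all assumptions made for illustration; the slides do not state the threshold that is used in production.

```python
def classify_with_threshold(probabilities, threshold=0.5):
    """Return (category_id, score), or (None, score) when confidence is too low.

    probabilities: {category_id: probability} as produced by the classifier.
    threshold: hypothetical cut-off; the real value is not given in the slides.
    """
    category_id = max(probabilities, key=probabilities.get)
    score = probabilities[category_id]
    if score < threshold:
        return None, score  # leave the product for a human to categorize
    return category_id, score


print(classify_with_threshold({101: 0.9, 202: 0.1}))
print(classify_with_threshold({101: 0.4, 202: 0.35, 303: 0.25}))
```

Raising the threshold trades fewer wrong automatic categorizations for more products left without a prediction.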
  • 14. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 15. #2 Service Architecture 1.Training Phase 2.Inference Phase 3.APIs
  • 17. #2.1 Training Phase 1. Export dataset (product features labeled with category_id) and upload to Swift 2. Download specific dataset version in “training VM” 3. Start a training session using a train/val split from dataset 4. Save best performing model params snapshot (based on validation set loss) 5. Compress and upload model params to Swift container
  • 18. #2.2 Inference Phase 1. Application Part: Send classification request upon new product arrivals: - Kafka producer (asynchronous request) - Megatron Client HTTP synchronous requests (2nd alternative) 2. Category Classifier Microservice Part: - Pop messages from stream (Kafka consumer) - Dispatch messages to in-memory Neural Network instance - Fetch predictions (scores) and post-back to Core Application API endpoint
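The microservice side of this flow can be pictured as a simple consumer loop. The sketch below uses Python's standard `queue.Queue` as a stand-in for the Kafka consumer so that it runs without a broker; `run_worker`, `predict`, `post_back` and the message fields are hypothetical names, not the real Megatron code.

```python
import queue


def run_worker(messages, predict, post_back):
    """Drain the queue: score each product and post the result back.

    messages: queue.Queue standing in for the Kafka topic (assumption).
    predict: callable standing in for the in-memory neural network.
    post_back: callable standing in for the core application API endpoint.
    """
    handled = 0
    while True:
        try:
            product = messages.get_nowait()
        except queue.Empty:
            break  # a real Kafka consumer would keep polling instead
        scores = predict(product)
        post_back({"product_id": product["id"], "scores": scores})
        handled += 1
    return handled


incoming = queue.Queue()
incoming.put({"id": 1, "name": "Samsung TV 32 Full HD"})
incoming.put({"id": 2, "name": "Samsung Galaxy S9 Black"})

posted = []
handled = run_worker(incoming, lambda product: {101: 0.9}, posted.append)
print(handled, posted)
```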
  • 19. #2.3 APIs 1. Megatron microservice internal API - Common API (wraps Keras API) - Basic methods: ✓ build ✓ train ✓ save ✓ load ✓ predict - CLI commands
  • 20. #2.3 APIs(2) 1. Skroutz Application Ecosystem (Ruby client) - Megatron::Client ✓ Issues requests to microservice - Megatron DB model ✓ Stores prediction results - ApiController endpoint ✓ Receives callbacks from microservice
  • 21. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 22. #3 Data • Product attribute values (potential features), with an example: - Product Name: Samsung TV 32'' DF324 (PNDFD22) Full HD Black NEW - Shop category: Αρχική > Ηλεκτρονικά > Τηλεοράσεις (Home > Electronics > TVs) - Part number: PNDFD22 - Price: 300 € - Other attributes: Shop manufacturer, EAN, ...
  • 23. #3 Data(2) • Training Dataset - Raw Features: Image, Numerical, Categorical, Text, Label
  • 24. #3 Data(3) • Preprocessing - Features (X): Text, Numerical, Categorical - Labels (y)
  • 25. #3 Preprocessing - Text • Our best solution involves “Word Vectors” • Steps to prepare for word vectors: - Learn a word Vocabulary (mapping of words to numeric ids) - Transform text sentences to Sequences of ids based on the Vocabulary - Decide on a representative sequence length (e.g. 60 words) - Apply zero padding (pre or post) and truncation to maintain a fixed length
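The four preparation steps can be sketched in plain Python as below. This is a simplified stand-in for whatever tokenizer the service really uses: whitespace tokenization, lowercasing, dropping unknown words and the helper names are all assumptions.

```python
def build_vocabulary(sentences):
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    vocabulary = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary) + 1
    return vocabulary


def to_padded_sequence(sentence, vocabulary, length=60, padding="pre"):
    """Turn a sentence into a fixed-length list of word ids.

    Unknown words are dropped in this sketch; longer inputs are cut at the end.
    """
    ids = [vocabulary[w] for w in sentence.lower().split() if w in vocabulary]
    ids = ids[:length]
    pad = [0] * (length - len(ids))
    return pad + ids if padding == "pre" else ids + pad


titles = ["Samsung TV 32 Full HD", "Samsung Galaxy S9 Black"]
vocab = build_vocabulary(titles)
seq = to_padded_sequence("Samsung Galaxy TV", vocab, length=5)
print(seq)  # [0, 0, 1, 6, 2]
```

With `padding="post"` the zeros go after the word ids instead of before them.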
  • 26. #3 Preprocessing - Text(2) • Use of Pretrained Embeddings (see Word2Vec, FastText, GloVe etc.) • We use the FastText library with the skipgram algorithm (unsupervised) - https://fasttext.cc/docs/en/unsupervised-tutorial.html
  • 27. #3 Preprocessing - Text(3) • Embeddings: - Outputs 100 dim Vector - Total 1,500,000 rows (vocab) • 2 versions (Name, Shop-category)
  • 28. #3 Preprocessing - Numerical • “Pricevat” and “Name Length” values • Apply Standard Scaling
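Standard scaling subtracts the mean and divides by the standard deviation, both estimated on the training data. A minimal sketch using only the standard library is shown below; the helper names and the sample prices are made up, and a production pipeline would more likely rely on a library scaler.

```python
from statistics import mean, pstdev


def fit_standard_scaler(values):
    """Return (mean, std) of the training values; std falls back to 1.0 if zero."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def standard_scale(value, mu, sigma):
    """Express a value as its distance from the mean in standard deviations."""
    return (value - mu) / sigma


prices = [100.0, 200.0, 300.0]  # made-up prices, in euros
mu, sigma = fit_standard_scaler(prices)
scaled = [standard_scale(p, mu, sigma) for p in prices]
print(scaled)
```

After scaling, the values have zero mean and unit variance, so a feature like price does not dominate the inputs just because of its magnitude.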
  • 29. #3 Preprocessing - Categorical • All discrete value attributes/features: - shop_id - matching Product PNs category_id list • One-Hot encoding:
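One-hot encoding gives every distinct value its own column. The sketch below also includes a multi-hot variant for list-valued features such as the matching PN category_id list; treating that feature as multi-hot is an assumption, as are the helper names and the sample ids.

```python
def fit_index(values):
    """Assign a column index to each distinct value, in sorted order."""
    return {value: i for i, value in enumerate(sorted(set(values)))}


def one_hot(value, index_of):
    """Vector with a single 1 at the value's column; unseen values give all zeros."""
    vector = [0] * len(index_of)
    if value in index_of:
        vector[index_of[value]] = 1
    return vector


def multi_hot(values, index_of):
    """Vector with a 1 for every known value in the list (assumed encoding)."""
    vector = [0] * len(index_of)
    for value in values:
        if value in index_of:
            vector[index_of[value]] = 1
    return vector


shop_index = fit_index([17, 42, 17, 99])       # made-up shop ids
category_index = fit_index([101, 202, 303])    # made-up category ids

print(one_hot(42, shop_index))                 # [0, 1, 0]
print(multi_hot([303, 101], category_index))   # [1, 0, 1]
```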
  • 30. #3 Label Encoding • “category_id” values are the “true” labels which should be learned by the NN • One-Hot encoding • OR just use IDs and rely on Keras conventions (e.g. use an internal sparse categorical representation to save huge amounts of RAM)
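A quick back-of-the-envelope calculation shows why integer labels save so much memory. The dataset size and the 4 bytes per value below are assumptions chosen for illustration; only the ~2.3k class count comes from the slides. In Keras, integer labels are typically paired with the `sparse_categorical_crossentropy` loss.

```python
NUM_CLASSES = 2300        # ~2.3k leaf categories, from the slides
NUM_SAMPLES = 1_000_000   # hypothetical dataset size
BYTES_PER_VALUE = 4       # assuming float32 / int32 storage

# One-hot labels: one value per class for every sample.
one_hot_bytes = NUM_SAMPLES * NUM_CLASSES * BYTES_PER_VALUE
# Sparse labels: a single integer class id per sample.
sparse_bytes = NUM_SAMPLES * BYTES_PER_VALUE

print(f"one-hot: {one_hot_bytes / 1e9:.1f} GB")
print(f"sparse:  {sparse_bytes / 1e6:.1f} MB")
print(f"ratio:   {one_hot_bytes // sparse_bytes}x")
```

Under these assumptions the one-hot label matrix is as many times larger as there are classes.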
  • 31. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 32. #4 Training 1.Basic Concepts 2.Model Architecture 3.Training “In Action”
  • 33. #4.1 Basic Concepts • Objective: - Find a combination of mathematical functions and a set of corresponding params to maximize prediction accuracy (or minimize error rate). - Ensure that the above generalizes well for production. - Learn params in an acceptable time window. • Experiment with Neural Network architectures • GPUs to the rescue (10x speedup)
  • 34. #4.1 Basic Concepts(2) •Loss function - Categorical Crossentropy •Optimizer - Adam (Gradient Descent) •Hyper-params - Mini-Batch Size - Learning Rate - Epochs
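For a single example, categorical cross-entropy reduces to the negative log of the probability the model assigned to the true class, and the mini-batch loss is the average over the batch. The sketch below is a plain-Python illustration with made-up probabilities; deep learning libraries typically also clip probabilities to avoid taking the log of zero.

```python
import math


def categorical_crossentropy(probabilities, true_class):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[true_class])


def batch_loss(batch_probabilities, true_classes):
    """Average loss over a mini-batch."""
    losses = [
        categorical_crossentropy(p, y)
        for p, y in zip(batch_probabilities, true_classes)
    ]
    return sum(losses) / len(losses)


prediction = [0.7, 0.2, 0.1]  # made-up softmax output over 3 classes
right = categorical_crossentropy(prediction, true_class=0)
wrong = categorical_crossentropy(prediction, true_class=2)
average = batch_loss([prediction, prediction], [0, 2])
print(right, wrong, average)
```

The loss is small when the true class gets a high probability and grows quickly as that probability approaches zero, which is what the optimizer pushes against.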
  • 35. #4.1 Validation • Why? - Simulate unseen data - Compare different: ✓ training methods ✓ hyper-params - Avoid Overfitting • Should be representative • Validation Strategy - 10% of whole Dataset - Stratification on Categories
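The validation strategy (10% of the dataset, stratified on categories) can be sketched as follows. The function name, the rounding rule and the fixed seed are assumptions; the point is only that each category contributes roughly the same fraction to the validation set.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.1, seed=0):
    """Return (train_indices, val_indices), sampling val_fraction per category."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Very small categories may round down to zero validation examples.
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


labels = [1] * 50 + [2] * 30 + [3] * 20  # made-up category ids
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 90 10
```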
  • 37. #4.2 Model Architecture • Hybrid End-to-End architecture • 4 branches (4 input vectors): A. Name Features Branch B. Shop-Category Features Branch C. Basic Features Branch (Numerics, Categorical) D. Matching PNs Branch (Categorical)
  • 38. #4.2 Text Branches • Inspired by “Embed, Encode, Attend, Predict” - https://explosion.ai/blog/deep-learning-formula-nlp • Each of “name” and “shop-category” sequence flows through: - 1 x Embeddings Layer - 1 x Bi-LSTM Encoder - 1 x Attention Module - 1 x LSTM Encoder
  • 39. #4.2 Text Branches - Why LSTM? • LSTM stands for “Long Short-Term Memory” Layer (Encoder): - Memory Cells / Captures context - Propagates signal from previous words to the next in a Sequence - 2 Stacked Layers performed better in our experiments - 128 dimension output vector (#cells = sequence length) - https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 40. #4.2 Text Branches - Why pay Attention? • Attention Mechanism: - Control how much signal is propagated to the next layers - https://distill.pub/2016/augmented-rnns/
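The slides do not say which attention variant is used, so the sketch below shows one common form purely as an illustration: each timestep's encoder state is scored against a query vector, the scores are turned into weights with a softmax, and every state is rescaled by its weight before being passed on. The `attend` function, the dot-product scoring and the toy vectors are all assumptions.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(states, query):
    """Rescale each timestep's state by its attention weight.

    states: one vector per word, e.g. the Bi-LSTM outputs.
    query: vector the states are scored against (learned in a real model).
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    weighted = [[w * value for value in state] for w, state in zip(weights, states)]
    return weights, weighted


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dim states for 3 words
query = [1.0, 0.0]
weights, weighted = attend(states, query)
print(weights)
```

Timesteps whose state aligns with the query keep more of their signal, while the others are damped.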
  • 41. #4.2 Other Branches • Basic Features Branch - Inputs a concatenation of basic feats - 1 Dense layer with #classes output (~2k for Skroutz) - ReLU activation • Matching PNs Branch - Inputs a concatenation of PN feats - Short-circuited to final layer
  • 42. #4.2 Final Layer • Merging Layer - Concatenates the outputs of all 4 branches - Softmax activation - Output: probabilities for each class
  • 43. #4.2
  • 44. #4.2 Model Architecture • Model Capacity/Complexity:
  • 45. #4.3 Training In Action - Model Selection • Conducted 100s of experiments with different combinations of features, layers, modules (e.g. Embeddings, Bag of Words, TF/IDF, LSTM, etc.) • 10s of ablation studies: remove specific features to see how performance is affected • Read many papers and applied some common tricks (Bi-LSTM, AdaptivePooling etc.) • It is alchemy!
  • 46. #4.3 Training In Action - Tools •Training Scheduler Process runs weekly •CLI training commands - CUDA_VISIBLE_DEVICES=1 python -m category_classifier.cli scrooge --model end2end --train --epochs 8 --batch_size 128 •Model Versioning - E.g. “skroutz_models_2018_09_01_v1.tar.gz”
  • 47. #4.3 Training In Action Training run output example: GPU monitoring:
  • 48. #4.3 Training In Action Learning Curves (TensorBoard): current best vs. previous architecture
  • 49. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 51. #5.1 Inference Pipeline •Online execution: - preprocessing - vectorization - Prediction •Utilized by CategoryClassifier Class - Wrapper of external API •Utilize scikit-learn Pipelines - http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
  • 52. #5.2 Inference API • REPL • Kafka Worker • Flask App
  • 53. #5.3 Production •x2 inference VMs - inference1.skroutz.gr, inference2.skroutz.gr (Kafka Workers) •x2 Flavors (Greece, UK) •Grafana Monitoring for Kafka Part
  • 54. Chapters 1. Introduction 2. Service Architecture 3. Data 4. Training 5. Inference 6. Evaluation
  • 55. #6 Evaluation •More than 6% error rate reduction overall in Skroutz! •Currently, more than ~2 content-editor hours saved per day in Skroutz (this is scaling)! •Move operations from list with “uncategorized” products reduced significantly (by an order of magnitude)!
  • 56. #6 Performance Summary
      Market                        | Success Rate      | Failure Rate      | No Prediction Rate
                                    | Megatron | Old    | Megatron | Old    | Megatron | Old
      Skroutz (GR), 2.3k categories | 90.10%   | 82.6%  | 7.9%     | 13.8%  | 2%       | 3.5%
                                    | 91.85%   | 85.7%  | 8.14%    | 14.32% | N/A      | N/A
      Scrooge (UK), 350 categories  | 87.56%   | 38.9%  | 2.5%     | 26.24% | 9.9%     | 58.48%
                                    | 97.1%    | 93.67% | 2.8%     | 6.32%  | N/A      | N/A
  • 58. #Future Improvements • Utilize Image Features (in End-To-End model) • Utilize Entity Recognition to extract more features • Find ways to utilize more features (color, sizes etc.) • Categorical Self-Trained Embeddings • Experiment with newer solutions like “Transformer”
  • 59. #Contact Info Andreas Loupasakis • Email: alup@skroutz.gr • Kaggle: https://www.kaggle.com/andreaslup • Twitter: https://twitter.com/andy_lupo • LinkedIn: https://www.linkedin.com/in/andreas-loupasakis-06399a47