At Taboola, we work daily on improving our Deep-Learning-based content-recommendation model. We use it to suggest personalized news articles and ads to hundreds of millions of users a day, so naturally we must stick to state-of-the-art deep learning modeling methods. But our job doesn’t end there – analyzing our results is a must too, and for that we sometimes return to our data science roots and apply some very basic techniques. Let’s lay out one such problem. We are investigating a deep model that behaves rather strangely: it wins over our default model for what looks like a random group of advertisers, and loses for another group. This behavior is stable from day to day, so it looks like some inherent advertiser qualities (which we’ll call campaign features) might be to blame. You can see typical model behavior for 4 campaigns below. So we hypothesize that […]
In this blog post I will describe how we, at Taboola, changed our metrics infrastructure twice as a result of continuous scaling in metrics volume. In the past two years, we moved from supporting 20 million metrics/min with Graphite, to 80 million metrics/min using Metrictank, and finally to a framework that will enable us to grow to over 100 million metrics/min, with Prometheus and Thanos. The journey to scale begins. Taboola is constantly growing. The number of our publishers and advertisers increases exponentially, and our data grows with it, leading to constant growth in metrics volume. We started with a basic metrics configuration of Graphite servers. Every minute, a Graphite Reporter component took a snapshot of metrics from MetricRegistry (a third-party metrics collection from Dropwizard that we used) and sent it in batches to RabbitMQ for the carbon-relays to consume. The carbons are part of Graphite’s backend, and are […]
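The reporting step described above (take a snapshot every minute, then send it in batches) can be sketched roughly as follows. This is a minimal illustration, not Taboola's reporter: it assumes the metrics are serialized in Graphite's plaintext format (`path value timestamp`, one metric per line), and the metric names, the batch size, and the helper names `to_plaintext_lines` and `batches` are hypothetical. Publishing to RabbitMQ is left out.

```python
from typing import Dict, Iterator, List


def to_plaintext_lines(snapshot: Dict[str, float], timestamp: int) -> List[str]:
    """Render a metrics snapshot as Graphite plaintext lines: 'path value timestamp'."""
    # Sorting only makes the output deterministic; Graphite does not require it.
    return [f"{name} {value} {timestamp}" for name, value in sorted(snapshot.items())]


def batches(lines: List[str], batch_size: int) -> Iterator[str]:
    """Group lines into newline-terminated payloads of at most batch_size metrics."""
    for start in range(0, len(lines), batch_size):
        yield "\n".join(lines[start:start + batch_size]) + "\n"


# Hypothetical snapshot, standing in for what a metrics registry would return.
snapshot = {
    "app.requests.count": 120,
    "app.latency.p99": 35.5,
    "app.errors.count": 2,
}

# Each payload would be one message handed to the queue for the relays to consume.
payloads = list(batches(to_plaintext_lines(snapshot, timestamp=1600000000), batch_size=2))
```

Batching keeps the number of messages per minute proportional to the metric count divided by the batch size, which is the quantity that grows as metrics volume scales.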
Sometimes we need to test urgent features fast, within a timeframe too short to run a full test plan for the feature. This can happen on several occasions: when QA does not have enough manpower to cover a full test plan for a feature; when an important client makes new special demands right before the release deadline; when product management needs new adjustments before the developer deploys a new product version; or when a client, team lead or PM wants a new feature that should have been done YESTERDAY! It can also be done proactively, by running a wide post-production test every once in a while, or by dedicating limited time to a bug hunt. We at the Taboola Video Solution department call it “Search for a Bug Thursday”. This unplanned development might end up launching a “half baked” product. It […]
Our core business at Taboola is to provide the surfers-of-the-web with personalized content recommendations wherever they might surf. We do so using state-of-the-art Deep Learning methods, which learn what to display to each user from our growing pool of articles and advertisements. But as we challenge ourselves to build better models and make better predictions, we also find ourselves constantly facing another issue – how do we not listen to our models? Or in other words: how do we explore better? As I’ve just mentioned, our pool of articles is growing, meaning more and more items are added each minute – and from an AI perspective, this is a major issue we must tackle, because by the time we finish training a new model and push it to production, it will already have to deal with items that never existed in its training data. In a previous post, I’ve […]
Introduction: Newsrooms are under constant pressure to deliver the most up-to-date, relevant, and engaging information possible. At Taboola, we are building tools to make this faster, easier, and now, predictable. As soon as an article is published, the team keeps a critical eye on engagement data. Gaining insight into article performance as soon as possible is critical for guiding content strategy. Some articles receive wide attention immediately, drawing hundreds of thousands of page views within minutes; others may only see their first page view after a few hours. Taboola aims to narrow this gap even further by leveraging Machine Learning models to predict an article’s performance the moment it becomes available to the reader. Read on for details on our latest research and fascinating discoveries around predicting article performance! Article Data: Taboola Newsroom is a real-time optimization technology that empowers editorial teams with actionable data around what stories, headlines, […]
At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most interesting ones tend to be related to the users’ history. Understanding how to use these features well can have a huge impact on the model’s personalization capabilities, due to the user-specific knowledge they hold. User history features vary strongly between users; for example, one popular feature is user categories – the topics a user has previously read about. An example of such a list might look like this – {“sports”, “business”, “news”}. Each value in these lists is categorical, and each list can hold multiple entries, so we call them Multi-Categorical features. Multi-Categorical lists can have any number of values per user – which means our model must handle both very long lists and completely empty lists (for new users). Supplying inputs of unknown length […]
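To make the variable-length problem concrete, here is a minimal sketch of one common way to present such lists to a model: map each category to an integer id, pad every list to a fixed length, and keep a mask marking which positions are real. This is an illustration under stated assumptions, not necessarily the approach the post goes on to describe; `PAD_ID`, `build_vocab`, `pad_and_mask`, and `max_len` are hypothetical names chosen for the example.

```python
from typing import Dict, List, Sequence, Tuple

PAD_ID = 0  # id reserved for padding positions (an assumption of this sketch)


def build_vocab(lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each category a positive integer id; 0 stays reserved for padding."""
    vocab: Dict[str, int] = {}
    for values in lists:
        for value in values:
            if value not in vocab:
                vocab[value] = len(vocab) + 1
    return vocab


def pad_and_mask(
    lists: Sequence[Sequence[str]], vocab: Dict[str, int], max_len: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Encode each list as exactly max_len ids plus a 0/1 mask of real entries."""
    ids: List[List[int]] = []
    mask: List[List[int]] = []
    for values in lists:
        encoded = [vocab[v] for v in values][:max_len]  # truncate very long lists
        n_pad = max_len - len(encoded)
        ids.append(encoded + [PAD_ID] * n_pad)
        mask.append([1] * len(encoded) + [0] * n_pad)
    return ids, mask


# Three users: a full history, a new user with no history, and a short history.
users = [["sports", "business", "news"], [], ["news"]]
vocab = build_vocab(users)
ids, mask = pad_and_mask(users, vocab, max_len=3)
```

The mask lets a downstream layer ignore padded positions, so an empty list (a new user) is represented by an all-zero mask rather than by a special case in the model.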
My First Time Running a Hackathon: I’ve planned events before – but never anything like this 48-hour marathon spanning 200 participants across 3 continents and 2 time zones! When I was given the opportunity I was extremely excited, but at the same time somewhat anxious. Would I succeed in matching everyone’s energy and meeting their expectations? The task was daunting, but I took it slow and steady, step by step. First, I designed (with help from our talented graphic designer) a cool and eye-catching theme that would decorate all of our hackathon materials. Decorating slide decks, headers, banners, and t-shirts, our hackathon branding quickly became an R&D favorite. With the event only a month away, we held the official Taboola R&D Hackathon kickoff. Immediately, all of our participants – from engineers to designers to managers – began to gather in groups and cultivate project ideas. To encourage creativity and tech […]
I have not always worked at large-scale companies such as Taboola. I started my career at a small startup, where I was a full stack developer in a student role. As a first-time developer, it was very important for me to be teamed up with someone more experienced to learn from. Luckily for me, I was the only employee, working under the two founders – the CEO and the CTO. Having a lot of one-on-one time with the CTO, working closely on bugs and features and discussing ideas, helped me improve my skills as a developer. Today, now that I have more experience, I know how important it is to influence the people who work with you. I try to spread the knowledge I have gained in the last few years, and to help new and old employees as much as I can. In the day-to-day job, you mostly grow your skills […]
Discover the journey of creating synchronized analog clocks using microcontrollers. Learn about power-efficient design, NTP synchronization, and more.
A few years ago, one of my friends suggested that I teach cybersecurity in a high school once a week as part of a program called Gvahim. I did not expect it to contribute to my professional career, but I have found many analogies to my day-to-day role. I hope you will enjoy a different angle on management 101 guidelines. Program overview: The program’s goals were to increase high school students’ knowledge of cybersecurity and to increase the number of girls who study computer science. Over three years in the program, students studied Assembly, networks and operating systems, with an emphasis on security. Unlike the traditional material taught in high school, the lessons in the program put an emphasis on self-learning. The first two semesters were dedicated to learning the theoretical background through self-reading and small coding exercises. The last semester of the […]