This document describes DeviceAnalyzer, a system that builds predictive models in near real time using Apache Spark and Apache Lucene. It discusses:
1) Integrating Spark and Lucene to enable column search capabilities in Spark and add Spark operations to Lucene.
2) Representing Spark DataFrames as Lucene documents to build a distributed Lucene index from DataFrames.
3) Using the index for tasks like searching devices matching a query, generating statistical and predictive models on retrieved devices, and finding dimensions correlated with selected devices.
4) Architectural components such as Trapezium, which provides batch, streaming, and API services, and LuceneDAO, which indexes DataFrames and queries the resulting index.
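
To make the rows-as-documents idea in the points above concrete, here is a minimal sketch in plain Python. It does not use the Spark or Lucene APIs, and it is not the DeviceAnalyzer or LuceneDAO implementation. The device rows, the field names (`os`, `zip`, `data_gb`), and the helpers `build_shard`, `build_index`, and `search` are all hypothetical. The sketch assumes each row becomes one document, each partition gets its own small inverted index, and a query is answered by searching every shard and merging the hits, after which a simple statistic is computed on the retrieved devices.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical device rows standing in for a Spark DataFrame.
ROWS = [
    {"device_id": "d1", "os": "android", "zip": "94043", "data_gb": 2.0},
    {"device_id": "d2", "os": "ios", "zip": "94043", "data_gb": 4.0},
    {"device_id": "d3", "os": "android", "zip": "10001", "data_gb": 6.0},
    {"device_id": "d4", "os": "android", "zip": "94043", "data_gb": 8.0},
]

# Fields that are searchable; other fields are only stored with the document.
INDEXED_FIELDS = ("os", "zip")


def build_shard(rows):
    """Build one shard: stored rows plus an inverted index over the indexed fields."""
    postings = defaultdict(set)
    for doc_id, row in enumerate(rows):
        for field in INDEXED_FIELDS:
            postings[(field, row[field])].add(doc_id)
    return {"docs": list(rows), "postings": postings}


def build_index(rows, num_shards):
    """Split rows round-robin into shards, mimicking one index per partition."""
    partitions = [rows[i::num_shards] for i in range(num_shards)]
    return [build_shard(p) for p in partitions]


def search(index, query):
    """Return rows matching every field=term pair in query (a conjunctive query)."""
    hits = []
    for shard in index:
        doc_ids = None
        for field, term in query.items():
            matched = shard["postings"].get((field, term), set())
            doc_ids = matched if doc_ids is None else doc_ids & matched
        hits.extend(shard["docs"][i] for i in sorted(doc_ids or ()))
    return hits


index = build_index(ROWS, num_shards=2)
devices = search(index, {"os": "android", "zip": "94043"})
matched_ids = sorted(d["device_id"] for d in devices)
avg_usage = mean(d["data_gb"] for d in devices)  # simple statistic on the hits
print(matched_ids, avg_usage)
```

In a real deployment the per-shard dictionaries would presumably be replaced by Lucene indexes built per Spark partition, and the final statistic by a model fitted on the retrieved devices. The sketch only shows the data flow.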
Spark Summit EU talk by Debasish Das and Pramod Narasimha
1. FUSING APACHE SPARK AND LUCENE FOR NEAR-REALTIME PREDICTIVE MODEL BUILDING
Debasish Das
Principal Engineer
Verizon
Contributors
Platform: Pankaj Rastogi, Venkat Chunduru, Ponrama Jegan, Masoud Tavazoei
Algorithm: Santanu Das, Debasish Das (Dave)
Frontend: Altaff Shaik, Jon Leonhardt
Pramod Lakshmi Narasimha
Principal Engineer
Verizon