1
ADOBE EXPERIENCE MANAGER
& EXTERNAL SEARCH PLATFORMS
Matthias Wermund, Senior Application Architect
2
WHY EXTERNAL SEARCH
Search is part of most implementation projects
• Most of today’s web sites offer some type of search feature
• Search exists in various flavors and mixtures
• Site search
• Typed search
• Search as navigation
• Relevance search
• Location based search
• AEM comes with its own search implementation
• “External search” in the context of AEM means leveraging another platform, hosted outside of the AEM Author/Publish environments
3
WHY EXTERNAL SEARCH
Publish search vs. Author search
• Publish search
• End user accessible
• Indexed content is in published state
• High frequency access
• Author search
• Internal AEM author search
• Index must include unpublished content
• Criteria can include additional content metadata
• The two are fundamentally different use cases with different index lifecycles
4
WHY EXTERNAL SEARCH
AEM standard search
• Part of AEM Java Content Repository (JCR) implementation
• AEM adds Predicate API layer
• Features
• Automatic index generation in all environments
• Full-text, faceted search
• Access restrictions based on repository access control lists (ACL)
• Used behind the scenes for many AEM features
So, why not always use JCR search? Here are some reasons.
5
WHY EXTERNAL SEARCH
Performance issues of standard search with growing result size
[Chart: AEM query performance*. X axis: Results (x1000), 1 to 28. Y axis: Time (s), 0 to 14. Series: JCR Query, Facet Generation]
• Facet generation time increases linearly with growing result size
• JCR query time is not impacted to the same degree, but the increase is noticeable
• Search results are often impossible to cache due to the high number of variations
* Synthetic content, full-text search, one facet, single requests
6
WHY EXTERNAL SEARCH
Scale independently of AEM
• External search platforms decouple the search infrastructure from AEM
• Search platform can scale independently from AEM, both horizontally and vertically
• Some platforms support cloud deployments, e.g. SolrCloud for Apache Solr, which coordinates its nodes through ZooKeeper
• Client-side integration can fully eliminate query impact on AEM
7
WHY EXTERNAL SEARCH
Extended feature offering
• External platforms provide functionality which AEM currently doesn’t offer out of the box
• A few examples:
• Geospatial search
• Dynamic relevance control
• Index-based type-ahead
• Index maintenance UI
• More about various possible uses of external index data later.
8
WHY EXTERNAL SEARCH
Search multiple data sources at once
• Search can span multiple
data sources besides AEM
• External platforms can join
data from any number and
type of source systems
• Users can query all data at
once, with combined
pagination, filters and
relevance calculation
9
APACHE SOLR
Lucene-powered Open Source Search Platform
• Created initially at CNET, open source since 2006
• Incorporates and extends Apache Lucene
• Supports distributed indexing and searching
• Rich search capabilities
• HTTP interface, JSON/XML/BIN formats
• Integration clients
• Standalone Java web application
• Administration UI
• Widely used on prominent sites: wiki.apache.org/solr/PublicServers
10
APACHE SOLR
Index schema configuration
• Schema fields
• Dynamic fields
• Custom field types
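
Dynamic fields let an indexer add new properties without a schema change for each one. A minimal sketch, assuming the suffix patterns of Solr's example schema (`*_s`, `*_i`, `*_f`, `*_b`, `*_t`); the helper name and the type mapping are illustrative and not part of EASE or of any particular schema:

```python
# Hypothetical helper: choose a dynamic-field name from the Python value type.
# The suffixes follow the conventions of Solr's example schema; a real schema
# may define different patterns.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_f", str: "_s"}


def dynamic_field_name(prop_name, value, full_text=False):
    """Map a content property to a dynamic field such as 'author_s'."""
    if full_text and isinstance(value, str):
        return prop_name + "_t"  # analyzed text field in the example schema
    for py_type, suffix in SUFFIX_BY_TYPE.items():
        # bool is listed first because bool is a subclass of int in Python
        if isinstance(value, py_type):
            return prop_name + suffix
    raise TypeError("no dynamic field pattern for %r" % type(value))
```

With such a mapping, a new string property `author` would land in `author_s` without touching the schema.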
11
INDEXING AEM CONTENT
Steps to create an external index
• The main challenge is data extraction
• When should the extraction process get initiated?
• How to convert the AEM content tree structure into the index format?
• And how to transfer the converted data to the external platform?
• Once the external index is generated, data querying is a relatively easy step
• Query generation is highly specific to the use case
• Most search platforms offer standard interfaces or Java libraries to integrate
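
The conversion question above can be illustrated with a small sketch. It assumes a page is available as a nested dictionary that mimics the JCR node tree, and that text components store their content in a `text` property; the function name and the target field names (`id`, `title`, `text`) are hypothetical:

```python
def flatten_page(path, node):
    """Collect the text of a nested page node into one flat index document."""
    texts = []

    def walk(current):
        for key, value in current.items():
            if isinstance(value, dict):
                walk(value)              # child node: descend
            elif key == "text":
                texts.append(value)      # component text property

    walk(node)
    return {
        "id": path,  # page path as unique key
        "title": node.get("jcr:content", {}).get("jcr:title", ""),
        "text": " ".join(texts),
    }


page = {
    "jcr:content": {
        "jcr:title": "Search basics",
        "par": {
            "text_1": {"text": "Solr is a search platform."},
            "image": {"fileReference": "/content/dam/logo.png"},
            "text_2": {"text": "AEM pushes content to it."},
        },
    }
}
doc = flatten_page("/content/site/en/search", page)
```

The tree structure disappears in the result: the index only sees one document per page with a few flat fields.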
12
INDEXING AEM CONTENT
Integration patterns: Pull vs. Push
• Pull
• Content downloaded by external search platform
• Platform needs trigger, e.g. scheduler
• Data generation can use same rendering as for user requests
• Push
• Data uploaded from AEM to external platform
• Can happen immediately on modification
• Requires standalone data generation
• Combination is possible, for example:
• On modification, AEM notifies search platform (Push)
• Platform loads the modified content from AEM (Pull)
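
A push to Apache Solr can be sketched as a JSON update request. This is only an illustration of the pattern, assuming a local Solr with the default `collection1` core and an update handler that accepts JSON; the function name and the action names are hypothetical, and the request is built but not sent:

```python
import json
import urllib.request

# Assumed Solr location and core name; adjust to the actual installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_push_request(action, path, doc=None):
    """Build (but do not send) a Solr JSON update request for one page."""
    if action == "activate":
        payload = [doc]                       # add or overwrite the document
    elif action == "deactivate":
        payload = {"delete": {"id": path}}    # remove it from the index
    else:
        raise ValueError("unsupported action: " + action)
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_push_request(
    "activate", "/content/site/en", {"id": "/content/site/en", "title": "Home"}
)
```

Passing `req` to `urllib.request.urlopen` would perform the actual upload.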
13
INDEXING AEM CONTENT
Integration patterns (II): Unstructured vs. Structured
• Unstructured
• No index-specific format
• Metadata is extracted after loading
• Least effort, end user rendering can be used
• Structured
• Source data formatted to match search index structure
• Leaner
• Can carry different data than end user view
• Requires structure generation in AEM
• The typical unstructured pull data extraction is crawling
14
INDEXING AEM CONTENT
Introducing the EASE framework
• EASE = External AEM Search Extension
• Primary goal of the framework is to reduce the complexity of integrating search platforms with AEM
• The indexing approach is structured push triggered by content replication
• Open source, available starting today
• For documentation, API, Maven dependencies & more see github.com/mwmd/ease
15
INDEXING AEM CONTENT
(Current) EASE framework features
• Supports generation of structured index data
• Binary asset indexing
• Integrates in AEM Author environment
• Incremental index updates triggered by AEM replication (push)
• Indexing of versioned content for scheduled replication
• Full index generation
• Generic integration with search platforms
• Apache Solr integration
16
INDEXING AEM CONTENT
EASE Maven modules
17
INDEXING AEM CONTENT
EASE index generation approach
18
INDEXING AEM CONTENT
Basic sample project: ease/example
• Prerequisites
• AEM 5.6 Author
• Apache Solr 4.4
• Demonstrates use of
EASE framework
• No configuration
needed
• Uses Facets, full text
search, relevance
• Available on GitHub
19
INDEXING AEM CONTENT
Steps to integrate EASE and Solr into project
1. Include ease-core and ease-scr as Maven dependencies
2. Implement indexers matching your content
3. Create OSGi configurations:
• IndexService
• SolrIndexServer
4. Deploy to AEM Author:
• ease-core
• ease-solr & dependencies
When this is done, activated content will get automatically indexed.
20
INDEXING AEM CONTENT
Generation of index data with EASE
21
INDEXING AEM CONTENT
Resolving content structure with EASE
22
INDEXING AEM CONTENT
Encapsulate handling of proprietary requests
• IndexServer handles all communication with
search platform
• ease-core bundle doesn’t provide platform
specific implementation
• Implementation of IndexServer for Apache Solr in
ease-solr bundle
• New connectors to additional platforms are only
required to implement this interface
23
USING THE EXTERNAL INDEX
• When the data is indexed, it can get queried from custom components
• Leveraging platform specific features with proprietary clients
• While EASE currently focuses on simplifying the indexing, it helps with queries too
• EASE connector bundle per external search platform
• Proprietary clients are provided by the bundle (SolrJ for Apache Solr)
• In the following, an example implementation will walk through some use cases
24
USING THE EXTERNAL INDEX
Example implementation: AEM Know-How Database
• Central search for AEM
related information
• Uses EASE framework
• Server- and client-side
queries
• 50,000 pages in AEM
• stackoverflow
• Adobe offices
• 3,000 external pages
• Adobe AEM doc
• Adobe CRX doc
• Marketing Cloud doc
25
USING THE EXTERNAL INDEX
Example implementation: AEM Know-How Database
26
USING THE EXTERNAL INDEX
Full-text search
• User-input text query
• Query-based ranking
• Generation of extracts and term highlighting
• Sorting on different fields
[Screenshot callouts: Sorting options, Pagination, Extract & Highlighting]
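
The features on this slide map to standard Solr request parameters. A minimal sketch, assuming hypothetical index fields `title` and `text` and the edismax query parser; the function name is illustrative:

```python
from urllib.parse import urlencode


def fulltext_params(user_query, page=0, page_size=10, sort="score desc"):
    """Build Solr select parameters for a paged, highlighted full-text query."""
    return {
        "q": user_query,
        "defType": "edismax",        # query parser suited to raw user input
        "qf": "title text",          # hypothetical field names
        "start": page * page_size,   # pagination offset
        "rows": page_size,
        "sort": sort,                # e.g. "score desc" for relevance ranking
        "hl": "true",                # extracts with highlighted terms
        "hl.fl": "text",
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "wt": "json",
    }


query_string = urlencode(fulltext_params("replication agent", page=2))
```

The resulting query string would be appended to the `/select` handler of the Solr core.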
27
USING THE EXTERNAL INDEX
Boost manipulation / Personalization
• Manipulation of relevance calculation
• Boosting possible on
• Terms
• Fields
• Implementation can leverage user data to generate personalized result (Client Context)
[Screenshot callouts: Personalization. Same result count… …but different ranking]
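
One way to get "same result count, different ranking" in Solr is the `bq` (boost query) parameter of the dismax/edismax parsers. The sketch below assumes a hypothetical `tags` field and a dictionary of user interests with weights, e.g. derived from the Client Context; it is not how the example implementation necessarily does it:

```python
def personalized_params(user_query, interests):
    """Add one boost query per user interest; the result set stays the same."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", "title^5 text"),   # field boost: title outweighs body text
    ]
    for tag, weight in interests.items():
        # bq only influences the score, it does not filter documents
        params.append(("bq", 'tags:"%s"^%s' % (tag, weight)))
    return params
```

Calling `personalized_params("workflow", {"dam": 3})` yields the base query plus `bq=tags:"dam"^3`, so documents tagged `dam` move up without changing which documents match.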
28
USING THE EXTERNAL INDEX
Faceted search
• Navigation via facets
• AND combination of multiple facet values
• Facet hit counts calculated based on current search result
[Screenshot callout: Facet hits]
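
In Solr, facet navigation of this kind is typically expressed with `facet.field` for the counts and one `fq` filter per selected value. A minimal sketch with hypothetical field names and function names; the second helper assumes Solr's default flat JSON layout for facet counts:

```python
def facet_params(user_query, facet_fields, selected):
    """Build parameters with facet counts and AND-combined facet filters."""
    params = [("q", user_query), ("rows", "10"), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", field) for field in facet_fields]
    # every fq narrows the result further, so multiple values combine as AND
    params += [("fq", '%s:"%s"' % (field, value)) for field, value in selected]
    return params


def facet_counts(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))
```

Because the counts are computed on the filtered result, each selected value updates the hit counts shown for the remaining facets.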
29
USING THE EXTERNAL INDEX
Type-ahead / Auto-complete / Auto-suggest
• Offer search term suggestions based on user input
• Highly configurable
• Index data
• Dictionary
• Query parsing
• Client-side call to Apache Solr
• Can use a standard query or dedicated feature
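
The "standard query" variant can be sketched with `facet.prefix`, which returns indexed terms starting with the typed characters. The field name `title_terms` and the function name are assumptions, and the field is assumed to be lowercased at index time. In a real client-side integration the browser would issue this URL; Python is used here only to show the parameters:

```python
from urllib.parse import urlencode


def suggest_url(base_url, typed, field="title_terms", limit=5):
    """Build a select URL that returns terms starting with the typed prefix."""
    params = {
        "q": "*:*",
        "rows": 0,                       # only the facet values are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": typed.lower(),   # indexed terms starting with the input
        "facet.limit": limit,
        "facet.mincount": 1,
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)
```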
30
USING THE EXTERNAL INDEX
Geospatial search
• Proximity search
• Distance calculation
• Sorting by distance
• Center of search is dynamic, can be based on user’s location (Client Context)
[Screenshot callouts: Distance calculation, Distance sorting, Location-based query]
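
Solr covers these three points with the `geofilt` filter and the `geodist()` function. A minimal sketch, assuming a spatial field named `location` (hypothetical) and a center point supplied at query time:

```python
def nearby_params(lat, lon, radius_km, field="location"):
    """Filter to a radius around a point and sort by distance to it."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",               # keep documents within d km of pt
        "sfield": field,                  # hypothetical spatial field name
        "pt": "%s,%s" % (lat, lon),       # dynamic center, e.g. user location
        "d": radius_km,
        "sort": "geodist() asc",          # nearest first
        "fl": "*,distance:geodist()",     # return the distance per document
        "wt": "json",
    }
```

Only `pt` changes from request to request, which is what makes the center of the search dynamic.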
31
REAL WORLD SEARCH IMPLEMENTATIONS
Handling aggregated page content
• Component rendering can include content maintained on other pages
• Aggregation logic could be easily mirrored in ResourceIndexer
• But: If the page-external content is modified, its activation won’t trigger re-indexing of the aggregating page
• Use case:
• Inherited paragraph system
• Reference component
• Mitigation options:
• Content strategy: Index only standalone, unique page content
• Use WCM ReferenceSearch to find and re-index references
• Dangerous: reference loops, cascading re-indexing
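
The two dangers named above, reference loops and cascading re-indexing, can be contained with a visited set and an upper bound. A minimal sketch: the `referenced_by` map stands in for whatever a reference lookup such as WCM ReferenceSearch would return, and the function name and the `max_pages` limit are assumptions:

```python
from collections import deque


def pages_to_reindex(changed_path, referenced_by, max_pages=100):
    """Return the changed page plus the pages that directly or indirectly
    reference it. referenced_by maps a path to the pages that include it."""
    seen = {changed_path}
    queue = deque([changed_path])
    while queue:
        current = queue.popleft()
        for referrer in referenced_by.get(current, []):
            # 'seen' guards against reference loops, max_pages caps cascades
            if referrer not in seen and len(seen) < max_pages:
                seen.add(referrer)
                queue.append(referrer)
    return sorted(seen)
```

With a cyclic graph such as `{"/a": ["/b"], "/b": ["/c", "/a"], "/c": ["/a"]}`, a change to `/a` visits each page once and then stops.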
32
REAL WORLD SEARCH IMPLEMENTATIONS
Permission Sensitive Search
• External platforms have no information about AEM roles and permissions
• All index items are visible to everyone by default
• In some use cases, access to parts of the index must be restricted
• Use case:
• Closed User Groups
• Mitigation options:
• Check access rights for all items on the current result page at runtime
• Will break pagination information, type-ahead suggestions
• Performance hit
• Export effective role permissions as part of index item metadata
• Add filter for current user’s role to all queries or into search platform
• Requires re-indexing of content on ACL changes
• Only practical with a limited number of roles
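
The second mitigation option can be sketched as a filter query appended to every request. The sketch assumes each index item carries a multi-valued `allowed_roles` field (a hypothetical name) written at indexing time, that public items are tagged with an `everyone` role, and that role names contain no quote characters:

```python
def add_permission_filter(params, user_roles, field="allowed_roles"):
    """Return a copy of params restricted to documents the roles may read."""
    roles = ['"%s"' % role for role in sorted(set(user_roles) | {"everyone"})]
    restricted = list(params)
    restricted.append(("fq", "%s:(%s)" % (field, " OR ".join(roles))))
    return restricted
```

Since the filter is part of the query itself, hit counts, pagination and facets stay consistent with what the user is allowed to see. The filter has to be added on the server side, where the client cannot remove it.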
33
REAL WORLD SEARCH IMPLEMENTATIONS
Index tuning
• Interpretation of raw index data dependent on
search technology and configuration
• Powerful platforms offer deep level of index
configuration
• Tuning of search behavior means significant effort
• Use case:
• Full text query and content parsing
• Type-ahead suggestions
• Result relevance calculation
[Screenshot: list of available text processors for Solr]
34
Questions?
matthias.wermund@acquitygroup.com
Thank you!
ADOBE EXPERIENCE MANAGER
& EXTERNAL SEARCH PLATFORMS


EVOLVE'13 | Enhance | External Search | Matthias Wermund
