© 2014 SAP AG or an SAP affiliate company. All rights reserved.
SAP HANA SPS 11 - What’s New?
Search, Text Analysis and Text Mining
SAP HANA Product Management December, 2015
(Delta from SPS 10 to SPS 11)
Search
© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public
Search Models
In a search model you define the structure of your “search object” and how it is exposed to an application:
• Tables and joins
• Columns
– Defaults for search
– Weights for ranking
– Fuzziness
– Defaults for facets
[Diagram: Table, Model, Access]
Search Models and Data Access
[Diagram: data access in SAP HANA SPS10 compared with SAP HANA SPS11. Labels: Table, Model, Access; CDS View w/ search annotations; *any* View with search annotations; OData, SQL, JSON; Fiori]
CALL ESH_CONFIG(configuration)
Built-in procedure to add search annotations (request/response, facets, UI areas etc.) to views
CALL ESH_SEARCH(query,?)
Built-in procedure to search on multiple search models with an “OData” query and a “JSON” response
ESH_CONFIG and ESH_SEARCH
SAP HANA SPS11 supports adding “search annotations” to existing views.
Search annotations are added in a CDS-like format, using the built-in procedure ESH_CONFIG.
ESH_SEARCH is the new search API
• Federated search across multiple search models in a single call
• Based on OData v4, response is JSON
• Search-specific extensions, e.g. Search.score(), Search.search()
Search Model Example
CALL ESH_CONFIG('[{
  "uri": "~/$metadata/EntitySets",
  "method": "PUT",
  "content": {
    "Fullname": "DMM264/V_DOCUMENTS",
    "EntityType": {
      "@Search.searchable": true,
      "@EnterpriseSearch.enabled": true,
      "Properties": [
        {"Name": "ID", "@Search.defaultSearchElement": true, "@EnterpriseSearch.key": true,
         "@EnterpriseSearch.presentationMode": ["TITLE"]},
        {"Name": "AUTHOR", "@EnterpriseSearch.usageMode": ["AUTO_FACET"],
         "@EnterpriseSearch.presentationMode": ["SUMMARY"]},
        {"Name": "CATEGORY", "@EnterpriseSearch.usageMode": ["AUTO_FACET"],
         "@EnterpriseSearch.presentationMode": ["SUMMARY"]},
        {"Name": "TITLE", "@Search.defaultSearchElement": true, "@EnterpriseSearch.highlighted.enabled": true,
         "@Search.ranking": "HIGH", "@EnterpriseSearch.presentationMode": ["TITLE"]},
        {"Name": "CONTENT", "@Search.defaultSearchElement": true, "@EnterpriseSearch.snippets.enabled": true,
         "@Search.fuzzinessThreshold": 0.9, "@Search.ranking": "MEDIUM",
         "@EnterpriseSearch.presentationMode": ["DETAIL"]}
      ]
    }
  }
}]', ?);
Callouts on the slide:
• "Fullname": existing view
• "@EnterpriseSearch.usageMode": ["AUTO_FACET"]: expose as facet
• "@Search.defaultSearchElement": search in this column
• "@Search.ranking": relevance ranking
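The configuration above is a JSON document embedded in a SQL string literal, so a single missing quote is easy to overlook. The following is a minimal sketch, assuming the statement text is assembled in client code before it is sent to the database. The helper names `property_annotation` and `build_esh_config_call` are hypothetical, only annotation keys shown on the slide are used, and the code builds the statement text without executing it.

```python
import json

def property_annotation(name, annotations):
    """Return one entry for the "Properties" list of a search model."""
    entry = {"Name": name}
    entry.update(annotations)
    return entry

def build_esh_config_call(view_fullname, properties):
    """Serialize a search model and wrap it in a CALL ESH_CONFIG statement.

    Hypothetical helper: it only produces SQL text, it does not run it.
    """
    config = [{
        "uri": "~/$metadata/EntitySets",
        "method": "PUT",
        "content": {
            "Fullname": view_fullname,
            "EntityType": {
                "@Search.searchable": True,
                "@EnterpriseSearch.enabled": True,
                "Properties": properties,
            },
        },
    }]
    # json.dumps guarantees balanced quotes and brackets in the payload.
    payload = json.dumps(config, indent=2)
    # Single quotes inside a SQL string literal are escaped by doubling them.
    return "CALL ESH_CONFIG('" + payload.replace("'", "''") + "', ?);"

properties = [
    property_annotation("ID", {"@Search.defaultSearchElement": True,
                               "@EnterpriseSearch.key": True}),
    property_annotation("AUTHOR", {"@EnterpriseSearch.usageMode": ["AUTO_FACET"]}),
    property_annotation("CONTENT", {"@Search.defaultSearchElement": True,
                                    "@Search.fuzzinessThreshold": 0.9}),
]
statement = build_esh_config_call("DMM264/V_DOCUMENTS", properties)
print(statement)
```

Generating the payload with a JSON serializer rather than typing it by hand is a design choice: it rules out unbalanced quotes of the kind that are hard to spot in a long single-line property list.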
Data Access Example
CALL ESH_SEARCH('[
  "/$all?facets=all&$filter=Search.search(query=''scope:V_DOCUMENTS merkel'')&$top=10"
]', ?);
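The request above nests three layers of quoting: a SQL string literal, a JSON array of strings, and a quoted search query inside the OData-style URL. A minimal sketch of how that text could be assembled, assuming it is built in client code: `build_esh_search_call` is a hypothetical helper, it mirrors only the request shown on the slide, and it does not escape special characters inside the search terms themselves.

```python
import json

def build_esh_search_call(scope, terms, top=10):
    """Build the text of a CALL ESH_SEARCH statement for a single request.

    Hypothetical helper: produces SQL text only, it does not execute it.
    """
    # Layer 1: the search query is a single-quoted string inside the URL.
    request = (
        "/$all?facets=all&$filter=Search.search("
        f"query='scope:{scope} {terms}')&$top={top}"
    )
    # Layer 2: the request is one element of a JSON array.
    payload = json.dumps([request])
    # Layer 3: double every single quote for the SQL string literal.
    return "CALL ESH_SEARCH('" + payload.replace("'", "''") + "', ?);"

print(build_esh_search_call("V_DOCUMENTS", "merkel"))
```

Because ESH_SEARCH accepts an array, several requests for different search models could be appended to the same list before serialization.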
Search Response Example
[Figure: search response with callouts marking the 1st result item, the 2nd result item, and the 1st facet]
How to find SAP HANA documentation on this topic?
SAP HANA Advanced Data Processing
• What’s New in the SAP HANA Advanced Data Processing (Release Notes)
• Development
– File Loader Guide for SAP HANA
– SAP HANA Search Developer Guide
• References
– SAP HANA INA Search JavaScript
• In addition to this learning material, you can find SAP HANA documentation on the SAP Help Portal knowledge center at http://help.sap.com/hana_options_adp.
• The knowledge center is structured according to the product lifecycle: installation, security, administration, development.
Text Analysis
Agenda – Text Analysis
New or Improved Features
• Grammatical Role Analysis
• Text Analysis XS API – Document Metadata
• Dictionaries – Case Sensitivity
• Language Column – SAP Language Codes
• Tolerant Stemming: Dutch, English, German, Italian
• Linguistic Analysis: Hungarian and Romanian
• Core Extraction: Korean
• Voice of Customer: English and German
New Grammatical Role Analysis (1/2)
Optional analyzer for English that identifies syntactic relationships between elements of a sentence in the form of subject–verb–object expressions, commonly known as ‘triples’.
The [SUBJECT]big brown cat[/SUBJECT] on the red couch was [VERB]eating[/VERB] a [DIRECTOBJECT]dead mouse[/DIRECTOBJECT].
The following supported grammatical roles describe arguments of verbs:
• Subject: person, place, thing, or idea that is doing or being something. Example: Oracle bought Responsys.
• DirectObject: recipient of the action. Example: Oracle bought Responsys.
• IndirectObject: affected by the action but not the primary object. Example: Oracle offered Responsys an improved contract.
• OtherObject: often a prepositional object. Example: They talked about the contract.
• Predicate: object of the verb to be. Example: This is a revised version.
One additional supported grammatical role does not describe a function with respect to a verb:
• PredicateSubject: subject of a predicative expression. Example: The contract is new.
New Grammatical Role Analysis (2/2)
Input:
Oracle was rumored to buy marketing-software maker Responsys Inc. for $1.5 billion.
Output:
TA_RULE           | TA_COUNTER | TA_TOKEN                                | TA_TYPE                 | TA_PARENT | TA_OFFSET
Entity Extraction | 1          | Oracle                                  | ORGANIZATION/COMMERCIAL | ?         | 0
Entity Extraction | 2          | marketing-software maker                | NOUN_GROUP              | ?         | 26
Entity Extraction | 3          | Responsys Inc.                          | ORGANIZATION/COMMERCIAL | ?         | 51
Entity Extraction | 4          | $1.5 billion                            | CURRENCY                | ?         | 70
Grammatical Role  | 5          | Oracle                                  | Subject                 | 7         | 0
Grammatical Role  | 6          | Oracle                                  | Subject                 | 8         | 0
Grammatical Role  | 7          | rumored                                 | Root/MainVerb/Passive   | ?         | 11
Grammatical Role  | 8          | buy                                     | MainVerb/Active         | ?         | 22
Grammatical Role  | 9          | marketing-software maker Responsys Inc. | DirectObject            | 8         | 26
Grammatical Role  | 10         | $1.5 billion                            | OtherObject/for         | 8         | 70
Notes:
• Core extraction is included in the configuration (rows 1 to 4)
• Each grammatical role is either the governor (verb) or a dependent (verb argument)
• TA_TYPE holds the details about the grammatical role
• TA_PARENT holds the TA_COUNTER value of the corresponding governor
• A single dependent can be the argument of two different verbs (rows 5 and 6: Oracle is the subject of both rumored and buy)
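The TA_PARENT links above can be followed to regroup the flat rows into per-verb structures. A minimal sketch, assuming the grammatical-role rows have already been fetched into client code as tuples: the function names are hypothetical, the row values are copied from the output above, and treating a missing TA_PARENT as the marker for a governor is an assumption based on this one example.

```python
from collections import defaultdict

# Grammatical-role rows copied from the output above:
# (TA_COUNTER, TA_TOKEN, TA_TYPE, TA_PARENT); None stands for "?".
rows = [
    (5, "Oracle", "Subject", 7),
    (6, "Oracle", "Subject", 8),
    (7, "rumored", "Root/MainVerb/Passive", None),
    (8, "buy", "MainVerb/Active", None),
    (9, "marketing-software maker Responsys Inc.", "DirectObject", 8),
    (10, "$1.5 billion", "OtherObject/for", 8),
]

def group_by_governor(rows):
    """Map each verb (governor) to its dependents, keyed by grammatical role."""
    # Assumption: rows without a parent are the governors.
    verbs = {counter: token for counter, token, _type, parent in rows
             if parent is None}
    grouped = {verb: defaultdict(list) for verb in verbs.values()}
    for _counter, token, ta_type, parent in rows:
        if parent is not None:
            # TA_PARENT holds the TA_COUNTER of the governing verb.
            grouped[verbs[parent]][ta_type].append(token)
    return {verb: dict(roles) for verb, roles in grouped.items()}

def triples(rows):
    """Yield (subject, verb, direct object) for verbs that have both roles."""
    for verb, roles in group_by_governor(rows).items():
        for subject in roles.get("Subject", []):
            for obj in roles.get("DirectObject", []):
                yield (subject, verb, obj)

print(group_by_governor(rows))
print(list(triples(rows)))
```

The verb rumored has a subject but no direct object, so only buy yields a complete triple here.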
Improved Text Analysis XS API – Document Metadata
For on-demand processing, text analysis output can be accessed via the SAP HANA Extended
Application Services (XS) API:
• Alternative to persisting output data to the $TA table
• Bypasses creating the full-text index
Now the following metadata properties for documents can be optionally included:
• Author
• Date
• Date Created
• Date Modified
• Description
• Keyword
• Language
• Subject
• Title
• Version
• FromEmailAddress
• FromName
• ToEmailAddress
• ToName
• CcEmailAddress
• CcName
• BccEmailAddress
• BccName
Improved Dictionaries – Case Sensitivity
In the context of extraction, dictionaries are user-defined repositories of entities.
Dictionaries are used for customized information about the entities your application must find.
Dictionaries can be used to store name variations in a structured way that is accessible through the extraction process.
Dictionary XML syntax now includes the option:
<dictionary xmlns="http://www.sap.com/ta/4.0" case-sensitive="true">
: :
</dictionary>
For example, adding the attribute ensures that the dictionary entry WHO matches WHO but not who or Who.
The default behavior is case-insensitive.
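The effect of the attribute can be illustrated with a toy matcher. This is a sketch of the matching semantics only, not of the extraction engine. The dictionary body (the `entity_category` and `entity_name` elements and the `standard_form` attribute) is an assumed example, and only the namespace and the `case-sensitive` attribute are taken from the slide.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sap.com/ta/4.0}"

# Assumed example content; only the namespace and the case-sensitive
# attribute come from the slide.
DICTIONARY_XML = """
<dictionary xmlns="http://www.sap.com/ta/4.0" case-sensitive="true">
  <entity_category name="ORGANIZATION">
    <entity_name standard_form="WHO"/>
  </entity_category>
</dictionary>
"""

def load_entries(xml_text):
    """Return (case_sensitive, entry names) for a dictionary document."""
    root = ET.fromstring(xml_text)
    # The slide states that the default behavior is case-insensitive.
    case_sensitive = root.get("case-sensitive", "false") == "true"
    names = [e.get("standard_form") for e in root.iter(NS + "entity_name")]
    return case_sensitive, names

def matches(token, xml_text):
    """Toy lookup: does the token match a dictionary entry?"""
    case_sensitive, names = load_entries(xml_text)
    if case_sensitive:
        return token in names
    return token.lower() in {name.lower() for name in names}

print(matches("WHO", DICTIONARY_XML))   # exact case
print(matches("who", DICTIONARY_XML))   # different case
```

Removing the attribute from the root element switches the toy matcher back to case-insensitive comparison, which mirrors the default described above.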
Improved Language Column – SAP Language Codes
You can specify the input language for each row.
Use LANGUAGE COLUMN to bypass automatic language detection:
• Specify an ISO 639 language code, or
• Optionally use SAP language codes, which are now supported
– English = E, French = F, German = D, etc.
This option allows configuring full-text search over existing SAP business applications without modifying the underlying database tables.
CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON PRODUCT_REVIEWS(CONTENT) FAST PREPROCESS OFF
LANGUAGE COLUMN LANGUAGE;
Improved Stemming – Dutch, English, German, Italian
Stemming identifies the base form referenced in a dictionary.
Tolerant stemming is introduced for Dutch, English, German and Italian. It is the default behavior and handles non-standard spellings to improve recall.
For example, in English the stemmer handles spelling variations between American and British English, does not require correct capitalization or accentuation, and treats required hyphens as optional.
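The kinds of tolerance listed above can be illustrated with a toy normalization step. This is not the product's stemmer, only a sketch of the idea of reducing spelling variants to one lookup key. The function name `tolerant_key` and the two-entry British-to-American table are hypothetical.

```python
import unicodedata

# Tiny hypothetical British-to-American table; a real stemmer uses a lexicon.
SPELLING_VARIANTS = {"colour": "color", "organise": "organize"}

def tolerant_key(word):
    """Reduce a word to a lookup key that ignores case, accents and hyphens."""
    # Split accented letters into base letter plus combining mark, then drop
    # the combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    no_accents = "".join(ch for ch in decomposed
                         if not unicodedata.combining(ch))
    key = no_accents.lower().replace("-", "")
    # Map known regional spellings onto one form.
    return SPELLING_VARIANTS.get(key, key)

for word in ["Colour", "color", "E-Mail", "email", "Café", "cafe"]:
    print(word, "->", tolerant_key(word))
```

Each pair of variants ends up with the same key, so a lookup against a dictionary of base forms succeeds regardless of which spelling appears in the text.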
Improved Language Support for Hungarian and Romanian
Full linguistic analysis support by adding Part-of-Speech (POS) tagging and Noun Group (concept) extraction for Hungarian and Romanian.
Language              | LINGANALYSIS_BASIC | LINGANALYSIS_STEMS | LINGANALYSIS_FULL
Arabic                | ✓ | ✓ | ✓
Catalan               | ✓ | ✓ | ✓
Chinese (Simplified)  | ✓ | ✓ | ✓
Chinese (Traditional) | ✓ | ✓ | ✓
Croatian              | ✓ | ✓ | ✓
Czech                 | ✓ | ✓ | ✓
Danish                | ✓ | ✓ | ✓
Dutch                 | ✓ | ✓ | ✓
English               | ✓ | ✓ | ✓
Farsi                 | ✓ | ✓ | ✓
French                | ✓ | ✓ | ✓
German                | ✓ | ✓ | ✓
Greek                 | ✓ | ✓ | -
Hebrew                | ✓ | ✓ | ✓
Hungarian             | ✓ | ✓ | ✓ (NEW)
Indonesian            | ✓ | ✓ | ✓
Italian               | ✓ | ✓ | ✓
Japanese              | ✓ | ✓ | ✓
Korean                | ✓ | ✓ | ✓
Norwegian (Bokmal)    | ✓ | ✓ | ✓
Norwegian (Nynorsk)   | ✓ | ✓ | ✓
Polish                | ✓ | ✓ | ✓
Portuguese            | ✓ | ✓ | ✓
Romanian              | ✓ | ✓ | ✓ (NEW)
Russian               | ✓ | ✓ | ✓
Serbian               | ✓ | ✓ | ✓
Slovak                | ✓ | ✓ | ✓
Slovenian             | ✓ | ✓ | ✓
Spanish               | ✓ | ✓ | ✓
Swedish               | ✓ | ✓ | ✓
Thai                  | ✓ | ✓ | ✓
Turkish               | ✓ | ✓ | ✓
Improved Core Extraction for Korean
Higher precision and recall on existing predefined core extractions.
Examples of predefined entity types:
Entity type                | Example
TITLE                      | President
PERSON                     | Barack Obama
LOCALITY                   | Cambridge
REGION@MINOR               | Napa County
REGION@MAJOR               | Connecticut
COUNTRY                    | Brazil
CONTINENT                  | South America
GEO_FEATURE                | Mount Fuji
GEO_AREA                   | Scandinavia
FACILITY                   | Logan International Airport
LOCALITY                   | New Delhi
ORGANIZATION@COMMERCIAL    | AT&T
ORGANIZATION@EDUCATIONAL   | University of Washington
ORGANIZATION@OTHER         | FBI
SOCIAL_MEDIA@TWITTER_ID    | @SAP
SOCIAL_MEDIA@TWITTER_TOPIC | #HANA
DATE                       | 2/14/2011
DAY                        | Monday
MONTH                      | June
YEAR                       | 2011
PHONE                      | 617-677-2030
URI@EMAIL                  | john.smith@sap.com
URI@URL                    | http://sap.com
Improved Voice of Customer – English and German
Set of rules to extract sentiments expressed about a product or a service:
I [love] [my new phone]! → Strong positive sentiment about my new phone
He did [not like] [the book]. → Weak negative sentiment about the book
More granular than competing systems because it can link sentiments with topics:
[love my new phone] = Sentiment
‘love’ = StrongPositiveSentiment
‘my new phone’ = Topic
Determiners (“my” and “the” in the examples above, struck through on the original slide) are no longer included in the topic classifications for English and German.
This changes the previous behavior and simplifies topic aggregation.
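Why dropping determiners simplifies aggregation can be seen in a small counting example. This is a sketch under stated assumptions: the determiner list and the function name `strip_determiner` are illustrative and are not the product's rule set, and the topics are made-up strings in the style of the examples above.

```python
from collections import Counter

# Small illustrative English determiner list; not the product's rule set.
DETERMINERS = {"a", "an", "the", "my", "your", "his", "her",
               "its", "our", "their", "this", "that"}

def strip_determiner(topic):
    """Remove leading determiners from a topic phrase."""
    words = topic.split()
    while words and words[0].lower() in DETERMINERS:
        words = words[1:]
    return " ".join(words)

topics = ["my new phone", "the new phone", "the book", "a book"]

# With determiners, every phrase is its own group.
with_determiners = Counter(topics)
# Without determiners, phrases about the same thing fall together.
without_determiners = Counter(strip_determiner(t) for t in topics)

print(with_determiners)
print(without_determiners)
```

With determiners kept, the four phrases form four groups of one. Without them, they collapse into two topics with two mentions each.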
How to find SAP HANA documentation on this topic?
SAP HANA Advanced Data Processing
• What’s New in the SAP HANA Advanced Data Processing (Release Notes)
• Development
– File Loader Guide for SAP HANA
– SAP HANA Search Developer Guide
– SAP HANA Text Analysis Developer Guide
• References
– SAP HANA Text Analysis Extraction Customization Guide
– SAP HANA Text Analysis Language Reference Guide
– SAP HANA Text Analysis XS JavaScript API
• In addition to this learning material, you can find SAP HANA documentation on the SAP Help Portal knowledge center at http://help.sap.com/hana_options_adp.
• The knowledge center is structured according to the product lifecycle: installation, security, administration, development.
Text Mining
Agenda – Text Mining
New or Improved Features
• Language Support
• Automatic Stop Words
Text Mining Recap
Text mining provides statistical functions that can compare documents by examining the terms used within them.
The term-document matrix (a.k.a. text mining index) is an optional data structure that is optimized through the results of text analysis.
Text mining is bound to the full-text indexing and text analysis process.
[Diagram labels: Insert (ID, TITLE); Full-text indexing with TA and TM; TM config.; Full-text index; Text analysis results table; Term-document matrix]
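The idea of comparing documents through a term-document matrix can be sketched in a few lines. This is a generic illustration, not the structure SAP HANA builds internally: raw term counts and cosine similarity are assumptions chosen for simplicity, and the document contents are made up.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """docs maps a document ID to its list of terms.

    Returns the sorted vocabulary and one count vector per document.
    """
    terms = sorted({t for tokens in docs.values() for t in tokens})
    rows = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        rows[doc_id] = [counts[t] for t in terms]
    return terms, rows

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = {
    1: ["phone", "battery", "phone"],
    2: ["phone", "battery"],
    3: ["book", "author"],
}
terms, rows = term_document_matrix(docs)
print(terms)
for doc_id, vector in rows.items():
    print(doc_id, vector)
print(cosine(rows[1], rows[2]), cosine(rows[1], rows[3]))
```

Documents 1 and 2 share their vocabulary and score close to 1, while documents 1 and 3 share no terms and score 0.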
New Language Support
Text mining is now natively integrated with all 32 languages supported by text analysis.
It leverages the available text preprocessing steps:
• Tokenization
• Stemming
• Part-of-Speech tagging
Improved Stop Word Customization
Stop words are lists of literal terms to ignore in order to focus on the important content.
Text mining automatically filters terms to include only nouns based on their part-of-speech tags for 31 languages.
Optionally, users can manually include additional stop words in the configuration properties.
Language              | Automatic Stop Words
Arabic                | ✓
Catalan               | ✓
Chinese (Simplified)  | ✓
Chinese (Traditional) | ✓
Croatian              | ✓
Czech                 | ✓
Danish                | ✓
Dutch                 | ✓
English               | ✓
Farsi                 | ✓
French                | ✓
German                | ✓
Greek                 | -
Hebrew                | ✓
Hungarian             | ✓
Indonesian            | ✓
Italian               | ✓
Japanese              | ✓
Korean                | ✓
Norwegian (Bokmal)    | ✓
Norwegian (Nynorsk)   | ✓
Polish                | ✓
Portuguese            | ✓
Romanian              | ✓
Russian               | ✓
Serbian               | ✓
Slovak                | ✓
Slovenian             | ✓
Spanish               | ✓
Swedish               | ✓
Thai                  | ✓
Turkish               | ✓
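The two filtering stages described on this slide, automatic selection by part-of-speech tag followed by optional manual stop words, can be sketched as follows. The tag names, the function name `select_terms`, and the sample tokens are hypothetical, and the real configuration properties are not shown here.

```python
def select_terms(tagged_tokens, extra_stop_words=()):
    """Keep nouns only, then drop any manually listed stop words.

    tagged_tokens is a list of (token, part-of-speech tag) pairs; the tag
    names used here are illustrative.
    """
    stop = {word.lower() for word in extra_stop_words}
    return [
        token for token, pos in tagged_tokens
        if pos == "noun" and token.lower() not in stop
    ]

tagged = [
    ("the", "determiner"),
    ("phone", "noun"),
    ("is", "verb"),
    ("a", "determiner"),
    ("great", "adjective"),
    ("thing", "noun"),
]

print(select_terms(tagged))
print(select_terms(tagged, extra_stop_words=["thing"]))
```

The automatic stage removes function words without any list, and the manual list is only needed for nouns that carry little meaning in a given corpus.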
How to find SAP HANA documentation on this topic?
SAP HANA Advanced Data Processing
• What’s New in the SAP HANA Advanced Data Processing (Release Notes)
• Development
– File Loader Guide for SAP HANA
– SAP HANA Search Developer Guide
– SAP HANA Text Mining Developer Guide
• References
– SAP HANA Text Mining XS JavaScript API
– SQL Reference for Options
• In addition to this learning material, you can find SAP HANA documentation on the SAP Help Portal knowledge center at http://help.sap.com/hana_options_adp.
• The knowledge center is structured according to the product lifecycle: installation, security, administration, development.
Disclaimer
This presentation outlines our general product direction and should not be relied on in making
a purchase decision. This presentation is not subject to your license agreement or any other
agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and
SAP’s strategy and possible future developments are subject to change and may be changed
by SAP at any time for any reason without notice.
This document is provided without a warranty of any kind, either express or implied, including
but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or
non-infringement. SAP assumes no responsibility for errors or omissions in this document,
except if such damages were caused by SAP intentionally or grossly negligent.
© 2015 SAP SE or an SAP affiliate company. All rights reserved.
Thank you
Contact information
Anthony Waite
SAP HANA Product Management
AskSAPHANA@sap.com
