Handling of Large Data by Salesforce
Agenda
1. Overview
2. Handling of Large data by Salesforce
• The Underlying Concept
• Infrastructure – Salesforce Components and Capabilities
3. Techniques for Optimizing Performance
4. Best Practices
5. Case Studies
Overview
• Salesforce enables customers to scale their applications from small to large amounts of data.
• However, if there is a large data volume, the time required for certain processes might grow.
• The processing time depends on the architecture and design of the application.
• Main processes that are affected:
– Loading/uploading large numbers of records.
– Extraction of data through reports/dashboards and list views.
• Strategy for optimizing these processes:
– Following industry-standard practices for accommodating schema changes and operations in database-enabled applications.
– Deferring or bypassing business rule and sharing processing.
– Choosing the most efficient operation for accomplishing a task.
Underlying Concept and Infrastructure
HOW DOES SALESFORCE HANDLE LARGE DATA?
The Underlying Concept
• Multitenancy is a means of providing a single application to multiple organizations.
• Multitenancy requires that applications behave reliably, even when architects are making Salesforce-supported customizations.
• When organizations create custom objects, the platform tracks metadata about the objects and their fields, relationships, and other object definition characteristics.
• Customers cannot modify the SQL underlying many application operations, because it is generated by the system rather than written by each tenant.
[Diagram: Multitenancy of Metadata]
Search Architecture
• Search is the capability to query records based on free-form text.
• The Salesforce search architecture is based on its own data store, which is optimized for text search.
• For data to be searched, it has to be indexed.
• Salesforce search capabilities:
– The sidebar
– Advanced and global searches
– Find boxes and lookup fields
– Suggested Solutions and Knowledge Base
– Web-to-Lead and Web-to-Case
– Duplicate lead processing
– Salesforce Object Search Language (SOSL) for Apex and the API
Infrastructure – Salesforce Components and Capabilities
Force.com Query Optimizer
• Helps the database system's optimizer produce effective execution plans for Salesforce queries.
• Works on automatically generated queries to handle reports, list views, and both SOQL queries and the other queries that piggyback on them.
Database Statistics
• The platform must keep its own set of statistical information to help the database understand the best way to access the data.
• As a result, when large amounts of data are created, updated, or deleted using the API, the database must gather statistics before the application can efficiently access data.
Indexes
• Salesforce supports custom indexes to speed up queries. Custom indexes can be created by contacting Salesforce Customer Support.
Index Tables
• The Salesforce multitenant architecture makes the underlying data table for custom fields unsuitable for indexing.
• To overcome this limitation, the platform creates an index table that contains a copy of the data, along with information about the data types.
Skinny Tables
• Salesforce creates skinny tables to contain frequently used fields and to avoid joins, and it keeps the skinny tables in sync with their source tables when the source tables are modified.
• Skinny tables can be created on custom objects, and on Account, Contact, Opportunity, Lead, and Case objects.
Divisions
• Divisions are a means of partitioning the data of large deployments to reduce the number of records returned by queries and reports.
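The index table idea can be illustrated with a toy model. The sketch below is not Salesforce's implementation: the row layout, the `val0` column name, and the `FIELD_TYPES` metadata are all assumptions made up for illustration. It only shows why copying values into a typed, sorted side table helps when the main table stores every custom field as untyped text.

```python
from collections import defaultdict

# Toy model only: a shared data table in which every custom field value is
# stored as a string in a generic column, whatever its declared type.
DATA_ROWS = [
    {"org": "org1", "id": "a01", "val0": "250"},
    {"org": "org1", "id": "a02", "val0": "9"},
    {"org": "org1", "id": "a03", "val0": "1000"},
    {"org": "org2", "id": "b01", "val0": "2024-05-01"},
]

# Hypothetical metadata: the real type behind each tenant's generic column.
FIELD_TYPES = {
    ("org1", "val0"): int,
    ("org2", "val0"): str,
}


def build_index_table(rows, field_types):
    """Copy each value into a per-(org, column) list, typed and sorted."""
    index = defaultdict(list)
    for row in rows:
        for (org, column), caster in field_types.items():
            if row["org"] == org and column in row:
                index[(org, column)].append((caster(row[column]), row["id"]))
    for entries in index.values():
        entries.sort()
    return index


def ids_at_least(index, org, column, minimum):
    """Return record ids whose typed value is >= minimum, in value order."""
    return [rid for value, rid in index[(org, column)] if value >= minimum]


INDEX = build_index_table(DATA_ROWS, FIELD_TYPES)
```

Calling `ids_at_least(INDEX, "org1", "val0", 100)` compares numbers, whereas comparing the raw strings in `DATA_ROWS` against `"100"` would also match the record holding `"9"`.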
Techniques for Optimizing Performance
• Using Mashups
• Defer Sharing Calculation
• Deleting Data
• Search
• Data Archiving
BEST PRACTICES
Reporting
• Reduce the number of records queried: use a value in the data to segment the query.
• Reduce the number of objects queried and the number of relationships used to generate the report.
• Reduce the number of fields queried: add only the fields that are required to a report, list view, or SOQL query.
• Reduce the amount of data by archiving unused records.
• Use report filters that emphasize standard or custom indexed fields.
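One way to read "use a value in the data to segment the query" is to split a large query into date windows. The sketch below is a minimal illustration that only builds query strings. The helper names are hypothetical, and it assumes `CreatedDate` is a suitable filter field for the object being queried.

```python
from datetime import date


def month_windows(start, end):
    """Yield (first_of_month, first_of_next_month) pairs until end is reached."""
    current = date(start.year, start.month, 1)
    while current < end:
        if current.month == 12:
            following = date(current.year + 1, 1, 1)
        else:
            following = date(current.year, current.month + 1, 1)
        yield current, following
        current = following


def segmented_soql(object_name, fields, start, end):
    """Build one SOQL string per month, selecting only the listed fields."""
    queries = []
    for low, high in month_windows(start, end):
        queries.append(
            f"SELECT {', '.join(fields)} FROM {object_name} "
            f"WHERE CreatedDate >= {low.isoformat()}T00:00:00Z "
            f"AND CreatedDate < {high.isoformat()}T00:00:00Z"
        )
    return queries
```

Windows always start on the first day of a month, so a mid-month `start` is rounded down to the beginning of that month.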
Loading Data from the API
• Use the Salesforce Bulk API when you have more than a few hundred thousand records.
• Use the fastest operation possible: insert() is fastest, update() is next, and upsert() is next after that.
• Ensure that data is clean before loading when using the Bulk API.
• When updating, send only fields that have changed.
• For custom integrations, authenticate once per load, not on each record.
• If possible for initial loads, populate roles before populating sharing rules.
• When changing child records, group them by parent.
• When using the SOAP API, send as many records per batch as possible (up to 200).
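Two of the loading tips above, sending only changed fields and grouping child records by parent, can be prepared on the client side before any API call is made. The sketch below is a minimal illustration with hypothetical helper names. It assumes records are plain dictionaries, that `AccountId` is the parent lookup, and that 200 is the batch size in use.

```python
MAX_BATCH = 200  # assumed batch size for this sketch


def changed_fields(old, new):
    """Return the Id plus only the fields whose values differ, or None if nothing changed."""
    delta = {k: v for k, v in new.items() if k != "Id" and old.get(k) != v}
    return {"Id": new["Id"], **delta} if delta else None


def batches_by_parent(children, parent_key="AccountId", size=MAX_BATCH):
    """Sort children by parent so siblings stay adjacent, then cut into batches."""
    ordered = sorted(children, key=lambda rec: rec[parent_key])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Sorting by parent keeps records that share a parent in the same or neighbouring batches. The usual motivation is to reduce contention on the parent record during the load.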
Extracting Data from the API
• Use the getUpdated() and getDeleted() SOAP API calls to sync an external system with Salesforce at intervals greater than 5 minutes.
• When a query returns more than one million results, consider using the query capability of the Bulk API.
Searching
• Keep searches specific and avoid using wildcards, if possible.
• Use single-object searches for greater speed and accuracy.
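The two searching tips can be combined in a small helper that builds a SOSL string for a single object and escapes reserved characters, so that a stray `*` or `?` in user input is not treated as a wildcard. This is a sketch with hypothetical function names. It only produces the query text and makes no API call.

```python
# Characters that SOSL treats as reserved inside the FIND clause.
SOSL_RESERVED = set('?&|!{}[]()^~*:\\"\'+-')


def escape_sosl(term):
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in SOSL_RESERVED else ch for ch in term)


def single_object_search(term, object_name, fields, limit=50):
    """Build a SOSL query restricted to one object and to name fields."""
    return (
        f"FIND {{{escape_sosl(term)}}} IN NAME FIELDS "
        f"RETURNING {object_name}({', '.join(fields)}) LIMIT {limit}"
    )
```

Restricting the search to `NAME FIELDS` and a single `RETURNING` object is one way to keep the search specific. Other search groups may suit other cases.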
SOQL and SOSL
• Decompose the query: break it into two queries and join the results.
• If querying on formula fields is required, make sure that they are deterministic formulas.
• Use values such as N/A in place of nulls.
• Use SOQL and SOSL where appropriate, and minimize the amount of data being queried or searched.
• Tune the SOQL query by reducing its scope and using selective filters. Consider using the Bulk API with bulk query.
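Decomposing a query means running two simpler queries and combining their results in the client. The sketch below shows only the combining step. The two lists stand in for the results of two separate queries, and the field names are assumptions for illustration.

```python
def join_results(accounts, contacts):
    """Join contact rows to their account rows on AccountId, in memory."""
    by_id = {acct["Id"]: acct for acct in accounts}
    joined = []
    for contact in contacts:
        parent = by_id.get(contact["AccountId"])
        if parent is not None:
            joined.append(
                {"ContactName": contact["Name"], "AccountName": parent["Name"]}
            )
    return joined


# Stand-ins for the results of two independent, selective queries.
ACCOUNTS = [{"Id": "001A", "Name": "Acme"}, {"Id": "001B", "Name": "Globex"}]
CONTACTS = [
    {"Name": "Ann", "AccountId": "001A"},
    {"Name": "Bob", "AccountId": "001B"},
    {"Name": "Cy", "AccountId": "001Z"},
]
```

Contacts whose parent was not returned by the first query, such as `Cy` here, are dropped, which mirrors an inner join.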
Deleting Data
• When deleting large volumes of data, use the hard delete option of the Bulk API.
• When deleting records that have many children, delete the children first.
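The "children first" rule amounts to ordering deletions from the deepest level of the hierarchy upward. Here is a minimal sketch, assuming the hierarchy is given as a child-to-parent mapping with no cycles. The function name and the sample objects are hypothetical.

```python
def deletion_order(parent_of):
    """Return the keys of parent_of ordered so that children precede parents."""
    depth = {}

    def level(obj):
        # Depth 0 for roots; each child sits one level below its parent.
        if obj not in depth:
            parent = parent_of.get(obj)
            depth[obj] = 0 if parent is None else level(parent) + 1
        return depth[obj]

    for obj in parent_of:
        level(obj)
    # Deepest objects first; names break ties for a stable result.
    return sorted(parent_of, key=lambda obj: (-depth[obj], obj))


HIERARCHY = {"Invoice__c": None, "Line__c": "Invoice__c", "Tax__c": "Line__c"}
```

With `HIERARCHY`, the order is `Tax__c`, then `Line__c`, then `Invoice__c`.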
General
• Avoid having any user own more than 10,000 records.
• Use a data-tiering strategy that spreads data across multiple objects, and brings in data on demand.
• When creating copies of production sandboxes, exclude field history if it isn't required, and don't change a lot of data until the sandbox copy is created.
CASE STUDIES
Case Study: Indexing with Nulls
Requirement
• The customer needed to allow nulls in a field and be able to query against them.
• Because single-column indexes for picklists and foreign key fields exclude rows in which the index column is equal to null, an index could not have been used for the null queries.
Solution
• Use some other string, such as N/A, in place of NULL.
• If you cannot do that, possibly because records already exist in the object with null values, create a formula field that displays text for nulls, and then index that formula field.
Example
1. Create a formula field:
Status_Value__c = IF(ISBLANK(Status__c), "blank", Status__c)
2. Contact Salesforce to get the formula field indexed.
3. Update the query:
SELECT Name FROM Object__c WHERE Status_Value__c = 'blank'
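The effect described in this case study can be reproduced with a toy index. This is a simplified model, not how the platform stores indexes. The index here is a dictionary that skips null values, and `status_value` mirrors the formula `IF(ISBLANK(Status__c), "blank", Status__c)`.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each non-null field value to the names of the records holding it."""
    index = defaultdict(list)
    for rec in records:
        value = rec.get(field)
        if value is not None:  # rows with a null value are left out of the index
            index[value].append(rec["Name"])
    return index


def status_value(rec):
    """Return the status, or the text "blank" when the status is empty."""
    status = rec.get("Status__c")
    return "blank" if status in (None, "") else status


RECORDS = [
    {"Name": "R1", "Status__c": "Open"},
    {"Name": "R2", "Status__c": None},
    {"Name": "R3", "Status__c": "Closed"},
]

# Add the derived field, then index it instead of the nullable field.
DERIVED = [{**rec, "Status_Value__c": status_value(rec)} for rec in RECORDS]
```

An index built on `Status__c` has no entry that could locate `R2`, while an index built on `Status_Value__c` finds it under `"blank"`.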
Formula field indexing considerations
Custom indexes can be created on a formula field, provided that the formula field is deterministic.
Non-deterministic Force.com formula fields can:
• Reference other entities
• Include other formula fields that span over other entities
• Use dynamic date and time functions
• Include standard fields with special functionalities, e.g. Opportunity Amount
• Include references to fields that Force.com cannot index:
– Currency fields in a multicurrency organization
– Long text area fields
– Blob, file, or encrypted text
Note: If the formula is modified after the index is created, the index is disabled. To re-enable the index, contact Salesforce Customer Support.
Case Studies
Data Aggregation
Requirement
• To aggregate monthly and yearly metrics using standard reports.
• Data is stored across two objects.
Solution
• Create an aggregation custom object.
• Use Batch Apex.
Custom Search
Requirement
• To search in large data volumes across multiple objects using specific values and wildcards.
• 1-20 fields.
Solution
• Use only essential search fields.
• De-normalize the data.
• Dynamically determine the use of SOQL or SOSL.
Related Lists
Requirement
• Detail page takes a long time to load due to large data volume in related lists.
Solution
• Enable Separate Loading of Related Lists.
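For the Data Aggregation case, the batch job's core step is a roll-up of detail records into one summary row per period. The deck names Apex for this. The sketch below uses Python purely to illustrate the roll-up logic, with assumed field names (`CloseDate`, `Amount`) and a hypothetical summary row shape.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(records):
    """Sum Amount per (year, month) and return one summary row per month."""
    totals = defaultdict(float)
    for rec in records:
        closed = rec["CloseDate"]
        totals[(closed.year, closed.month)] += rec["Amount"]
    return [
        {"Year": year, "Month": month, "Total": total}
        for (year, month), total in sorted(totals.items())
    ]


DETAIL = [
    {"CloseDate": date(2024, 1, 5), "Amount": 100},
    {"CloseDate": date(2024, 1, 20), "Amount": 50},
    {"CloseDate": date(2024, 2, 3), "Amount": 25},
]
```

Reports can then run against the small set of summary rows instead of scanning every detail record.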
Sharing rules
Requirement
• Large number of changes need to be made to roles, territories, groups, users, portal account ownership, or public groups participating in sharing.
Solution
• Defer Sharing Calculations (Contact Salesforce).
API Performance
Requirement
• Synchronize Salesforce data with external customer applications.
• Integration used a specific API user that was part of the sharing hierarchy.
• Queries taking minutes to complete.
Solution
• Give access to all data.
• Create a delta extraction, lowering the volume of data that needed to be processed.
Data Deletion
Requirement
• Delete millions of records.
Solution
• Use the Bulk API's hard delete function.
• Truncate custom object (Contact Salesforce).
• If records have many children, delete children first.
Sharing as per region
Requirement
• Share data with users on basis of location.
Solution
• Divisions (Contact Salesforce).
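The delta extraction in the API Performance case can be sketched as a query that only asks for records modified since the previous sync. This is one possible shape under stated assumptions: the helper names are hypothetical, records are dictionaries carrying a `SystemModstamp` datetime, and the code only builds the query string and tracks the high-water mark.

```python
from datetime import datetime, timezone


def delta_query(object_name, fields, last_sync):
    """Build a SOQL string that selects only records changed after last_sync."""
    stamp = last_sync.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {object_name} "
        f"WHERE SystemModstamp > {stamp} ORDER BY SystemModstamp"
    )


def next_watermark(records, previous):
    """Return the newest SystemModstamp seen, or the previous mark if none."""
    return max((rec["SystemModstamp"] for rec in records), default=previous)
```

Each run stores the value from `next_watermark` and passes it as `last_sync` on the following run, so unchanged records are never re-extracted.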
Zen4orce Service Offerings
• Salesforce Customization
• Salesforce Automation
• Advisory Services
• Integrated Solutions
Skillset
• jQuery, Lightning, Bootstrap, Visualforce, AppExchange, Communities, Service Cloud, Sales Cloud, GitHub, Apex, Web Services
Visit www.zen4orce.com for further details about Zen4orce Services & Offerings.
Get in Touch with us :
+16124545031
sales@zen4orce.com
www.zen4orce.com
THANK YOU !!
