Storage Foundation and
Alfresco
Toni de la Fuente
Principal Solutions Engineer, Americas
toni.delafuente@alfresco.com
Blog: blyx.com – Twitter: @ToniBlyx
Agenda
•  Intro to Storage Concepts
•  Hardware
•  Alfresco Storage Related Solutions
–  Alfresco S3
•  Caching contentstore
–  Alfresco XAM
–  Content Store Selector
–  Replication / Geo-clusters / Redundancy
•  Partners Solutions
–  Alf2CAS, Star Storage
•  Storage Best Practices with Alfresco
•  Backup and Recovery
Intro to Storage Concepts: stack
File Protocol: NFS, CIFS, SMB
File System: Ext3, Ext4, ReiserFS, XFS, GFS, NTFS, FAT32, GlusterFS, OCFS, ZFS
Block Management: MDM, LVM (Logical Volume Manager)
Block Protocol: SCSI, SATA, FC
RAID (HW or SW): Mirrors, Stripes
Hardware: Disks, connectors, racks, FC switches
Intro to Storage Concepts
•  Hard drive types and interfaces
–  PATA: Parallel Advanced Technology Attachment
•  AKA IDE or EIDE, older, 40-pin connector, less efficient, used
to be 4K – 5K rpm.
–  SATA: Serial ATA
•  Similar to PATA, different connector, more energy efficient,
between 5K and 10K rpm.
–  SCSI: Small Computer System Interface
•  Drives spin at 10K or 15K rpm and need a controller
–  SSD: Solid State Drives
•  No moving parts (semiconductor-based); much faster than
mechanical drives and less likely to break down.
Intro to Storage Concepts
•  Hard drive types and interfaces
–  FC: Fibre Channel
•  Successor to parallel SCSI, broader usage than mere disk
interfaces, used for SANs.
–  SAS: Serial Attached SCSI
•  Similar to SCSI but serial rather than parallel.
–  Other interfaces end user oriented:
•  USB
•  Firewire
•  Thunderbolt
•  CAS: content-addressable storage is a mechanism for storing
information that can be retrieved based on its content, not its
storage location (EMC Centera / Caringo).
•  XAM: standard interface for archiving in CAS.
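A minimal sketch of the content-addressing idea described above: the address of a blob is derived from its bytes, so identical content always lands at the same address. The class name and the choice of SHA-256 are assumptions for illustration only; products such as Centera or Caringo use their own addressing schemes and APIs.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical): address = SHA-256 of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address depends only on the content, never on a path or location.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


store = ContentAddressableStore()
first = store.put(b"invoice 2014-001")
second = store.put(b"invoice 2014-001")   # same bytes, stored again
other = store.put(b"invoice 2014-002")    # different bytes
```

Storing the same bytes twice yields the same address, which is why CAS systems deduplicate naturally and why stored content is effectively immutable: changing the content changes the address.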
Intro to Storage Concepts
•  RAID types (SW or HW)
ß Faster with parity
Intro to Storage Concepts
Main differences between SAN and NAS
A SAN is a shared "network" of
storage
•  Block access to LUNs
•  Online and offline storage
•  SAN device = storage array
•  Zoning: data integrity and
security
•  Dedicated fiber network
Protocols:
•  SCSI over Fibre Channel
•  SCSI over IP/Ethernet (iSCSI)
and FC, Infiniband
NAS is a file system shared over
a network
•  File access to data
•  Online storage only
•  NAS device = File server or
"filer", already formatted
Protocols:
•  NFS, CIFS over IP over
Ethernet
Intro to Storage Concepts
Who needs a SAN?
•  Database servers and ECM: Oracle, SQL Server, DB2 and
other database servers.
•  File servers: Using SAN-based storage for file servers lets
you expand file server resources quickly, makes them run
better, and enables you to manage your file-based NAS
storage through the SAN.
•  Backup servers: SAN-based backup is dramatically faster
than LAN-based backup.
•  Voice/video servers: Manage large amounts of data very
quickly.
•  High-performance application servers: Applications such
as document management, customer relationship
management, billing, data warehouses, and other
high-performance and critical applications all benefit from
what a SAN can provide.
Intro to Storage Concepts
•  Evolution
Internal Storage
Direct-Attach Storage
(DAS)
Network-Attached Storage
(NAS)
Hardware
HBA Card
Tape Library
Fibre Cables
Storage Arrays
Alfresco Storage Related Solutions
Alfresco S3 Connector
•  An alternative contentstore implementation that uses S3 directly (S3
APIs)
•  Somewhat equivalent to XAM, but not identical
–  Unlike XAM, S3 doesn’t offer retention policies
•  Enterprise only
–  USD10K for Alfresco Standard
–  USD13.4K for Alfresco Enterprise
•  Shipped as a single repo-side AMP
•  Can only be installed into a new Alfresco instance (no migration!)
•  Configuration must be done before first start.
•  Can also configure caching content store (default cache size: 50GB)
•  Only supported if Alfresco is running on Amazon EC2
•  Amazon EBS still required for database files, indexes, etc.
•  Does not support S3 Encryption yet.
Alfresco Storage Related Solutions
Alfresco XAM Connector (deprecated)
•  Gives Alfresco access to XAM-enabled
storage devices.
•  New XAM connector available
•  Only EMC Centera supported
•  Released with 3.4, Jan 2011.
•  Enterprise only
•  Still being supported for existing customers
–  until November 30th 2014 or their current subscription
runs out, whichever comes first.
Alfresco Storage Related Solutions
Content Store Selector
•  Storage policies based on
business rules
•  Since Alfresco 3.2
•  Examples
o  By type: large video files on
fast, expensive drives; office
documents on slower, more
cost-effective drives.
o  By business unit, by age, by
usage, by ...
•  Leverage Rules and Actions to drive policy
[Diagram: policy rules route content between SSD ($$$), FC drives ($$) and SATA drives ($). SSD = Solid State Drives, FC = Fibre Channel]
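The policy idea on this slide can be sketched as a small routing function: a rule inspects a document and names the store it should live in. Everything here is hypothetical (the `Document` type, the store names `ssdStore`, `fcStore`, `sataStore`, and the 100 MB threshold); it is not the Alfresco API, where the decision is driven by rules and actions on the repository side.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    mimetype: str
    size_bytes: int


LARGE_FILE_BYTES = 100 * 1024 * 1024  # assumed threshold, tune per deployment


def select_store(doc: Document) -> str:
    """Return the (hypothetical) store name a policy rule would pick."""
    if doc.mimetype.startswith("video/"):
        return "ssdStore"      # fast, expensive tier
    if doc.size_bytes > LARGE_FILE_BYTES:
        return "fcStore"       # mid-priced tier
    return "sataStore"         # slower, cost-effective tier


video = Document("keynote.mp4", "video/mp4", 2 * 1024 ** 3)
report = Document("q3-report.docx", "application/msword", 250_000)
video_store = select_store(video)
report_store = select_store(report)
```

Keeping the policy in one function makes it easy to add further criteria from the slide (business unit, age, usage) without touching the stores themselves.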
Alfresco Storage Related Solutions
Content Replication (Alfresco on-premise to Alfresco on-premise)
•  Distributed repository replication
–  Selective replication of spaces and content
–  Support for full, incremental and delete
–  One source – multiple destinations
–  Replicas are read-only (update at source only; redirect
if needed)
•  Benefits
–  Support geographically dispersed companies
–  Provide fast local access
–  Remove single point of failure
–  Reduce wide area network traffic
Alfresco Storage Related Solutions
Content Replication / Geo-clusters / Redundancy
•  Alfresco Cloud Sync: On-premise ↔ Cloud
–  Content-oriented, not for storage replication
•  Synchronization feature between Alfresco on-premise
instances (not available yet).
•  Alfresco Desktop Sync: from Windows or Mac
desktop to Alfresco on-premise (not available yet)
Alfresco Storage Related Solutions
Geo-clusters and Redundancy
•  Geo-clusters can be built by replicating the DB and the Content
store. Supported?
–  Low-level replication/sync
–  Some customers have this.
–  Some customers use NetApp NAS storage and
GoldenGate for DB replication
–  Other replication tools: EMC Clariion,
EMC Symmetrix or IBM Total Storage.
Partners Solutions
•  Xenit Alf2Cas
–  Caringo Castor integration
–  Deprecated?
•  Star Storage – Hitachi Content Platform (HCP)
–  Content archiving, additional storage and faster content backup
–  Alfresco Enterprise: 3.4.x, 4.0.x
–  Hitachi Content Platform (HCP): 4.x, 5.x, 6.x
Third Party – Community Solutions
•  StorNext
–  It is not a connector; it is a solution for data life cycle management
that works in the background
–  Alfresco sees it as a mount point and is not aware of it
–  Runs over FC
•  EMC Atmos
–  XAM connector for Alfresco
•  Alfresco Cloud Store
–  Amazon S3
–  https://code.google.com/p/alfresco-cloud-store/
•  Amazon S3 for on premise
–  https://issues.alfresco.com/jira/browse/AMZNSSS-26
•  Walrus? The S3 alternative for Eucalyptus
Storage Best Practices
•  Content Store
–  Use Content Store Selector to manage content of
different sizes.
–  The default content store should be faster than the others for
writes to avoid bottlenecks (content arrives in the default store
and is then copied to the other content store)
–  WORM disks as non-default content store (cleaner -
Jefferies)
–  SAN if possible
–  If NAS, use a dedicated LAN if possible
–  LVM if possible (scalability, snapshot)
–  Clean trash bin often
–  Delete “contentstore.deleted” often
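The last item can be automated. Below is a minimal sketch, assuming that files sitting in `contentstore.deleted` can be purged once they are older than some number of days and already covered by backups. The function name, the 14-day threshold and the dry-run default are illustrative choices, and the demo runs against a temporary directory rather than a real `alf_data` tree.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List


def purge_older_than(root: Path, max_age_days: float, dry_run: bool = True) -> List[Path]:
    """Return files under root last modified more than max_age_days ago.

    The files are deleted only when dry_run is False.
    """
    cutoff = time.time() - max_age_days * 86400
    stale = [p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


# Demo on a throwaway directory standing in for contentstore.deleted.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old_file = root / "old.bin"
    new_file = root / "new.bin"
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"y")
    month_ago = time.time() - 30 * 86400
    os.utime(old_file, (month_ago, month_ago))   # backdate the modification time

    reported = purge_older_than(root, max_age_days=14)             # dry run
    still_there = old_file.exists()
    purged = purge_older_than(root, max_age_days=14, dry_run=False)
    old_gone = not old_file.exists()
    new_kept = new_file.exists()
```

Running with the dry-run default first shows what would be removed, which is a sensible check before pointing such a script at production storage.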
Storage Best Practices
•  Indexes (SOLR or Lucene)
–  Dedicated disk local or SAN.
–  Avoid NAS.
–  Keep at least 50-75% of space free (backup and
merge)
–  Consider using different file systems for the Lucene
backup and the Solr backup.
•  Logs
–  Put your logs directory on a different file system from the
Content Store and Indexes.
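The free-space guideline for index volumes is easy to monitor. The sketch below uses the standard library's `shutil.disk_usage` to compute the free fraction of the file system holding a path and compares it with a minimum; the function names and the 50% default are assumptions taken from the lower bound on the slide, and `"."` stands in for the real index directory.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the file system containing path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_index_headroom(path: str, minimum: float = 0.5) -> bool:
    """True if the volume keeps at least `minimum` of its space free."""
    return free_fraction(path) >= minimum


fraction = free_fraction(".")        # replace "." with the index volume
ok = has_index_headroom(".", 0.5)
```

A check like this fits naturally into whatever monitoring already watches the Alfresco hosts.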
Backup and Recovery
•  Recovery Time Objective (RTO): the amount of time
that it takes to get your systems back online.
•  Recovery Point Objective (RPO): the last
consistent data transaction prior to the disaster. If you
had a disaster, how much data would be lost?
•  The Disaster Recovery plan (DR) focuses on getting
your business back up and running after a major outage
•  The Business Continuance plan (BCP) focuses on
keeping your business running DURING the disaster.
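A worked example may help separate the two objectives. The timestamps and targets below are invented for illustration: the gap between the last good backup and the disaster is the data you lose (compared against the RPO), and the gap between the disaster and service restoration is the downtime (compared against the RTO).

```python
from datetime import datetime, timedelta

# Hypothetical incident timeline.
last_backup = datetime(2015, 3, 1, 2, 0)
disaster = datetime(2015, 3, 1, 14, 30)
service_restored = datetime(2015, 3, 1, 18, 30)

# Hypothetical targets agreed with the business.
rpo_target = timedelta(hours=24)
rto_target = timedelta(hours=2)

data_loss_window = disaster - last_backup   # work done since the last backup is lost
downtime = service_restored - disaster      # time the system was offline

meets_rpo = data_loss_window <= rpo_target
meets_rto = downtime <= rto_target
```

In this made-up case 12.5 hours of data are lost, which is within a 24-hour RPO, but the 4-hour outage misses a 2-hour RTO, so the backup frequency is adequate while the restore procedure is not.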
Backup and Recovery
•  Alfresco Backup and Recovery Tool is
available:
–  http://blyx.com/open-source-contributions/alfresco-bart/
•  Alfresco Backup and Recovery White
Paper:
–  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
Common Questions to SEs
•  Best practices for storage?
–  You got it
•  NAS or SAN?
–  SAN if possible! NAS backed by a SAN is common as well. NAS is not bad,
but now you know why it is different.
•  Required space for DB, Indexes, Content Store?
–  It depends on the case, but the DB and the Indexes each tend to take about 20% of
the Content Store space.
•  Do you have an archiving solution?
–  Alfresco can be integrated with archiving solutions like those mentioned above,
implemented with Content Store Selector.
•  Do you have a backup/recovery solution?
–  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
•  Do you have a data encryption solution?
–  Yes, Alfresco Encryption at Rest:
http://docs.alfresco.com/5.0/concepts/encrypted-overview.html
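The sizing rule of thumb above (DB and indexes at roughly 20% of the content store each) turns into simple arithmetic. This is a sketch of that estimate only; the function name and the 500 GB input are illustrative, and the percentages should be replaced with figures measured on the actual deployment.

```python
def estimate_storage_gb(content_store_gb, db_percent=20, index_percent=20):
    """Rough capacity estimate from the rule of thumb on the slide."""
    database = content_store_gb * db_percent / 100
    indexes = content_store_gb * index_percent / 100
    return {
        "content_store": content_store_gb,
        "database": database,
        "indexes": indexes,
        "total": content_store_gb + database + indexes,
    }


estimate = estimate_storage_gb(500)   # 500 GB of content
```

For 500 GB of content this gives about 100 GB each for the database and the indexes, 700 GB in total, before adding the free-space headroom recommended for index volumes.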
What kind of storage can I use
with Alfresco?
•  Any mountable volumes that can be made to
appear as standard local filesystems (local disks,
NAS, SAN, etc.)
•  Amazon S3 (for Alfresco installations in AWS)
•  Centera (through the now open source
connector)
•  EMC Atmos (through a partner-created
integration)
•  CAStor (through a dated partner-created
integration)
Appendix 1: Deleting content
Deleting Content
•  A complex process
•  You need to know this because it impacts
–  Disk space management
–  Backup and recovery procedures (and their integrity)
–  Security and auditing
•  You have a wide degree of control over what happens
and when
•  You need to do some work
•  More info: page 24 of
http://www.slideshare.net/toniblyx/alfresco-security-best-practices-guide
Node deletion
[Diagram, two panels around the event "User deletes document". Each panel shows the database (alf_node, alf_content_data, alf_content_url, alf_node_properties, others; stores workspace://SpacesStore and archive://SpacesStore) and the filesystem (~/alf_data with contentstore and contentstore.deleted; content file 2e3839d2d345.bin).]
Node deletion
[Diagram, two panels around the event "Wastebasket empties". Same database and filesystem elements as the previous slide; one panel marks alf_content_url with orphan_time = 'now'.]
Node deletion
[Diagram, two panels around the event "contentStoreCleaner runs". Same database and filesystem elements as the previous slides; one panel marks alf_content_url with orphan_time = 'now'.]
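The three "Node deletion" slides can be read as a small state machine. The sketch below encodes one plausible reading, which is an assumption rather than something the slides spell out: a user delete moves the node to the archive store, emptying the wastebasket orphans the content URL (orphan_time is set), and the contentStoreCleaner later moves the binary into contentstore.deleted. The enum and function names are made up for the example.

```python
from enum import Enum


class ContentState(Enum):
    LIVE = "node in workspace://SpacesStore, binary in contentstore"
    ARCHIVED = "node in archive://SpacesStore, binary in contentstore"
    ORPHANED = "node gone, alf_content_url.orphan_time set, binary in contentstore"
    CLEANED = "binary moved to contentstore.deleted"


# Event names are taken from the slide labels.
TRANSITIONS = {
    (ContentState.LIVE, "user deletes document"): ContentState.ARCHIVED,
    (ContentState.ARCHIVED, "wastebasket empties"): ContentState.ORPHANED,
    (ContentState.ORPHANED, "contentStoreCleaner runs"): ContentState.CLEANED,
}


def advance(state: ContentState, event: str) -> ContentState:
    """Apply an event; raises KeyError if it is not valid in this state."""
    return TRANSITIONS[(state, event)]


state = ContentState.LIVE
for event in ("user deletes document", "wastebasket empties", "contentStoreCleaner runs"):
    state = advance(state, event)
```

Under this reading, disk space is only reclaimed after the final step and a subsequent purge of contentstore.deleted, which is why deletion affects disk space management and backup procedures.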
  
Questions?