Chapter 8. The Basics of the Alfresco Content Store

Content is the heart of an ECM system: all of its functionality and features revolve around content. To architect and maintain an ECM system, it is very important to understand the lifecycle of content in an ECM application. Once content enters the application, it passes through different phases, which are common to most standard ECM applications.

However, the way content is stored varies between ECM applications.

In this chapter, we will look in detail at the lifecycle of content in Alfresco and at how its different phases affect different Alfresco components. We will also explore the Alfresco database schema.

By the end of this chapter, you will have learned about:

  • The content lifecycle
  • Content store types
  • Alfresco database schema

Before going into detail about the content lifecycle, let's look at the content store and the database schema. Indexes were already covered in Chapter 5, Search.

Understanding the content store architecture

The content store controls the creation and deletion of binary content in the filesystem. We have already covered a few details of this in earlier chapters. The dir.root property in the alfresco-global.properties file (located in <Tomcat_Home>/shared/classes) defines the root location for binary file storage.

For example, suppose dir.root is set to /mnt/alf_data. Beneath this directory are two folders, contentstore and contentstore.deleted, which are created the first time Alfresco starts. Let's look at each folder in detail:

  • contentstore: All active and archived content is stored here. A directory hierarchy is created based on the content creation time, and every file is given a unique name with the .bin extension. For example, if a file named Employee Handbook.doc is uploaded to Alfresco on January 20, 2015 at 10:50 A.M., it will be stored as /mnt/alf_data/contentstore/2015/1/20/10/50/<unique name>.bin.
  • contentstore.deleted: Orphaned content, that is, content that has been permanently deleted in Alfresco, is moved to this directory by a scheduled orphan cleanup job. Files in this directory can be removed at any time using the standard operating system remove command, for example, rm /mnt/alf_data/contentstore.deleted/2015/1/23/13/34/xxxxx.bin.
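To make the folder layout concrete, here is a minimal Java sketch that builds the relative path a binary would get from its creation time (year/month/day/hour/minute, without zero padding, as in the example above) and the matching location under contentstore.deleted. The class and method names are hypothetical, a random UUID stands in for Alfresco's unique file name, and the sketch assumes the orphan cleaner keeps the same relative path when it moves a file. It is an illustration, not Alfresco's own code.

```java
import java.time.LocalDateTime;
import java.util.UUID;

// Hypothetical helper that mimics the contentstore folder layout (not Alfresco code).
final class ContentStoreLayout {
    private ContentStoreLayout() {}

    /** year/month/day/hour/minute/<unique name>.bin, with no zero padding. */
    static String relativePath(LocalDateTime created, String uniqueName) {
        return created.getYear() + "/" + created.getMonthValue() + "/" + created.getDayOfMonth()
                + "/" + created.getHour() + "/" + created.getMinute() + "/" + uniqueName + ".bin";
    }

    /** Full path for new content under <dir.root>/contentstore; a random UUID plays the unique name. */
    static String newContentPath(String dirRoot, LocalDateTime created) {
        return dirRoot + "/contentstore/" + relativePath(created, UUID.randomUUID().toString());
    }

    /** Assumed destination after the orphan cleaner moves the file: same relative path. */
    static String deletedPath(String dirRoot, String relativePath) {
        return dirRoot + "/contentstore.deleted/" + relativePath;
    }

    public static void main(String[] args) {
        LocalDateTime uploaded = LocalDateTime.of(2015, 1, 20, 10, 50);
        String relative = relativePath(uploaded, "example");
        System.out.println("/mnt/alf_data/contentstore/" + relative);
        // /mnt/alf_data/contentstore/2015/1/20/10/50/example.bin
        System.out.println(deletedPath("/mnt/alf_data", relative));
        // /mnt/alf_data/contentstore.deleted/2015/1/20/10/50/example.bin
    }
}
```

Because the hierarchy is derived purely from the timestamp, files created in the same minute share a folder, and the unique name is what keeps them apart.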

This is the general architecture of the default content store, which is named FileContentStore. Building on this default store, Alfresco also provides several other types of content store. Here are some details about each of them.

Encrypted ContentStore

As the name suggests, content is stored encrypted in the filesystem. Each piece of content is encrypted with its own unique key, which is in turn encrypted with a master key and stored in the Alfresco database. The encrypted ContentStore was introduced in Alfresco 5.0. To enable it, you need a license file from Alfresco with content store encryption enabled.
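The scheme described above is a form of envelope encryption. The following minimal Java sketch, using only the standard JCA/JCE classes, shows the idea: a fresh 128-bit AES key encrypts the content, and an RSA master key pair encrypts (wraps) that content key. The class name, the choice of AES/GCM, and the RSA-OAEP padding are assumptions made for illustration; this is not how Alfresco's encrypted ContentStore is implemented internally.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical envelope-encryption sketch; NOT Alfresco's implementation.
final class EnvelopeEncryptionSketch {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** What would be persisted: ciphertext (filesystem) and wrapped content key (database). */
    static final class Encrypted {
        final byte[] iv;
        final byte[] ciphertext;
        final byte[] wrappedKey;

        Encrypted(byte[] iv, byte[] ciphertext, byte[] wrappedKey) {
            this.iv = iv;
            this.ciphertext = ciphertext;
            this.wrappedKey = wrappedKey;
        }
    }

    static KeyPair newMasterKey() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048); // same size as a 2048-bit keytool RSA key
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Encrypted encrypt(byte[] content, KeyPair master) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            SecretKey contentKey = gen.generateKey(); // fresh key for this piece of content

            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, contentKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(content);

            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.WRAP_MODE, master.getPublic());
            return new Encrypted(iv, ciphertext, rsa.wrap(contentKey)); // key encrypted with master key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(Encrypted enc, KeyPair master) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.UNWRAP_MODE, master.getPrivate());
            SecretKey contentKey = (SecretKey) rsa.unwrap(enc.wrappedKey, "AES", Cipher.SECRET_KEY);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, contentKey, new GCMParameterSpec(128, enc.iv));
            return aes.doFinal(enc.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair master = newMasterKey();
        byte[] doc = "Employee Handbook".getBytes(StandardCharsets.UTF_8);
        Encrypted enc = encrypt(doc, master);
        System.out.println("round trip ok: " + Arrays.equals(doc, decrypt(enc, master)));
    }
}
```

In a design like this, only the small wrapped key needs to be stored alongside the content metadata, while the bulk data is protected by a fast symmetric cipher.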

Enabling the encrypted ContentStore

Here are the steps required to enable and configure the encrypted ContentStore.

  1. Obtain a license file with the encrypted ContentStore enabled and install it using the admin console. Refer to Chapter 4, Administration of Alfresco.
  2. Generate an RSA master key in a new keystore using keytool. The following sample keytool command generates the master key:

    keytool -genkey -alias key1 -keyalg RSA -keystore <master keystore path> -keysize 2048

  3. Configure the following properties in alfresco-global.properties to enable content encryption. These properties can also be changed via JMX:
    • filecontentstore.subsystem.name=encryptedContentStore

      This enables the encrypted content store.

    • cryptodoc.jce.keystore.type=

      The keystore type of the master keystore, for example, jceks.

    • cryptodoc.jce.keystore.path=

      The path of the keystore in which the master key was generated.

    • cryptodoc.jce.keystore.password=

      The password for the keystore.

    • cryptodoc.jce.key.aliases=

      A comma-separated list of the aliases of all master keys.

    • cryptodoc.jce.key.passwords=

      A comma-separated list of the passwords used to fetch each master key from the keystore.

    • cryptodoc.jce.keygen.defaultSymmetricKeySize=

      The size of the symmetric key; the default is 128 bits.

  4. The dir.root path specified in alfresco-global.properties remains the same.

    Tip

    Once encryption is enabled, you cannot revert to the normal ContentStore. Also, if you are upgrading from an older version, only new content will be encrypted; existing content remains unencrypted. Choose the encrypted ContentStore carefully. Multi-tenancy is not supported with an encrypted store.
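To see how the keystore properties fit together, here is a minimal Java sketch that reads them from a Properties object and uses the standard java.security.KeyStore API to load the master keys. The class name is hypothetical, and pairing each alias with the password at the same position in cryptodoc.jce.key.passwords is an assumption based on the two comma-separated lists. Alfresco reads these properties itself, so you never need code like this to enable encryption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper illustrating what the cryptodoc.* properties mean (not Alfresco code).
final class CryptodocKeystoreSketch {

    /** Pairs cryptodoc.jce.key.aliases with cryptodoc.jce.key.passwords by position (assumption). */
    static Map<String, String> aliasPasswords(Properties props) {
        String[] aliases = split(props.getProperty("cryptodoc.jce.key.aliases", ""));
        String[] passwords = split(props.getProperty("cryptodoc.jce.key.passwords", ""));
        if (aliases.length != passwords.length) {
            throw new IllegalArgumentException(aliases.length + " aliases but " + passwords.length + " passwords");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < aliases.length; i++) {
            result.put(aliases[i], passwords[i]);
        }
        return result;
    }

    /** Opens the master keystore and fetches every configured master key. */
    static Map<String, Key> loadMasterKeys(Properties props) {
        String type = props.getProperty("cryptodoc.jce.keystore.type");
        String path = props.getProperty("cryptodoc.jce.keystore.path");
        if (type == null || path == null) {
            throw new IllegalArgumentException("keystore type and path must both be set");
        }
        char[] storePassword = props.getProperty("cryptodoc.jce.keystore.password", "").toCharArray();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            KeyStore keyStore = KeyStore.getInstance(type);
            keyStore.load(in, storePassword);
            Map<String, Key> keys = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : aliasPasswords(props).entrySet()) {
                Key key = keyStore.getKey(entry.getKey(), entry.getValue().toCharArray());
                if (key == null) throw new IllegalStateException("No key with alias " + entry.getKey());
                keys.put(entry.getKey(), key);
            }
            return keys;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot load master keys from " + path, e);
        }
    }

    private static String[] split(String value) {
        String trimmed = value.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s*,\\s*");
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("cryptodoc.jce.key.aliases=key1\ncryptodoc.jce.key.passwords=secret\n"));
        System.out.println(aliasPasswords(props).keySet()); // [key1]
    }
}
```

A mismatch between the number of aliases and passwords is rejected up front in this sketch, since a positional pairing would otherwise silently use the wrong password.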

Caching ContentStore

The caching ContentStore works as a wrapper around another ContentStore, providing a local cache for faster access to data. It is intended for slow backing stores such as a slow disk or Amazon S3: if your normal content storage mechanism is slow, set up the caching ContentStore around it. Don't use it around a FileContentStore on a fast disk.
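The wrapper idea can be sketched in a few lines of Java. In this minimal sketch, SimpleContentStore, CountingStore, and CachingStoreSketch are hypothetical names (they are not Alfresco's ContentStore API), and an in-memory map stands in for the on-disk cache directory. Reads are served from the cache when possible; writes always go to the backing store and, when cacheOnInbound is set, are also copied into the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal store interface (not Alfresco's ContentStore API).
interface SimpleContentStore {
    byte[] read(String contentUrl);
    void write(String contentUrl, byte[] content);
}

// Stand-in for a slow backing store such as S3; counts how often it is read.
final class CountingStore implements SimpleContentStore {
    private final Map<String, byte[]> data = new HashMap<>();
    int reads;

    @Override public byte[] read(String contentUrl) {
        reads++;
        byte[] content = data.get(contentUrl);
        return content == null ? null : content.clone();
    }

    @Override public void write(String contentUrl, byte[] content) {
        data.put(contentUrl, content.clone());
    }
}

// Wrapper that serves repeated reads from a local cache (an in-memory map in this sketch).
final class CachingStoreSketch implements SimpleContentStore {
    private final SimpleContentStore backingStore;
    private final boolean cacheOnInbound;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    CachingStoreSketch(SimpleContentStore backingStore, boolean cacheOnInbound) {
        this.backingStore = backingStore;
        this.cacheOnInbound = cacheOnInbound;
    }

    @Override public byte[] read(String contentUrl) {
        byte[] cached = cache.get(contentUrl);
        if (cached == null) {
            cached = backingStore.read(contentUrl); // cache miss: go to the slow store once
            if (cached == null) return null;        // unknown content URL
            cache.put(contentUrl, cached);
        }
        return cached.clone();
    }

    @Override public void write(String contentUrl, byte[] content) {
        backingStore.write(contentUrl, content);                     // backing store stays the system of record
        if (cacheOnInbound) cache.put(contentUrl, content.clone()); // warm the cache on write
    }

    public static void main(String[] args) {
        CountingStore slow = new CountingStore();
        CachingStoreSketch store = new CachingStoreSketch(slow, false);
        store.write("store://2015/1/20/10/50/a.bin", new byte[] {1, 2, 3});
        store.read("store://2015/1/20/10/50/a.bin");
        store.read("store://2015/1/20/10/50/a.bin");
        System.out.println("backing store reads: " + slow.reads); // 1
    }
}
```

Because every write still reaches the backing store, the cache can be emptied without losing content; the only benefit is fewer slow reads, which is why it adds little in front of a fast local disk.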

Configuring the caching ContentStore

Follow these steps to configure the caching ContentStore (assuming the backing store is already configured):

  1. Enable the caching-content-store-context.xml.sample file located at <ALFRESCO_HOME>/shared/classes/alfresco/extension by renaming it to caching-content-store-context.xml.
  2. Configure the context file according to your system requirements, starting with the bean whose ID is cachingContentStore. Make sure its backingStore and quota properties are configured properly. The quota can be a standard quota or unlimited; with the standard quota manager, you can control the disk usage of cached files:
    <bean id="cachingContentStore" class="org.alfresco.repo.content.caching.CachingContentStore" init-method="init">
        <property name="backingStore" ref="backingStore"/>
        <property name="cache" ref="contentCache"/>
        <property name="cacheOnInbound" value="${system.content.caching.cacheOnInbound}"/>
        <property name="quota" ref="standardQuotaManager"/>
    </bean>
  3. In the sample context file, the backingStore bean refers to FileContentStore. Change the bean definition to match the backing store you use; caching in front of a FileContentStore brings no benefit. For example, if you are using S3ContentStore (covered later in this chapter), where caching is required, make sure backingStore refers to the correct ContentStore, as shown in the following sample code snippet:
    <bean id="backingStore" class="org.alfresco.integrations.s3store.TenantS3ContentStore">
        <constructor-arg>
            <value>${dir.contentstore}</value>
        </constructor-arg>
    </bean>
  4. Based on the context file configuration, add or modify the following important properties in the alfresco-global.properties file. Default values are set in the repository.properties file:
    • dir.cachedcontent=${dir.root}/cachedcontent

      Change this value if you want the cached content stored in a location other than the content root directory.

    • system.content.caching.cacheOnInbound=true

      Enables caching of content while it is being written, so that the content is already in the cache the first time it is read.

    • system.content.caching.maxDeleteWatchCount=1

      The number of times a file must be observed as deleted before it is cleaned up from the cache.

    • system.content.caching.contentCleanup.cronExpression=0 0 3 * * ?

      The cron expression that schedules the cleanup of cached content (this value runs the cleanup at 3:00 A.M. every day).

    • system.content.caching.minFileAgeMillis=60000

      The minimum age, in milliseconds, that a file must reach before it can be deleted from the cache.

    • system.content.caching.maxUsageMB=4096

      Used by the quota manager: the maximum amount of disk space, in megabytes, that the cache can use.

    • system.content.caching.maxFileSizeMB=0

      The maximum size, in megabytes, of a file that can be kept in the cache; 0 means no limit. Change this value if you want to limit the size of cached files.
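The quota and cleanup properties can be read as two simple decisions. The following minimal Java sketch models them under stated assumptions: CacheQuotaSketch rejects a file that is larger than maxFileSizeMB (when it is non-zero) or that would push total usage past maxUsageMB, and CacheCleanerSketch deletes an unreferenced cached file only once it is older than minFileAgeMillis and has been seen as a deletion candidate maxDeleteWatchCount times. Both class names are hypothetical, and Alfresco's real quota manager and cleaner may count and reserve space differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the standard quota settings (maxUsageMB, maxFileSizeMB).
final class CacheQuotaSketch {
    private static final long MB = 1024L * 1024L;
    private final long maxUsageBytes;
    private final long maxFileSizeBytes; // 0 means no per-file limit
    private long usedBytes;

    CacheQuotaSketch(long maxUsageMB, long maxFileSizeMB) {
        this.maxUsageBytes = maxUsageMB * MB;
        this.maxFileSizeBytes = maxFileSizeMB * MB;
    }

    /** Reserves space for a file about to be cached; false means "do not cache it". */
    boolean beforeWrite(long sizeBytes) {
        if (maxFileSizeBytes > 0 && sizeBytes > maxFileSizeBytes) return false;
        if (usedBytes + sizeBytes > maxUsageBytes) return false;
        usedBytes += sizeBytes;
        return true;
    }

    /** Releases space when a cached file is deleted. */
    void afterDelete(long sizeBytes) {
        usedBytes = Math.max(0, usedBytes - sizeBytes);
    }

    long usedBytes() {
        return usedBytes;
    }
}

// Hypothetical model of the cleaner settings (minFileAgeMillis, maxDeleteWatchCount).
final class CacheCleanerSketch {
    private final long minFileAgeMillis;
    private final int maxDeleteWatchCount;
    private final Map<String, Integer> watchCounts = new HashMap<>();

    CacheCleanerSketch(long minFileAgeMillis, int maxDeleteWatchCount) {
        this.minFileAgeMillis = minFileAgeMillis;
        this.maxDeleteWatchCount = maxDeleteWatchCount;
    }

    /** Called on each cleanup run for a cached file that is no longer referenced. */
    boolean shouldDelete(String cachedFile, long ageMillis) {
        if (ageMillis < minFileAgeMillis) return false;            // too young to remove
        int seen = watchCounts.merge(cachedFile, 1, Integer::sum); // one more sighting as deleted
        if (seen < maxDeleteWatchCount) return false;
        watchCounts.remove(cachedFile);
        return true;
    }

    public static void main(String[] args) {
        CacheQuotaSketch quota = new CacheQuotaSketch(4096, 0);     // the default values listed above
        System.out.println(quota.beforeWrite(10L * 1024 * 1024));  // true
        CacheCleanerSketch cleaner = new CacheCleanerSketch(60000, 1);
        System.out.println(cleaner.shouldDelete("a.bin", 120000)); // true
    }
}
```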

Alfresco S3 content store

This is a special content store that is required only when the Alfresco instance runs on the Amazon cloud (EC2) (refer to https://en.wikipedia.org/wiki/Amazon_Web_Services for more details). Alfresco provides this additional module so that Amazon's Simple Storage Service (S3) can be used for file storage. The Alfresco S3 content store is slower than the standard FileContentStore, so it is best used in combination with the caching ContentStore.

Configuring the Alfresco S3 connector

Follow these steps to configure the S3 connector:

  1. Download the AMP package for the S3 connector from Alfresco support.
  2. Install this AMP package using the Alfresco Module Package (AMP) installation procedure.
  3. Configure the following properties in the alfresco-global.properties file:
    • s3.accessKey=

      The access key that identifies you to Amazon Web Services.

    • s3.secretKey=

      The Amazon Web Services secret key.

    • s3.bucketName=

      The name of the bucket used for content storage. The bucket name must be unique.

  4. Once the bucket name is defined, the same bucket can be used for multiple purposes. Define the contentstore and contentstore.deleted paths within the same bucket in the alfresco-global.properties file:
    dir.contentstore=/AmazonBucketPath/contentstore
    dir.contentstore.deleted=/AmazonBucketPath/contentstore.deleted

    Note

    Switching an existing repository from the local content store to S3 is not supported and will corrupt the repository.

Content store selector

The content store selector provides a mechanism to bind content to a specific content store. Alfresco gives you the flexibility to have multiple content stores and to decide which content is stored in which store. This is very useful when data from different folders needs to be stored in completely different stores. For example, you can place old, rarely read content on a slow disk and all new content on a fast disk.
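Conceptually, the selector is a lookup from a store name to a store, with a fallback to the default store when no name is set. Here is a minimal Java sketch of that lookup; StoreSelectorSketch is a hypothetical name, store root directories stand in for the ContentStore beans, and rejecting an unknown store name (rather than silently using the default) is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the selector: store name -> store (a root directory stands in for each bean).
final class StoreSelectorSketch {
    private final String defaultStoreName;
    private final Map<String, String> storeRootsByName;

    StoreSelectorSketch(String defaultStoreName, Map<String, String> storeRootsByName) {
        if (!storeRootsByName.containsKey(defaultStoreName)) {
            throw new IllegalArgumentException("Default store not defined: " + defaultStoreName);
        }
        this.defaultStoreName = defaultStoreName;
        this.storeRootsByName = new HashMap<>(storeRootsByName);
    }

    /** storeName is the node's cm:storeName value, or null when it has none. */
    String selectStore(String storeName) {
        if (storeName == null || storeName.isEmpty()) return defaultStoreName; // no value: default store
        if (!storeRootsByName.containsKey(storeName)) {
            throw new IllegalArgumentException("Unknown store: " + storeName); // this sketch's own choice
        }
        return storeName;
    }

    String rootFor(String storeName) {
        return storeRootsByName.get(selectStore(storeName));
    }

    public static void main(String[] args) {
        Map<String, String> roots = new HashMap<>();
        roots.put("default", "/mnt/alf_data/contentstore");
        roots.put("projectMarketing", "/mnt/alf_data/storeProjectA");
        StoreSelectorSketch selector = new StoreSelectorSketch("default", roots);
        System.out.println(selector.rootFor(null));               // /mnt/alf_data/contentstore
        System.out.println(selector.rootFor("projectMarketing")); // /mnt/alf_data/storeProjectA
    }
}
```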

Using the content store selector

Follow these steps to enable the content store selector:

  1. Create a content store selector context file in <Alfresco_home>/shared/classes/alfresco/extension. A sample context file is provided with the support files for this book.
  2. Define each store you require as a bean, as shown in the following code sample:
    <bean id="projectMarketingContentStore" class="org.alfresco.repo.content.filestore.FileContentStore">
        <constructor-arg>
            <value>${dir.root}/storeProjectA</value>
        </constructor-arg>
    </bean>
  3. In the storeSelectorContentStore bean, list all the store beans, each under the store name that will be visible in the user interface. Take a look at the following sample code snippet:
    <bean id="storeSelectorContentStore" parent="storeSelectorContentStoreBase">
       <property name="defaultStoreName">
          <value>default</value>
       </property>
       <property name="storesByName">
          <map>
             <entry key="default">
                <ref bean="fileContentStore" />
             </entry>
             <entry key="projectMarketing">
                <ref bean="projectMarketingContentStore" />
             </entry>
             <!-- ... -->
          </map>
       </property>
    </bean>
  4. Configure the eagerOrphanCleanup bean to reference this list of stores, so that all the additional content stores are cleaned up in the same way as the default content store.
  5. Set an appropriate cron expression for the system.content.orphanCleanup.cronExpression property in alfresco-global.properties.
  6. Now restart Alfresco.
  7. In Share, you need to make the cm:storeSelector aspect, together with its associated cm:storeName property, available to users.

    Find the aspects element in the share-config-custom.xml file located at <ALFRESCO_HOME>/tomcat/shared/classes/alfresco/web-extension and add the cm:storeSelector aspect to the list of visible aspects, as shown in the following code snippet:

    <aspects>
        <!-- Aspects that a user can see -->
        <visible>
            ..
            <aspect name="cm:storeSelector" />
        </visible>
        ..
    </aspects>

    Also define a user-friendly name for the aspect in the slingshot.properties file, so that it is shown in the Share user interface:

        aspect.cm_storeSelector=Store Selector
  8. Now apply this aspect to any content and set storeName to the store in which the content should be kept. For example, to store all marketing documents in the projectMarketing store, set storeName to projectMarketing, as defined in the store selector bean. The file is then copied from the default content store to the new content store. If no value is specified for storeName, the default store is used. The file in the old content store remains where it is, but it is marked as orphaned so that the cleanup process can remove it.
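The copy-then-orphan behavior described in the last step can be modelled with a small in-memory Java sketch. StoreMoveSketch is a hypothetical class: for simplicity it reuses the same content URL in both stores, and its cleanupOrphans method deletes orphaned copies outright, whereas Alfresco's orphan cleanup moves them to contentstore.deleted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of moving content between stores via cm:storeName.
final class StoreMoveSketch {
    private final Map<String, Map<String, byte[]>> stores = new HashMap<>();  // store -> url -> bytes
    private final Map<String, Set<String>> orphansByStore = new HashMap<>(); // store -> orphaned urls

    void write(String storeName, String contentUrl, byte[] content) {
        stores.computeIfAbsent(storeName, k -> new HashMap<>()).put(contentUrl, content.clone());
    }

    boolean exists(String storeName, String contentUrl) {
        Map<String, byte[]> store = stores.get(storeName);
        return store != null && store.containsKey(contentUrl);
    }

    /** Copies the binary into the new store and marks the old copy as orphaned. */
    void changeStore(String contentUrl, String fromStore, String toStore) {
        if (fromStore.equals(toStore)) return; // nothing to move
        if (!exists(fromStore, contentUrl)) {
            throw new IllegalArgumentException("No " + contentUrl + " in store " + fromStore);
        }
        write(toStore, contentUrl, stores.get(fromStore).get(contentUrl));
        orphansByStore.computeIfAbsent(fromStore, k -> new HashSet<>()).add(contentUrl);
    }

    /** Deletes orphaned copies and returns how many were removed. */
    int cleanupOrphans() {
        int removed = 0;
        for (Map.Entry<String, Set<String>> entry : orphansByStore.entrySet()) {
            Map<String, byte[]> store = stores.get(entry.getKey());
            for (String contentUrl : entry.getValue()) {
                if (store != null && store.remove(contentUrl) != null) removed++;
            }
        }
        orphansByStore.clear();
        return removed;
    }

    public static void main(String[] args) {
        StoreMoveSketch repo = new StoreMoveSketch();
        String url = "2015/1/20/10/50/a.bin";
        repo.write("default", url, new byte[] {42});
        repo.changeStore(url, "default", "projectMarketing");
        System.out.println(repo.exists("default", url));         // true: old copy kept until cleanup
        System.out.println(repo.cleanupOrphans());                // 1
        System.out.println(repo.exists("projectMarketing", url)); // true
    }
}
```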