Today’s the day for our Drupal developer’s blog post again, which means a lot of attention to detail and practical tips — this time on caching in Drupal 7. The 7th release is a strong platform, and many developers prefer it for site building, as well as for blog writing ;) We have posts about creating CTools popups, working with the Book module, building apps with PhoneGap, using Drupal 7 tools, configuring search with ApacheSolr, and much more in the development category. But today is the day to delve into the nuances of caching! ;)
A cache is a web developer’s tool that helps get the requested data as quickly as possible and with the least system effort. Essentially, it is an intermediate buffer which stores data that is often requested. How does this buffer work in practice? Usually it is a table in the database with the data at the ready, or, instead of a table, the data can often be stored in a file.
We can give a simple example of caching. Suppose that a certain website page is often visited by new users, and, for each of these users, the page script has to perform a series of operations that take some time (let’s say, 30 seconds). Only after all these calculations have been performed does a user see the desired result on the screen. So, every time a user visits the page, they wait 30 seconds. You might ask: after the first execution, why don’t we save the result and just show it to the next visitor, skipping the long execution process? This means that the result of the 30-second calculation is saved in the “intermediate buffer” described above and delivered to the user instantly. That’s the main philosophy of caching.
Drupal has a number of tools to manipulate data and cache them according to specific algorithms. These tools and methods of their use are exactly what we are going to discuss.
Drupal caching tools and their use
As mentioned above, the Drupal cache is stored in the database. However, our content management system does not keep all of the cache in one place: it divides it into segments, each of which lives in a separate table with its own name. Why does the segmentation take place, and what benefits do we get from this kind of pattern? Let’s consider these segments and give examples of their practical use.
Fields of the cache segments
By default, each cache segment consists of 5 fields:
$cid is the cache ID, the identifier of the data in the repository. You can use any character string, so long as it is no more than 255 characters long. We recommend starting an identifier with a module name.
$data — as is clear from the name, it’s the place to store our data. Complex data types will be automatically serialized.
$expire — (optional). The cache lifetime. It can have the following values:
- CACHE_PERMANENT: shows that the data should never be removed unless its ID is passed explicitly in a cache_clear_all() call.
- CACHE_TEMPORARY: shows that the element should be removed during the next general cache clearing.
- A Unix timestamp: shows that the element should be kept at least until the given time. After that it behaves as CACHE_TEMPORARY.
$created is the time the cache entry was saved, as a Unix timestamp.
$serialized is a flag (0 or 1) that shows whether the cached data was (1) or wasn’t (0) serialized.
Functions for working with the cache
• cache_clear_all() — clears the cache data.
• cache_get() — returns the cache data.
• cache_get_multiple() — returns the data from cache, from the array of cache ID-s.
• cache_is_empty() — checks whether the cache is empty.
• cache_set() — stores data in the cache.
• _cache_get_object() — gets the cache object by the segment ($bin).
For example:
function cache_set($cid, $data, $bin = 'cache', $expire = CACHE_PERMANENT) {
  return _cache_get_object($bin)->set($cid, $data, $expire);
}
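To see how these functions fit together, here is a minimal sketch (not taken from core). It assumes a hypothetical module called my_module that caches "reports" in the default {cache} segment; the function names, the my_module:report: prefix and the report structure are all made up for illustration. Only cache_get(), cache_set() and cache_clear_all() are real Drupal 7 functions.

```php
/**
 * Builds a cache ID prefixed with the (hypothetical) module name.
 */
function my_module_report_cid($report_id) {
  return 'my_module:report:' . $report_id;
}

/**
 * Hypothetical stand-in for an expensive calculation.
 */
function my_module_build_report($report_id) {
  return array('id' => $report_id, 'built' => REQUEST_TIME);
}

/**
 * Returns a report, reading it from the default cache segment when possible.
 */
function my_module_get_report($report_id) {
  $cid = my_module_report_cid($report_id);
  // cache_get() returns an object with a ->data property, or FALSE on a miss.
  if ($cache = cache_get($cid)) {
    return $cache->data;
  }
  $report = my_module_build_report($report_id);
  // CACHE_TEMPORARY: the entry is removed on the next general cache clearing.
  cache_set($cid, $report, 'cache', CACHE_TEMPORARY);
  return $report;
}

/**
 * Removes every cached report of this module from the default segment.
 */
function my_module_clear_reports() {
  // With $wildcard = TRUE, all IDs starting with the given string are cleared.
  cache_clear_all('my_module:report:', 'cache', TRUE);
}
```

Passing TRUE as the third argument of cache_clear_all() treats the ID as a prefix, which is one practical reason to start identifiers with the module name.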
Descriptions of the cache segments
1. {cache} is the default segment for storing the general cache. It is commonly used for the theme registry, locale date data, the list of SimpleTest tests and so on, as well as for data that is hard to classify and does not justify a separate segment. It is often used to cache data from custom modules.
2. {cache_block} is added when the Block module is enabled. When a region is rendered, all of its blocks are loaded, and for each block Drupal first checks whether it can be built from the cache. If a cached copy exists, hook_block_view() is not executed for that block: a request to {cache_block} returns a block which is already rendered.
Do not forget the caching settings when declaring a block in hook_block_info(): the optional 'cache' key can be set to one of the following values:
- DRUPAL_CACHE_PER_ROLE (default): The block may vary according to the user’s roles and the permissions granted to them.
- DRUPAL_CACHE_PER_USER: The block may vary depending on the user during a page view. However, this option can be resource-intensive for sites with many users, and it should be used only when DRUPAL_CACHE_PER_ROLE is not enough.
- DRUPAL_CACHE_PER_PAGE: The block may vary depending on the page being viewed.
- DRUPAL_CACHE_GLOBAL: The block is the same for every user on every page where it is visible.
- DRUPAL_CACHE_CUSTOM: The module implements its own caching system.
- DRUPAL_NO_CACHE: The block should not be cached.
Note that a cached block is stored with the CACHE_TEMPORARY flag, which means that the data will be deleted during the next general cache clearing.
3. {cache_bootstrap} is the segment responsible for the Drupal initialization data (module lists, variables, hooks enabled in the system, etc.). This segment uses the following identifiers ($cid):
- bootstrap_modules is the list of modules that implement bootstrap hooks and are therefore loaded before other modules.
- hook_info is the list of all hooks that are available for implementation.
- lookup_cache is the list of classes that are dynamically loaded and the files where they are stored.
- module_implements is the list of hooks together with modules that have the implementation of these hooks.
- system_list is the list of enabled modules and themes together with dependencies.
- variables is a cache of variables (the {variable} table).
4. {cache_field} is the segment where the data for all fields is stored. The ID ($cid) is formed according to the rule field:entity_type:entity_id.
For example, for the fields of a taxonomy term with a tid equal to 14: field:taxonomy_term:14
Here, a set of fields and their values related to a given entity are stored.
5. {cache_filter} is a segment created by the Filter module (core), which is responsible for filtering text according to text formats. Filtered text is cached under a hash of the original text. For example, the check_markup() function uses a SHA-256 hash of the text to check whether {cache_filter} already holds the filtered result, and returns it if so. If there is no filtered text in the cache, the list of filters enabled for the current text format is loaded, and the text is processed according to the filter settings, which takes a certain amount of time. So, when possible, enable caching (the $cache argument) when calling check_markup().
6. {cache_form} is a caching segment for form building. Its logic differs from the segments described above: it is used not to improve performance, but to avoid security vulnerabilities when building forms with the Form API. This table can grow rapidly (its entries have a lifetime of 6 hours, specified in the core file includes/form.inc), so if a site has many forms and heavy traffic, you should make sure Cron (or another custom solution) cleans this segment regularly.
7. {cache_image} is a segment reserved by the Image module. As such, there is no image cache in it, and the table basically serves to save the data about the manipulations with pictures.
8. {cache_menu} is a segment reserved by the Menu module which serves for caching and storing the data on links from all menus created via Drupal interface. The cache ID itself is formed in this way:
$cid = 'links:' . $menu_name . ':tree-data:' . $GLOBALS['language']->language . ':' . hash('sha256', serialize($parameters));
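For example (this particular entry is a page-specific one, so its ID contains the menu name, the current path and the language rather than a hash):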
links:main-menu:page:admin/reports/status:ru:1:1
9. {cache_page} is one of the most important cache segments. It is used to store cached pages for anonymous users. If a cached copy is found while a page is being built for such a user, only 2 hooks will be executed: hook_boot() and hook_exit(); the rest will be skipped. To enable this cache, you just need to go to admin/config/development/performance and enable page caching for anonymous users.
10. {cache_path} is a segment responsible for maintaining the correspondences between the system path and its alias.
11. {cache_update} is a segment reserved by the Update manager module. The cache data contains all the information on the releases for the enabled modules. However, the segment acts more like a data storing table, and does not influence the performance. Updating the table happens only when Drupal gets new data on the releases of the modules.
Third-party modules (hacked, l10n_update, token, views) also create additional segments. Among them it is worth mentioning {ctools_object_cache}, a cache segment used to store large objects that are being edited at the moment. That's why, instead of the $cid field, this table has $sid (Session ID), the identifier of the current user session. The same table has no $expire either, so it is not erased when the cache is cleared via the interface, but is cleaned every day by Cron. An example is the cache of a view that has been configured but not yet saved.
To sum up, a cache segment does not always work exactly as the buffer described at the beginning. Knowing each segment and its purpose lets you use this tool skillfully and successfully in practice.
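Two of the segments above are easy to try from a custom module. The sketch below is only an illustration: the module name my_module, the block delta recent_items and the helper my_module_render_description() are hypothetical, and the filtered_html format is assumed to exist (it does in the standard install profile). The first two functions declare a block that is cached once for all users and pages. The third asks check_markup() to use {cache_filter}.

```php
/**
 * Implements hook_block_info().
 */
function my_module_block_info() {
  $blocks['recent_items'] = array(
    'info' => t('Recent items (example)'),
    // The output is the same for everyone, so one cached copy is enough.
    'cache' => DRUPAL_CACHE_GLOBAL,
  );
  return $blocks;
}

/**
 * Implements hook_block_view().
 *
 * Not invoked while a cached copy of the block exists in {cache_block}.
 */
function my_module_block_view($delta = '') {
  $block = array();
  if ($delta == 'recent_items') {
    $block['subject'] = t('Recent items');
    $block['content'] = array(
      '#markup' => t('Built at @time', array('@time' => format_date(REQUEST_TIME))),
    );
  }
  return $block;
}

/**
 * Hypothetical helper: filters text and lets Drupal cache the result.
 */
function my_module_render_description($text, $format_id = 'filtered_html') {
  // The fourth argument ($cache) enables reading from and writing to
  // {cache_filter}; it only has an effect if the format is cacheable.
  return check_markup($text, $format_id, '', TRUE);
}
```

The block example assumes that block caching is turned on at admin/config/development/performance. With it enabled, the "Built at" time should stay the same between page loads until the cache is cleared.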
Creating custom cache segments
How about creating your own segment for your own purposes, for example, for a custom module? In this situation, you just need to clone the standard {cache} segment and create your own on its basis.
For example, create the "My module" module, for which "cache_my_module" will be your new cache segment.
To clone the cache table, implement hook_schema() in my_module.install and add your table.
/**
 * Implements hook_schema().
 */
function my_module_schema() {
  $schema['cache_my_module'] = drupal_get_schema_unprocessed('system', 'cache');
  $schema['cache_my_module']['description'] = 'Cache table for My module.';
  return $schema;
}
Once the module is installed, the table is created. The next important step is clearing the table whenever the system cache is cleared. For this purpose, Drupal has a hook called hook_flush_caches(). To implement it, write the following in the my_module.module file:
/**
 * Implements hook_flush_caches().
 */
function my_module_flush_caches() {
  return array('cache_my_module');
}
The example function below shows the cache in use. A deliberately resource-intensive loop calculates a sum according to a certain rule. During the first run, this function was executed in 5.85 sec., while on every later run it took approximately 0.001 sec.
/**
 * Example function.
 */
function my_module_cache_page() {
  timer_start('cache_module');
  $cache_id = 'cache_example_files_count';
  $sum = 0;
  if ($cache = cache_get($cache_id, 'cache_my_module')) {
    // If there is a cache entry for this ID, use its data.
    $sum = $cache->data;
  }
  else {
    // Do some resource-intensive calculations.
    for ($i = 0; $i < 30; $i++) {
      $files_count = count(file_scan_directory('.', '/.*/'));
      $sum += $files_count * $i;
    }
    // Save our calculations in the cache for 15 minutes.
    cache_set($cache_id, $sum, 'cache_my_module', REQUEST_TIME + 15 * 60);
  }
  // Display the time spent.
  $timer = timer_read('cache_module') / 1000;
  print $sum . ' We got a sum in = ' . $timer . ' seconds.';
}
As you can see, the difference is considerable. However, for significant loads on the site, you should focus on minimizing the requests to the database. So the question is how to move the cache outside the database.
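One caveat about the 15-minute lifetime used in the example: as far as we know, with the default database backend cache_get() does not itself compare $expire with the current time, so an entry may still be returned after its lifetime has passed, until the next cleanup (for example, on a Cron run) removes it. If stale data is a problem, a small defensive wrapper can help. This is a minimal sketch, and the helper name my_module_cache_get_fresh() is hypothetical:

```php
/**
 * Returns a cache entry only if its expiry timestamp has not passed yet.
 *
 * Hypothetical helper, not part of Drupal core.
 */
function my_module_cache_get_fresh($cid, $bin = 'cache') {
  $cache = cache_get($cid, $bin);
  if (!$cache) {
    return FALSE;
  }
  // CACHE_PERMANENT (0) and CACHE_TEMPORARY (-1) are not timestamps,
  // so only positive values are compared with the current request time.
  if ($cache->expire > 0 && $cache->expire < REQUEST_TIME) {
    return FALSE;
  }
  return $cache;
}
```

In the example function, cache_get($cache_id, 'cache_my_module') could then be replaced with my_module_cache_get_fresh($cache_id, 'cache_my_module'), so that an expired sum is recalculated instead of being shown.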
Moving the cache segments outside the database
You can do it, for example, with Memcached. Memcached should be installed on the server. You can either install it yourself, if it’s a dedicated server, or ask your hosting provider. Next, you need to move all cache segments. For this purpose, there is a ready-made Memcache Storage module. Before enabling it, you need to link the module’s functionality with the Memcached daemon on the server. This is done by adding the following code lines to settings.php:
$conf['cache_backends'][] = 'sites/all/modules/memcache_storage/memcache_storage.inc';
$conf['cache_default_class'] = 'MemcacheStorage';
Now you can enable the module. But it should be noted that Memcached keeps its data in RAM. And, as we have already mentioned, the {cache_form} segment can grow very large, especially if there is a large number of users. So keeping this data in RAM is not a great idea. Therefore, use this line to ask Drupal to keep this cache segment in your database:
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
There are several common practices in the effective use of Memcached. Since we have already moved the cache to RAM, there is no point in Drupal’s request to the database when building a page at the bootstrap phase, therefore:
// Send the page cache to RAM.
$conf['cache_class_cache_page'] = 'MemcacheStorage';
// Deactivate the link to the database.
$conf['page_cache_without_database'] = TRUE;
In addition, even when a page is served from the cache, the hook_boot() and hook_exit() hooks are still invoked at the bootstrap stage, and their implementations may query the database. Provided that no enabled module relies on these hooks, we can switch them off as well:
$conf['page_cache_invoke_hooks'] = FALSE;
Besides Memcached, you can use other solutions, for example, APC, Boost, Varnish, XCache or Redis. These systems can be used together, with different cache segments moved to different solutions.
Good luck in your work with cache segments and caching in Drupal 7!