neatComponents - Performance and Scale

clearString neatComponents
Documentation

neatComponents is the hybrid-cloud database engine that powers clearString.
Previous page	Process Management	Next page
Performance and Scale

Once the fundamental design has been determined, with a fully normalised data structure, it is important to consider whether performance and scale issues need to be examined. Naturally for ‘quiet’ systems with little data, these aspects may be ignored, but for systems containing a substantial amount of data, or subject to high usage – or both, then issues of performance will require the design to be adapted.

In considering these issues, it is very difficult to define a threshold at which they will become important. However, most systems collect more data over their lifetime, and usually also become busier, so these issues should still be considered for sites that may not seem to qualify at the outset.

Measuring performance

Before attempting to optimise the performance of the system we must find a way to objectively measure that performance, so any issues, and improvements, can be quantified.

In essence, we are looking at the speed of the system. This can be divided into two aspects:

The time taken to generate the page
The time taken to deliver the generated page

Most of our consideration will be concerned with the time taken to generate the page, as this is more likely to be within our control. The time taken to deliver the page will be affected by the connectivity of the server to the internet, the connectivity of the user to the internet, and the distance and complexity of the path between them. This may be a factor in choosing where to locate the server, but other factors (such a security and cost) may outweigh this aspect.

In focusing on the time taken to generate the page, the system provides a measure of the lapsed time each page took to create, and includes this as a comment at the bottom of the source of the delivered page. Choosing View Source in the browser and scrolling to the bottom will reveal this number, in milliseconds.

This time is taken by the system setting a ‘stopwatch’ when work is first started on the page, and stopping it when finished. Thus the time captures how long it took, but is also affected by how busy the server may have been with other tasks at the time. So when measuring the performance of a page it is usually worth taking a few readings and averaging them, particularly if the first reading seems excessive.

Each page within the site will need to be considered individually. Pages which receive high traffic should be given particular attention, not just for their own performance, but because their load may, in affecting the general loading on the server, affect the performance of other pages.

In this section we outline the tools available to improve the performance.

Hardware choices

Clearly affecting everything is the choice of the hardware. Without going into excessive detail, the aspects to be considered include:

CPU speed
Memory speed, and amount
Hard disk speed
Network connectivity

With all of these, clearly more-is-better, with the specifics determined by the particular application, and the available budget.

Other hardware aspects, such as security, resilience, backups etc, are outside of the scope of this discussion, but should not be ignored.

A policy for regular hardware re-evaluation and renewal should be put in place. Not only will this maintain resilience, but it will give an opportunity for new technologies to be used that will enhance performance. For example, the emergence of SSD drives has led to a significant improvement, and can be easily swapped in to replace older drives. Similar improvements may be expected in the future for other aspects.

Avoid becoming an inadvertent data warehouse

It is easy to store data, and add to amount of data being stored, but it is worth considering how long that data needs to be stored, and if it does need to be stored long-term, whether it needs to be stored in this system, or would be better being archived off elsewhere.

Storing excessive data will cause a number of issues that will only become apparent over time, and which will gradually degrade performance. These include:

Slowing query execution and thus page generation time.
Increasing backup and restore duration.

As well as non-performance issues such as increasing legal discovery risk, and increasing storage costs.

Caching

Generally, what takes the most time is the querying and calculating of stored data, before it is displayed. While improving the hardware will speed this up, the work still needs to be done. Unless the server has already done it ahead of time, and has a record of this cached somewhere. Such caching mechanisms provide the greatest speed improvements, since we now are not having to perform the calculations at all, but only need to find the cached results and deliver them.

There are several caching approaches that can be applied, which can be used in combination as required:

Expression pre-computing (field caching)

Database normalisation suggests we should avoid having extra fields repeating information stored elsewhere, and should use query expressions to derive values whenever we need them. However since these expressions take time we can ignore this principle, and add extra fields to tables to store pre-computed values, ready for instant delivery.

For example, you might have a table with two numeric fields: a quantity and a value, and you need to display the product of these multiplied together on various reports. Clearly each query could easily include an expression of “quantity * value” multiplying them every time the report is run. Each instance only takes a small amount of time, but it all adds up.

So instead, add another field to the table, called ‘total’, and use an Update Field action to store the multiplied value. The Update Field action still uses a Query to perform the calculation for it, but this query is only executed once (or each time the record is updated), rather than each time the record is queried.

To maintain data integrity, it is important the field is updated every time the record is saved, in case either of the fields it relies upon have changed. Simply placing the Update Field action in the Record Change event achieves this.

The performance increase for this trivial example may not be that large, but the improvement will be significant if the expression being executed can avoid the need for a database join on the final query.

For example, if a messaging application allows messages to be sent with multiple attached files, which are being stored in a separate table, and you need to provide the recipient with an indicator showing how many files are attached. This would normally require the view of the messages to include a join between the messages table and the attachments table. Pre-computing the number of attachments and storing this in the messages table avoids the need for this join and will be much faster to display.

For versioning, this type of caching can be characterised as ‘near-perfect’ in that there is only a brief interval between the record being updated and the Update Field action being executed, during which the value is not guaranteed. This interval, while brief, is sufficient to mean that the post-submission surface shown to a user immediately after they have submitted a form, should not expect to have the computed values available to it yet. Instead, the user flow should include an interstitial page to give the system time to do the calculations in the background.

View caching

Where a View contains a lot of data it may be worth looking at caching the output of the View, effectively storing the fragment of HTML markup that the View has constructed, for later display in the same circumstances.

This approach can be used for any View, but is particularly useful for Custom Views and Sale View, as these often include complex contents, either directly involving Queries with complex joins, or including other Views embedded within the main one. All this complexity serves to slow the generation of the surface.

View Caching involves specifying how long the View that has just been rendered can be used for. This may be a few minutes, or even a few hours or days, depending on the circumstances. Subsequent requests for the same View during this period will use the cached version, and the database will not be queried.

View caches take into account all the calling parameters, and are sensitive to whether the output is varying based on which user is looking at it, which usergroup they are in, which domain they are viewing the site on etc., so can be relied upon to show the correct data.

View caches are caching fragments of the final page, so a complex individual page may be constructed using a mix of cached and uncached content.

For versioning, this type of caching can be characterised as ‘imperfect’, as the underlying data may have changed during the cache period. If it is essential that the user is shown the latest values you should not use this cache mechanism. Alternatively, if this is generally an acceptable trade-off, but there are occasions when everyone should see the latest version, the Clear Site Cache action can be used to flush the cache.

External caching

External caching is generally beyond the scope of this discussion, but is provided by software running on separate servers, either near to the server, acting as a reverse proxy, or near to the user, acting as part of a content delivery network. These generally work at the whole-page level, rather than acting on fragments of pages, and are best for high volume public sites where the cache does not need to be concerned with per user personalisation or security issues.

For versioning, this type of cache can be characterised as ‘imperfect’, since the underlying data may have changed in the interim.

Additional Resources

Process Management

Translate:

clearString neatComponents Documentation

clearString neatComponents
Documentation