Wednesday, May 5, 2010

In-Memory DataSets: ClientDataSets and .NET DataTables Part 6: Applying Updates to a Database

In the preceding article in this series I discussed various techniques that you can use to manage the change cache. In this installment I will conclude that discussion by looking at how you can apply the changes held in the change cache to the underlying database from which the data was originally loaded.

Call the ApplyUpdates method of a ClientDataSet to save any changes made to the in-memory data to the underlying database. Specifically, if you edit data obtained through a dataset provider, and then close or free a client dataset without calling ApplyUpdates, any changes stored in the change log are lost.

When ApplyUpdates is called, the contents are sent back to the dataset provider for resolution in the context of a transaction (so long as you pass a non-negative integer as the sole parameter of the ApplyUpdates method. When you pass –1, no transaction is initiated). The dataset provider, in turn, generates the necessary calls to apply the updates to the underlying dataset, applying the changes to the underlying dataset one record at a time.

By default this process is handled by a SQLResolver instance, which is created by the dataset provider. The SQL resolver determines the database that needs to be updated, and then creates the necessary SQL statement to apply the changes based on the contents of the change log. Specifically, one SQL statement will be generated for each change that needs to be applied.

Alternatively, the dataset provider can be configured to use the dataset from which it originally read the data to apply the changes. This approach is only possible when the dataset from which the records were read permits data changes.

For example, if the dataset provider gets its data from a TTable, the dataset provider can edit the TTable directly, inserting, deleting, or posting the changes using the TDataSet interface. Again, these changes are applied one at a time. It should be noted that when the dataset provider resolves the changes through the dataset, the dataset's event handlers, such as BeforeDelete and BeforePost can be used to perform data validation.

By comparison, if the dataset provider gets its data from a dataset that does not permit data editing, the dataset provider cannot resolve the data to the dataset. For example, if the dataset provider gets its data from a SQLDataSet, a dataset that retrieves its data using a unidirectional cursor and which does not permit editing, the dataset provider cannot resolve the changes directly to the dataset.

In these cases, there are two options. Either the default SQLResolver described earlier can be used, or you can write a BeforeUpdateRecord event handler. From within this event handler your code is given a reference to the changes that must be applied, and your code can take any action necessarily to explicitly apply these changes. This approach is the most flexible, although the most difficult to implement.

What kind of SQL (or direct edit) is generated is controlled by the UpdateMode of the DataSetProvider. If set to upWhereAll, a record is updated only if that exact record currently exists in the database. If set to upWhereChanged, the record is updated if a record with the same key fields and same values in the modified fields are found (this is a merge). When set to upWhereKeyOnly, an update is made if a record with the same key is found (last to post wins).

Updating .NET DataSets is similar to updating ClientDataSets. In that case, however, the DbDataAdapter class is the one typically used to apply the updates. DbDataAdapters, such as DataStoreDataAdapter, have four DbCommand properties. The SelectCommand property contains the SQL statement that returns the result set that is inserted into the DataTable when you call the Fill method. The other DbCommand properties, DeleteCommand, InsertCommand, and UpdateCommand, are designed to hold parameterized queries that will get their parameter values from the change log at runtime when you call the DbDataAdapter.Update method.

Some developers write the SQL statements that define the DeleteCommand, InsertCommand, and UpdateCommand objects manually. Doing so give them control over the queries, permitting the queries to be optimized for the underlying database. Other developers use a CommandBuilder to generate these queries. CommandBuilders are easy to use, but do not always generate optimized queries.

To use a CommandBuilder, you call its constructor, passing to it the DbDataAdapter whose SelectCommand command has already been defined. Based on this query, the CommandBuilder generates the DeleteCommand, InsertCommand, and UpdateCommand queries.

The following code shows the configuration of a CommandBuilder:

Connection1 := DataStoreConnection.Create(
'host=LocalHost;user=sysdba; ' +
'password=masterkey;database="C:\Users\Public\Documents\' +
'Delphi Prism\Demos\database\databases\BlackfishSQL\employee"';
//Sql statements are executed by IDbCommand objects
Command1 := DataStoreCommand.Create('SELECT * FROM customers',
//DbDataAdapters are used to populate DataTables and resolve data
DataAdapter1 := SqlDataAdapter.Create(Command1);
//CommandBuilders create DbCommand objects for a DataAdapter's
//DeleteCommand, InsertCommand, and UpdateCommand properties based
//on the DbDataAdapter.SelectCommand IDbCommand property
CommandBuilder1 := DataStoreCommandBuilder.Create(DataAdapter1);
DataSet1 := DataSet.Create;

The following code shows how the changes made to the DataTable are applied.

DataTable1: DataTable;
DataTable1 := DataSet1.Tables[0].GetChanges;
if DataTable1 <> nil then

Applying Updates and Persisted Data

Probably one of the most important characteristics of an in-memory dataset's ability to apply its updates to the underlying database is its concurrent ability to persist its data and state to a file, stream, or database. Together, these features permit an in-memory dataset to apply its updates at some future time, regardless of whether it has been persisted or not (and independent of its duration of persistence).

In order for a dataset to be able to apply its updates to a database subsequent to its persistence, the dataset's change log must be intact. Without this information, you lack the data required to determine which changes have been made to the dataset since it was originally populated.

If you are relying on a DataSetProvider (for ClientDataSets) or an DbDataAdapter implementation (for .NET) to apply the updates to the database, the object must be in a state compatible with applying the dataset's updates. For example, a DataSetProvider that you use to apply a previously persisted ClientDataSet's updates to a database must point to a TDataSet whose structure is consistent with the one that was used to originally load the ClientDataSet (unless you are using the DataSetAdapter's BeforeUpdateRecord event handler to programmatically apply the update, in which case it's all up to your code).

In the case of a .NET DataTable, the DbDataAdapter that is used to apply its updates must hold DbCommand instances in its DeleteCommand, InsertCommand, and UpdateCommand properties that contain parameterized queries sufficient to the task of applying those updates. This can be achieved by defining these queries manually, or by having an adequate DbDataAdapter.SelectCommand instance that can be used by a CommandBuilder to construct the necessary queries (of course, you must then use a CommandBuilder to create those delete, insert, and update queries objects). Otherwise, you once again must take matters into your own hands and generate all calls to update the underlying database by programmatically examining the change log and generating the necessary queries.

What If Updates Cannot Be Applied?

There is another issue that is worth mentioning. Specifically, even when you can construct the necessary queries to apply updates contained in a persisted in-memory dataset, those updates may not be possible. Specifically, it is conceivable that between the time that the data was originally loaded into the in-memory dataset and the point in time at which you want to write it back to the underlying database, the corresponding records in the database have been changed.

While many developers worry about this possibility, it is normally less of a concern than it might at first appear. For example, if you delete a record from an in-memory dataset, and then attempt to apply that deletion (at a later time), but find that the record was already deleted by someone else, who cares? It's gone. Mission accomplished (by someone, at least).

Similarly, if you attempt to apply a record insertion, only to find a record with that key already inserted, then the insertion is not necessary. But what, you might ask, if another user inserted a record with the same key as you (which causes your insertion to fail), but the other user's inserted record is different than the one you intended?

This is really an architectural issue, isn't it? Many of today's developers avoid this issue altogether by assigning arbitrary (and more or less meaningless) primary keys to each and every record. For example, many developers use GUIDs (globally unique identifiers ¾ 128-bit values that are guaranteed to be unique), to each record. In these cases, duplicate records are impossible, no matter who inserts the record.

Updates are a bit more problematic, but nonetheless generally workable. In most cases it is once again an architectural issue. There are, in fact, three options. In the first case, an attempt to update a record that you are holding in memory fails if someone else has changed any part of that record (in the underlying database) since the time you originally loaded it into memory. In this situation, your update query should include a test for all original field values in the WHERE clause of the update query.

In reality, failure to update a record that has been subsequently updated by another user, when those updates must be entirely exclusive, is rarely a tragedy.

The second case is to permit two or more users to update a record, so long as neither user changes the key field values, or a common field. This is a fine compromise in many situations. For example, if, since the time that you read your record into memory and the time that you are attempting to apply your update, another user has changed that record, but not one of the fields that your user changed, who cares? Simply include the record's key fields, and your changed fields, in the WHERE clause of the update query, ignoring those non-key fields you did not change. (If you use an arbitrary key field, such as a GUID, it is unreasonable, and should be prohibited, that anyone should ever change the key field value).

The third option, and one that is appropriate for some database applications, is simply that the last user to attempt to write to a record be permitted to do so, overwriting any previous user's applied updates. For example, imagine that five different sensors record the current temperature in a given area. If all sensors are equally correct, the most current update attempt is the most accurate. Who cares about a temperature that was written ten minutes ago? A similar analogy can be made with respect to stock prices. A more current stock price is more accurate. Past updates are old news.

Copyright (c) 2010 Cary Jensen. All Rights Reserved.


  1. Mr Jensen,
    Last time I worked on ADO.NET I find it much useful for server processing, given it's array-oriented approach to extracting, filtering and manipulation. Expressing server logic on ADO.NET is easier than TClientDataset.
    On the other way, I find the binding thing (at least with Winforms) of ADO.NET clunsy. Too much things to worry - I hope it got better with the new Frameworks. In this fiels, TClientDataset is unbeatable.
    Best regards,

  2. Dear Carey
    I have your book on ClientDataSets and by the end of Chapter 3, I had poked pencil holes in it in frustration! Here's the issue: There are almost no real-world applications where the Provider's Dataset is a table OR where it points to a SQL statement that doesn't contain a join. In most cases, everyone developing a real-world application is going to have to code an OnBeforeUpdate event handler. You barely touch this issue in your book (similarly above) and instead spend endless chapters on matters that anyone familiar with Delphi programming *should* already know well (eg. how to navigate/filter a dataset - duh)

    Above you say "This approach is the most flexible, although the most difficult to implement." On the contrary sir, from where I sit, trying to understand the possibile permutations of the Provider's UpdateMode and TField ProviderFlats is difficult to understand.

    I would enjoy reading a focused description of coding a OnBeforeUpdate event handler, especially in the context of a Datasnap Server, with attention to the contents of the DeltaDS when inserts, updates and deletes are pending, and when the corresponding ClientDataSet is in a different executable.