Wednesday, September 30, 2009

In-Memory DataSets: ClientDataSet and .NET DataTable Compared: Part 1 Overview

As some of you know, I have been a big fan of Delphi's ClientDataSets since they were first introduced in Delphi 3 (that's way back in 1997). When .NET shipped, its data access framework, ADO.NET, also included an in-memory dataset, named the DataTable. (.NET also includes the DataSet class, but in most cases, the DataTable class bears the strongest resemblance to the ClientDataSet.)

Both ClientDataSets and DataTables are in-memory datasets and, as such, share many features. On the other hand, they differ radically in a number of interesting ways. In this series of articles, which begins with this one, I will examine the general features of in-memory datasets and provide a direct comparison between ClientDataSets and .NET DataTables.

This article begins with an introduction to in-memory datasets in general. In future posts I will provide explicit code examples of how to perform various tasks with these two datasets, including how to create them in code, read and write data, sort, filter, persist, navigate, and so on.

I hope you enjoy.

Developing with Disconnected Datasets

Disconnected datasets are database table-like structures that are stored in memory. These types of datasets are sometimes referred to as cached datasets or in-memory datasets. In this series they will be referred to as in-memory datasets.

In-memory datasets are structured, high-performance, self-describing data structures temporarily stored in memory. A significant feature of in-memory datasets is that they maintain, and can persist, a change log. The change log permits you to programmatically determine what changes have been made to the data since some point in time, often when the data was originally loaded into the dataset. This information is essential if you need to persist these changes back to an original source, such as a Web service, underlying database, or other persistence mechanism.

Data persistence in the .NET Framework is based on in-memory datasets, and Delphi has included this capability since Delphi 3 in the form of the ClientDataSet.

This series begins with an overview of the features that make in-memory datasets so useful, including their self-descriptive nature, their ability to hold sophisticated relational data structures, their close association with XML, and their persistence and management of change information. How these features are surfaced in both ClientDataSets and .NET datasets is discussed in this section.

In-memory datasets are the cornerstone of modern software development. Nothing confirms this statement as much as Microsoft's commitment to in-memory datasets as a central aspect of the database framework in .NET, ADO.NET.

While most developers consider in-memory datasets for the presentation layer in applications, this use represents only a fraction of the possibilities for these powerful data structures. As this series will demonstrate, the characteristics of in-memory datasets make them a valuable tool for many different aspects of application development. The following are the essential features of in-memory datasets:


  • High performance

  • Self describing

  • Flexible

  • Change log managing

  • Persistable

Individually, these characteristics provide a compelling argument for using in-memory datasets in your applications. But it is the combination of these features in a single, easy-to-use class that makes them so valuable for a wide range of software features. The following sections look at each of these features in greater depth.

High Performance

In-memory datasets reside entirely in RAM (random access memory). Consequently, operations on the data they contain, including searches, filters, and sorts, are very fast. This is particularly true with respect to ClientDataSets, since these can have indexes on this data as well. But even for .NET datasets, which currently support a single index at any given moment (the primary index), data-related operations are many times faster than those that require disk reads (as is the case with a physical database).
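To make the value of an in-memory index concrete, here is a minimal sketch in Python. It is not ClientDataSet or DataTable code. The rows, the field names, and the helper functions `build_index`, `find_by_scan`, and `find_by_index` are all hypothetical, invented for illustration. It only shows the general idea that an index lets a search jump to matching rows instead of examining every row.

```python
from collections import defaultdict

# Hypothetical rows standing in for records already loaded into memory.
rows = [
    {"id": 1, "city": "Houston", "name": "Ann"},
    {"id": 2, "city": "Austin", "name": "Bob"},
    {"id": 3, "city": "Houston", "name": "Cal"},
]

def build_index(rows, field):
    """Map each distinct value of `field` to the positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field]].append(position)
    return index

def find_by_scan(rows, field, value):
    """Linear search: examines every row."""
    return [row for row in rows if row[field] == value]

def find_by_index(rows, index, value):
    """Indexed search: goes straight to the recorded positions."""
    return [rows[position] for position in index.get(value, [])]

city_index = build_index(rows, "city")
houston_rows = find_by_index(rows, city_index, "Houston")
```

Both searches return the same rows. The difference is that the scan touches all of the data, while the indexed lookup touches only the matches, and that gap widens as the dataset grows.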

Self Describing

In-memory datasets are formally designed around the concept of a database table. Unlike an array or sequence, whose data elements have a data type, and that's about it, the fields of a data table each have a name, a data type, and sometimes a data size (for example, the size of a text field or precision of a floating point number).

In addition, the fields of a data table may have constraints, such as a required field constraint, or referential integrity constraints when two or more in-memory tables are related. This information is typically referred to as metadata, which is data about data.

In ClientDataSets, you access the metadata of a dataset using the Fields property of the dataset, which contains a collection of TField instances. In .NET data tables, you access this information using the Columns property, which contains a collection of DataColumn instances.
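The idea of a self-describing table can be sketched in a few lines of Python. This is only an illustration of the concept, not the TField or DataColumn API. The `Column` and `Table` classes, and the `customers` example, are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Column:
    """Metadata for one field: the 'data about data'."""
    name: str
    data_type: type
    size: Optional[int] = None   # e.g. maximum length of a text field
    required: bool = False       # a simple required-field constraint

class Table:
    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add_row(self, **values):
        """Validate values against the column metadata, then store the row."""
        row = {}
        for col in self.columns:
            value = values.get(col.name)
            if value is None:
                if col.required:
                    raise ValueError(f"{col.name} is required")
            else:
                if not isinstance(value, col.data_type):
                    raise TypeError(f"{col.name} expects {col.data_type.__name__}")
                # In this sketch, size is only meaningful for values with a length.
                if col.size is not None and len(value) > col.size:
                    raise ValueError(f"{col.name} exceeds size {col.size}")
            row[col.name] = value
        self.rows.append(row)

customers = Table([
    Column("id", int, required=True),
    Column("name", str, size=20, required=True),
    Column("balance", float),
])
customers.add_row(id=1, name="Ann", balance=12.5)

# The table can describe itself without any outside schema.
description = [(c.name, c.data_type.__name__, c.size) for c in customers.columns]
```

Because the metadata travels with the data, code that receives the table can discover its field names, types, and constraints at runtime, which is something a plain array cannot offer.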

Flexible

In-memory datasets are designed to hold nearly any kind of data that might be stored in a physical database. This includes primitive data values, such as integers, strings, real numbers, and date/time values. But it also includes variable length objects, such as memos and Blobs (binary large objects). As a result, an in-memory dataset can hold the pages of a Web site, PDF files, and even executables (.EXEs and .DLLs). If it can be stored in a file on disk, it can be stored in an in-memory dataset (obviously, subject to the limits imposed by your available RAM).

Change Log Managing

Both ClientDataSets and .NET DataTables have a change log. The change log permits you to manage the unresolved changes that have been posted to the dataset's data since you loaded it into memory. This management includes the ability to determine precisely what changes have occurred (which records were inserted or deleted, and which fields were modified), revert changes to their prior state, cancel all changes, or commit those changes permanently, thereby erasing the change log. With ClientDataSets, this change log is held in the Delta property. For .NET DataTables, you use the RowStateFilter of a DataView to access the change log.

To manage the change log for a ClientDataSet, you use its methods, such as RevertRecord, UndoLastChange, CancelChanges, and ApplyUpdates. In addition, you can use the RecordStatus, StatusFilter, and Fields properties to examine the change log contents.

With .NET DataTables, you use the methods of the DataTable and DataView classes to control the change log, including NewRow, AcceptChanges, and RejectChanges on the DataTable, and Delete on the DataView (or on an individual DataRow). To examine the change log, you use the RowStateFilter and Rows properties.
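The general mechanics of change tracking can be sketched as follows. This is a simplified Python model, not the way either ClientDataSets or DataTables are implemented. The `TrackedTable` class and its method names are hypothetical, and the sketch assumes that keeping one "before" image per changed row is enough to classify and undo changes.

```python
import copy

class TrackedTable:
    """Rows plus a record of pending changes, keyed by row id (a sketch only)."""

    def __init__(self, rows):
        self.rows = {row["id"]: dict(row) for row in rows}
        self.originals = {}   # row id -> original row, or None if newly inserted

    def _remember(self, row_id):
        # Keep only the first "before" image; later edits do not overwrite it.
        if row_id not in self.originals:
            self.originals[row_id] = copy.deepcopy(self.rows.get(row_id))

    def insert(self, row):
        self._remember(row["id"])
        self.rows[row["id"]] = dict(row)

    def modify(self, row_id, **fields):
        self._remember(row_id)
        self.rows[row_id].update(fields)

    def delete(self, row_id):
        self._remember(row_id)
        del self.rows[row_id]

    def changes(self):
        """Classify each pending change as inserted, modified, or deleted."""
        result = {}
        for row_id, before in self.originals.items():
            after = self.rows.get(row_id)
            if before is None and after is None:
                continue  # inserted, then deleted: nothing to resolve
            if before is None:
                result[row_id] = "inserted"
            elif after is None:
                result[row_id] = "deleted"
            elif before != after:
                result[row_id] = "modified"
        return result

    def reject_changes(self):
        """Restore every row to its original state and forget the changes."""
        for row_id, before in self.originals.items():
            if before is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.originals.clear()

    def accept_changes(self):
        """Keep the current data and discard the 'before' images."""
        self.originals.clear()

table = TrackedTable([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
table.modify(1, name="Anne")
table.delete(2)
table.insert({"id": 3, "name": "Cal"})
pending = table.changes()
```

After these three operations, `pending` reports one modified, one deleted, and one inserted row. Calling `reject_changes` puts the data back as it was loaded, while `accept_changes` keeps the edits and empties the record of pending changes.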

Persistable

Of all the features supported by in-memory datasets, the ability to persist state is arguably the most powerful. Not only can you save an in-memory dataset's data, but you can save its change log as well. Specifically, it is possible to save the current state of an in-memory dataset to a file, Web service, or memo field of a database, and then to restore that dataset at a later time to its exact prior state. In short, there is absolutely no difference between the in-memory dataset prior to, and following, its persistence.

Consider the following scenario: After loading data into memory, and making several edits to an in-memory dataset, that dataset can be written to a file. At a future time, that dataset can be restored from the file, and the edits that were previously performed can be examined and rejected or accepted.

Furthermore, since the change log is restored to its exact prior state, that information can be used to resolve those edits to the underlying database from which the data was originally loaded. No information is lost during the time that the dataset is in storage, no matter how long its state was persisted.
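The scenario above can be sketched in Python as a round trip through a string. This is an assumption-laden illustration, not the file format used by ClientDataSets or DataTables. The `save_state` and `load_state` functions, the JSON layout, and the sample data are all hypothetical.

```python
import json

def save_state(rows, originals):
    """Serialize current rows and the pending-change record into one JSON string."""
    return json.dumps({"rows": rows, "originals": originals}, sort_keys=True)

def load_state(text):
    """Rebuild rows and the pending-change record from a save_state string."""
    state = json.loads(text)
    return state["rows"], state["originals"]

# Hypothetical state: row "1" was edited and row "2" was deleted after loading.
# Keys are strings because JSON object keys are always strings.
rows = {"1": {"name": "Anne"}}
originals = {"1": {"name": "Ann"}, "2": {"name": "Bob"}}

saved = save_state(rows, originals)   # could be written to a file or memo field
restored_rows, restored_originals = load_state(saved)

# The restored record still identifies which rows have unresolved changes.
unresolved_ids = sorted(restored_originals)
```

The point of the sketch is that the current data and the record of pending changes are saved together, so the restored dataset can still be resolved against the original source no matter how long it sat in storage.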

Copyright (c) 2009 Cary Jensen. All Rights Reserved

6 comments:

  1. Do they really have change *logs*? I was under the impression that they just kept a "before" version of each record, along with the "after" (the current modifications). "Change log" suggests that they actually keep full track of what changes were made and in what order.

  2. Good call. I call it a change log, but as a recent student in one of my classes pointed out, it is best thought of as a change cache. You are correct that in-memory datasets only save enough information to permit changes to be posted to an underlying database. They do not include anything like an audit trail. Old habits are hard to break (I've been calling it a change log for a long time), but I agree that the term change cache is appropriate, and I will try to use that term from now on.

  3. Hi Cary,
    As a proponent of both ClientDataSets and the Advantage Database Server, have you noticed a memory leak caused by connecting a TClientDataSet to a TAdsDataSet? Unless I'm doing something wrong there appears to be a bug in AdsCnnct.pas (this is with Delphi 7 and ADS v8). The protected class member TAdsConnection.FStmtList is not freed when the AdsConnection is destroyed. I reported this on the Advantage.Delphi newsgroup last week but got no response at all.

  4. Doug

    Not familiar with this one. I am going to forward your comments along to the team at Sybase.

  5. Thanks Cary - actually Jed Thomet did confirm on Feb 16 that this was indeed a bug.

  6. Let me say this for anybody who does not already know this. The Advantage team from Sybase is the most responsive that I've ever seen when it comes to confirming and reacting to reports of issues in their software. The result is that problems in Advantage products, when revealed, are often short lived. Good job, Advantage team. And thanks for reporting this issue, Doug.
