This article is a continuation of my previous post about using IBM U2 UniVerse‘s ADO.NET driver with Visual Studio. Today I’m going to deal with the part of the process that makes .Net programmers cringe but is the key to crafting effective SELECT statements: the UniVerse dictionary.
You probably already have dictionary entries that you (or others) use with UniVerse Basic code, but there are a few cases where you may need to modify them in order to access the fields through ADO.Net. UniVerse Basic is pretty forgiving about incorrect dictionary entries — as long as the entry tells the code which attribute to look in, UniVerse Basic is happy. Unfortunately, ADO.Net (and other SQL-oriented APIs) are picky about data types and conversion codes — they frown on UniVerse’s attempt to portray every piece of data as a string.
IBM has provided a utility named HS.SCRUB which analyzes your dictionaries and flags any fields which might cause problems for the database driver. If you are brave, you can even let HS.SCRUB "autofix" your dictionaries. This utility was actually introduced with the UniVerse ODBC driver, and you’ll find the documentation for it in the UniVerse ODBC Guide and the Using UniOLEDB manual. (You can download PDF copies of any of the UniVerse manuals from here.) Frankly, the utility found so many problems in our dictionaries, including a lot of false positives, that I ended up ignoring it and building new dictionary entries from scratch. Your mileage may vary.
However, even after letting HS.SCRUB have its way with your dictionaries, there are some cases that will require special attention:
1) Non-string fields that may be null.
If you don’t specify the data type of a field, the database driver will helpfully try to guess the data type on-the-fly by looking at the data. HS.SCRUB uses a similar approach to determine the correct data type. Unfortunately, both of them are tripped up by non-string fields which contain some zero-length values — if you define these as any data type except strings, the zero-length values will result in "invalid data type" .Net Exceptions at run time.
In cases like this, I’d suggest you force the field to be read as a string, by specifying "CHAR" in the data type attribute of the dictionary entry. The data type attribute is 6 for A and S-type dictionary entries, and attribute 8 for D- and I-type correlatives. If you have no idea what that means, then you’d better have a look at Chapter 5 of the UniVerse System Description manual. (As mentioned earlier, PDF copies of all of the UniVerse manuals can be found here.)
2) Time fields with contain milliseconds.
In Universe files, time fields are written in a "Julian" format which converts the time to the number of seconds past midnight: for example, 1:30 am is 5400. Optionally, these time fields can contain the number of milliseconds, which is written as a decimal value: for example, 5400.5 is half a second after 1:30 am. Unfortunately, the UniVerse ADO.Net driver can’t convert a string with that format into a TimeSpan. One workaround is to define the field as a string, by specifying "CHAR" as the data type (see point (a), above). You will then have to write a bit of code to convert the string to a .Net TimeSpan value:
public static TimeSpan TimeSpan_FromUniverseString(string strUniverseTime)
{
int intUniverseTime = 0;
// if the string contains milliseconds, discard them
int intIndex = strUniverseTime.IndexOf('.');
if (intIndex >= 0)
{
strUniverseTime = strUniverseTime.Substring(0, intIndex);
}
if (!Int32.TryParse(strUniverseTime, out intUniverseTime))
{
// if not a numeric, then return 00:00:00
return new TimeSpan(0);
}
int intDays = intUniverseTime / 86400;
int intRemainder = intUniverseTime % 86400;
int intHours = intRemainder / 3600;
intRemainder = intRemainder % 3600;
int intMinutes = intRemainder / 60;
int intSeconds = intRemainder % 60;
return new TimeSpan(intDays, intHours, intMinutes, intSeconds);
}
However, an easier workaround (which I stumbled across after using the above approach for months) is to take advantage of the "correlative" code.
This code only exists in A- and S-type dictionary entries, in attribute 8. Ordinarily, time fields are defined in UniVerse dictionaries by specifying a "conversion code", such as "MTS" in attribute 7.
ACCESS.TIME
001 A
002 3
005 S
007 MTS
009 R
010 5
This tells ADO.Net (or UniVerse Basic) that the value in this field can be regarded as a time. However, by specifying the same conversion code in attribute 8, the conversion to a time value is handled internally by the UniVerse database before ADO.Net sees it.
ACCESS.TIME.SQL
001 A
002 3
005 S
008 MTS
009 R
010 5
Therefore, a value such as "3600.5" will be seen by the ADO.Net driver as 1:30 am, without that nasty millisecond value that it so dislikes.
3) Multivalues
Ah yes, multivalues. This is a method for storing multiple values into the same field, separated by a special delimiter. For example, attribute 1 might contain a list of order numbers, and attribute 2 the amount (in cents) of each of order:
001 C2000ýM3000ýS3000 002 600782ý700422ý101456
Multivalued strings are the primary feature that distingishes UniVerse and its brethern databases from all the others, but .Net doesn’t really understand the concept (nor do many .Net programmers, actually). UniVerse provides some tools which allow you to dress up the multivalued data to resemble columns in a normalized SQL database — it is up to you whether you want to play dress-up, or just process the multivalued fields in their underlying form: delimited strings.
There are 3 different approaches to definining multivalued strings in the dictionary so that they are safe for consumption by .Net:
i) The documented approach is to use a D-type dictionary entry which contains an "M" in attribute 6 and an "association" name in attribute 7. For example, the definition of the Order ID and Order Amount fields might look like:
ORDER.IDS
001 D Order IDs
002 1
004 Order IDs
005 10L
006 M
007 DETAILS
ORDER.AMOUNTS
001 D Order Amounts
002 2
003 MD0,$
004 Order Amount
005 7R
006 M
007 DETAILS
And the dictionary entry which defines the virtual "DETAILS" table would be:
DETAILS
001 PH ASSOCIATION OF ORDER DETAIL FIELDS
002 ORDER.IDS ORDER.AMOUNTS
If the main table which contained the multivalued fields was named SALES, then there would now be a virtual table named SALES_DETAILS containing three fields: ORDER.IDS, ORDER.AMOUNTS, and a foreign key field that points back to the primary key of the SALES table.
For details, see chapter 5 of the UniVerse System Description manual, and for a thorough example see the sample code which is included with Part 2 of IBM’s tutorial on the ADO.Net driver. In the third part of this series, I will show an example of how you would use a JOIN clause in a SELECT statement to return the fields in the virtual table in the same record as the fields in the main table.
The advantage of this approach is that the resulting SQL statements will closely resemble those that would be used for a relational database, and this should make it easier to port your application from UniVerse to a relational database should you wish to. The downside is that it can become quite cumbersome to code when your records contain multivalued attributes that aren’t related on one another. For example, if a customer record contains a list of order numbers in attribute 1, and a list of customer contacts in attribute 2, you’ll have to specify a different association name for these 2 entries, and retrieve them using either a very complicated 3-way JOIN, or using 2 different SELECT statements. Also, since UniVerse does not offer any constraints or other mechanisms for enforcing the integrity of your data, you are likely to find that bad data fouls up your application, resulting in missing rows or runtime Exceptions. In our application, we use this approach only for small and simple tables.
ii) You can return the multivalued strings in their "native" form — strings with delimiters separating the values. You do this by using an I-type dictionary entry that converts the multivalue delimiter (ASCII 253) to whatever character you want. For example, to return attribute 1 with the multivalued fields delimited by commas:
FIELD.LIST 001 I 002 CONVERT(@VM,',',@RECORD<1>) 004 FIELD.LIST 005 50L 006 S 008 VARCHAR,32767
You might be wondering why you can’t just return the strings using the original delimiter, ASCII 253. For reasons known only to IBM, this results in a runtime exception. Incidentally, there is is a second type of delimiter which is also common in UniVerse, the subvalue delimiter (ASCII 252). UniVerse subvalues occur when the parts of a multivalued fields are, in turn, delimited into smaller fields. For reasons also known only to IBM, these are automatically converted by the ADO.NET driver to a carriage return and line feed (i.e. "rn"). IBM support has indicated that this delimiter conversion is actually a bug and will be changed back to ASCII 252 to a future Fix Pack.
Once you have the delimited string, you’ll need to use .Net’s string parsing capabilities to separate it into individual fields and convert the fields to their correct data type. You’ll find that .Net isn’t ideally suited for this type of string processing. UniVerse, on the other hand, is very much suited to it, and IBM provides a library named UniVerse Objects.Net that contains a variety of methods for extracting data from delimited strings. Frustratingly, your ability to use UniVerse Objects to parse your data is hampered by the fact that ADO.NET has forced you to replace all the multivalue and subvalue delimiters with other delimiters that UniVerse Objects doesn’t support.
If you decide to read UniVerse multivalued data in its native form, you might want to consider bypassing ADO.NET altogether and reading the data using UniVerse Objects. This is what our application does when dealing with tables that contain data that is mostly multivalued. It is much faster, and will likely require less code.
iii) While "official" UniVerse multivalued fields are delimited using ASCII 253 and 252, many UniVerse Basic programmers grow so fond of multivalues that they use them all over the place, with whatever delimiter character tickles their fancy. I’ve seen multivalued strings used as the primary key, and I’ve seen multivalued strings that contain embedded dates and time fields. While that kind of data structure would make SQL programmer cringe and a SQL database crawl, UniVerse is tuned to handle these strings efficiently and provides functions that you can insert in your dictionary to extract these fields relatively easily.
The trick is to place a correlative in attribute 8. Appendix C of the UniVerse Basic manual documents these correlatives, but be forewarned: they are cryptic enough to make a Perl programmer weep with frustration. Here are a couple of examples to get you started.
Say you had a key that was delimited with "@" signs. Just to make things interesting, say that one of the fields was a UniVerse Julian date, and another was a UniVerse Julian time. A typical key would be: FIRSTPART@14927@49055.75.
If you wanted to return the 1st part of the key ("FIRSTPART") as a separate field in the SELECT statement, you could define it as follows in the UniVerse dictionary:
THE.STRING
001 S
002 0
005 S
008 F;0;(G0@1)
009 L
010 20
The correlative in attribute 8 means "take attribute 0 (the key), then split it into sections with ‘@’ as the delimiter and return part 0". If you want all the gory details of what the "F code" can do, see Appendix C of the UniVerse Basic manual.
To return the date string as a date, you would use a similar "F code" string in attribute 8, along with a conversion code in attribute 7:
THE.DATE
001 S
002 0
005 S
007 D2
008 F;0;(G1@1)
009 R
010 9
The time field is trickier – as mentioned above, the ADO.NET driver doesn’t like time fields with milliseconds. The best way to handle these fields is by putting the conversion code in attribute 8, but that’s where we need to put the correlative, and (as far as I know) you can’t put both a conversion code and a correlative in the same dictionary attribute. You can, however, combine 2 extract commands in a correlative:
THE.TIME 001 S 002 0 005 S 007 MTS 008 F;0;(G2@1);(G0.1) 009 R 010 12
This correlative tells the database to extract the time field from the key and then, having done that, extract the part before the decimal point. The conversion code in attribute 7 then converts that value to a .Net TimeSpan. (The Universe Basic manual refers to this as "reverse Polish format (Lukasiewicz)" — how geeky is that?)
So, there you have it: an overview of the tricks that I’ve used to whip our UniVerse data into a form suitable for consumption by .Net.
If you are new to UniVerse, you are likely thinking at this point: "um, so how do I edit a dictionary". As a crash course on UniVerse for Windows programmers, I would strongly recommend the "Learner Pack" put together by the U2 User Group. The UniVerse manuals are a well written, exhaustive reference when you are ready to dive deeper, but as a crash couse you can’t do better than the U2UG Learner Pack.
In third and final part of this series, I’ll show you the (relatively simple) coding required to load UniVerse data into a DataGridView using ADO.Net.