Subject: Re: complex indexes vs index key fields
From: Mervyn Bick <invalid@invalid.invalid>
Date: Sat, 25 Jul 2020 14:05:08 +0200
Newsgroups: dbase.getting-started
On 2020-07-24 22:59, Gaetano De Luisi wrote:
> My DBF file has the potential of getting quite large and, to reduce the processing time for queries, I need a couple of complex indexes.
>
> While creating the indexes, I noticed that saving the files took a fair amount of time for my test data of 25,000 records and appending the CSV data into the DBF also took noticeably longer than without the complex indexes.
>
> For large tables, what do you think would optimize performance: use a complex index (e.g. ["SO2"+left("0000000"+siteID,13)+DTOS(eTimeStamp)]), or create a calculated field in the DBF file to create a simple index on?
>
> The database could potentially have daily uploads or deletions of data and exceed 1 million records.
>
To solve the immediate problem of slow loading, add REINDEX to the APPEND
command. Instead of updating the index after each record is added,
dBASE rebuilds the index after all the records have been added. The
benefit depends on the number of records involved, but when more than
just a few records are being added the deferred rebuild should speed
things up.
Building an index does, however, take time. The more records, the more
time.
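In dBASE PLUS the deferred rebuild is just a clause on the APPEND FROM command. A minimal sketch (the table and file names are placeholders):

```
use sitedata excl
// Defer index maintenance: rows are appended first, then every tag
// in the production .mdx is rebuilt once at the end.
append from sitedata.csv type delimited reindex
use
```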
It depends on so many variables that the only way to decide if creating
a compound index on multiple fields is faster than creating a simple
index on a calculated field is to time both. Adding a literal value to
an index expression serves no purpose. The index is only applied to
values in existing fields.
Creating an index is done solely in the BDE. Populating a separate
calculated field has to be done in dBASE, in conjunction with the BDE,
before the BDE can work on the index. My gut feeling is that the
compound index is going to win but I wouldn't put money on it. :-)
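The only reliable comparison is a stopwatch. A rough timing sketch, using seconds() (seconds since midnight) and a placeholder tag name:

```
// Time the compound-index build.
nStart = seconds()
index on right(replicate('0',13)+siteID,13)+dtos(eTimeStamp) tag soSite
? "Compound index build:", seconds() - nStart, "seconds"
// Repeat the same measurement for the calculated-field approach and
// compare the two figures on the full data set, not a small sample.
```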
Your present index expression has some problems.
Unless all your siteID values are 6 characters you should use
RIGHT(replicate('0',13)+siteId,13). (I've used replicate() to avoid
having to enter 13 zeros but hard-coding a string of zeroes will work
just as well. Always provide enough zeroes so that right() has enough
characters to work with if a blank siteID manages to slip in.) If any
siteIDs are shorter than 6 characters, providing only 7 zeroes for
padding will result in index values of different lengths. This is a
definite "NO NO" and a recipe for disaster.
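To see the difference, compare a 6-character and a 3-character siteID (the values are made up):

```
// left() with only 7 zeros gives different lengths for short IDs:
? left("0000000"+"123456",13)          // "0000000123456" - 13 characters
? left("0000000"+"123",13)             // "0000000123"    - 10 characters
// right() with enough padding always returns 13 characters:
? right(replicate('0',13)+"123456",13) // "0000000123456"
? right(replicate('0',13)+"123",13)    // "0000000000123"
```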
If the siteIDs are all 6 characters then you don't need to left-pad the
values to order them in ascending order. You should also have a look at
the value 13. That represents a LOT of sites. Do you really anticipate
ever having so many? Any excess leading zeroes simply waste space in
the .mdx file and add to the index build time.
DTOS(eTimestamp) only extracts the date from eTimestamp. dBASE does not
provide a DTTOS() function so your index expression needs to be
right(replicate('0',13)+siteID,13)+dtos(eTimeStamp)+ttoc(eTimestamp)
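Putting the pieces together, the tag could be created with something like this (the tag name is a placeholder):

```
// dtos() gives yyyymmdd; ttoc() adds the time portion so records
// on the same date still sort by time of day.
index on right(replicate('0',13)+siteID,13)+dtos(eTimeStamp)+ttoc(eTimeStamp) tag siteStamp
```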
With the potential for your database to grow to over a million records
you should consider changing to a SQL server now rather than later.
Unless you already have access to an existing server you should have a
good look at Firebird before you make a decision. Firebird is open
source and has one of the smallest footprints of available SQL servers.
64-bit or 32-bit version? If the machine that will host Firebird is
running 64-bit Windows (or Linux :-) ) and has plenty of memory then by
all means go for the 64-bit version. The more memory available to
Firebird, the faster it will perform. If the machine only has 4 GB (or
not much more) then it doesn't really matter which version you choose.
You can, in any event, always change later. A backup from the 32-bit
version can be restored to the 64-bit version.
Even without the need to handle large tables it is worthwhile changing
to a SQL server. It will immediately allow you to use the full power of
SQL to extract information from your data. Things that are impossible
in localSQL with .dbf tables, short of creating and populating extra
fields, become completely straightforward.
Keeping data on a SQL server will mean a new approach to reading and
writing data but it is a fairly short, and not particularly steep,
learning curve.
Mervyn.