Subject: Re: complex indexes vs index key fields
From: Mervyn Bick <invalid@invalid.invalid>
Date: Sat, 25 Jul 2020 14:05:08 +0200
Newsgroups: dbase.getting-started
On 2020-07-24 22:59, Gaetano De Luisi wrote:
> My DBF file has the potential of getting quite large and, to reduce the processing time for queries, I need a couple of complex indexes.
>
> While creating the indexes, I noticed that saving the files took a fair amount of time for my test data of 25,000 records and appending the CSV data into the DBF also took noticeably longer than without the complex indexes.
>
> For large tables, what do you think would optimize performance: use a complex index (e.g. ["SO2"+left("0000000"+siteID,13)+DTOS(eTimeStamp)]), or create a calculated field in the DBF file to create a simple index on?
>
> The database could potentially have daily uploads or deletions of data and exceed 1 million records.
>
To solve the immediate problem of slow loading, add REINDEX to the APPEND
command. Instead of updating the index after each record is added,
dBASE rebuilds the index after all the records have been added. The
benefit depends on the number of records involved, but when more than
just a few records are being added the deferred rebuild should speed
things up.
Building an index does, however, take time. The more records, the more
time.
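In dBASE PLUS the deferred rebuild is just a clause on the APPEND FROM command. A minimal sketch (the table and file names are placeholders):

```
use sitedata excl
// Defer index maintenance: rows are appended first, then every tag
// in the production .mdx is rebuilt once at the end.
append from sitedata.csv type delimited reindex
use
```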
It depends on so many variables that the only way to decide if creating
a compound index on multiple fields is faster than creating a simple
index on a calculated field is to time both. Adding a literal value to
an index expression serves no purpose. The index is only applied to
values in existing fields.
Creating an index is done solely in the BDE. Populating a separate
calculated field has to be done in dBASE, in conjunction with the BDE,
before the BDE can work on the index. My gut feeling is that the
compound index is going to win but I wouldn't put money on it. :-)
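The only reliable comparison is a stopwatch. A rough timing sketch, using seconds() (seconds since midnight) and a placeholder tag name:

```
// Time the compound-index build.
nStart = seconds()
index on right(replicate('0',13)+siteID,13)+dtos(eTimeStamp) tag soSite
? "Compound index build:", seconds() - nStart, "seconds"
// Repeat the same measurement for the calculated-field approach and
// compare the two figures on the full data set, not a small sample.
```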
Your present index expression has some problems.
Unless all your siteID values are 6 characters you should use
RIGHT(replicate('0',13)+siteId,13). (I've used replicate() to avoid
having to enter 13 zeros but hard-coding a string of zeroes will work
just as well. Always provide enough zeroes so that right() has enough
characters to work with if a blank siteID manages to slip in.) If any
siteIDs are shorter than 6 characters, providing only 7 zeroes for
padding will result in index values of different lengths. This is a
definite "NO NO" and a recipe for disaster.
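To see the difference, compare a 6-character and a 3-character siteID (the values are made up):

```
// left() with only 7 zeros gives different lengths for short IDs:
? left("0000000"+"123456",13)          // "0000000123456" - 13 characters
? left("0000000"+"123",13)             // "0000000123"    - 10 characters
// right() with enough padding always returns 13 characters:
? right(replicate('0',13)+"123456",13) // "0000000123456"
? right(replicate('0',13)+"123",13)    // "0000000000123"
```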
If the siteIDs are all 6 characters then you don't need to left-pad the
values to order them in ascending order. You should also have a look at
the value 13. That represents a LOT of sites. Do you really anticipate
ever having so many? Any excess leading zeroes simply waste space in
the .mdx file and add to the index build time.
DTOS(eTimestamp) only extracts the date from eTimestamp. dBASE does not
provide a DTTOS() function so your index expression needs to be
right(replicate('0',13)+siteID,13)+dtos(eTimeStamp)+ttoc(eTimestamp)
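Putting the pieces together, the tag could be created with something like this (the tag name is a placeholder):

```
// dtos() gives yyyymmdd; ttoc() adds the time portion so records
// on the same date still sort by time of day.
index on right(replicate('0',13)+siteID,13)+dtos(eTimeStamp)+ttoc(eTimeStamp) tag siteStamp
```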
With the potential for your database to grow to over a million records
you should consider changing to a SQL server now rather than later.
Unless you already have access to an existing server you should have a
good look at Firebird before you make a decision. Firebird is open
source and has one of the smallest footprints of available SQL servers.
64-bit or 32-bit version? If the machine that will host Firebird is
running 64-bit Windows (or Linux :-) ) and has plenty of memory then by
all means go for the 64-bit version. The more memory available to
Firebird, the faster it will perform. If the machine only has 4 GB (or
not much more) then it doesn't really matter which version you choose.
You can, in any event, always change later. A backup from the 32-bit
version can be restored to the 64-bit version.
Even without the need to handle large tables it is worthwhile changing
to a SQL server. It will immediately allow you to use the full power of
SQL to extract information from your data. Things that are impossible
in localSQL with .dbf tables, short of creating and populating extra
fields, become completely straightforward.
Keeping data on a SQL server will mean a new approach to reading and
writing data but it is a fairly short, and not particularly steep,
learning curve.
Mervyn.