Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.databases.ms-sqlserver    |    Notorious Rube Goldberg contraption    |    19,505 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 18,748 of 19,505    |
|    rja.carnegie@gmail.com to All    |
|    Re: table design for massive GPS data    |
|    14 Sep 12 05:05:05    |
Off the top of a semi-qualified head:

If the table is indexed well, then query performance will depend on the volume of data in a query, but not so much on the size of the table and the rows that you aren't looking at.

If the typical query is all the records for one vehicle on one calendar day (one working day), then a primary key of date, vehicle ID, time looks possible. If you want to count work after midnight as part of the previous day, that could be messier - you could think about using datetimeoffset.

I expect such a design also to be efficient when you are deleting records that are 5 years old every day.

If you order by datetime first, then vehicle, then the server has to scan the records of 1000 vehicles to get the data that you actually want to see. On the other hand, if you query for all 1000 vehicles at once, then it's probably okay that they're mixed together.

If data arrives in real time, then a clustered index of date, vehicle, time may slow down inserts while favouring the query. Usually you want it that way around. The clustered index makes SQL Server physically store neighbouring records together, so you'll have 1000 distinct points in the data file where each vehicle's records are being added. You'd also see a lot of pages filling up and then being split into two. But I expect the server still to perform well in that case. There may also be an implication for the growth of the transaction log, but if data is inserted one row at a time then it won't make much of a difference - the log may be pretty big whatever you do.

Your primary key is of course unique and doesn't allow NULL, and by default it is a clustered index, but none of this is necessary - including even having a primary key as such. Indexes that aren't the primary key will work just as well.
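A minimal T-SQL sketch of that date, vehicle, time layout - every table and column name here is my own invention, and the coordinate columns are just placeholders for whatever your GPS feed actually carries:

```sql
CREATE TABLE dbo.GpsReading (
    reading_date  date         NOT NULL,  -- the working calendar day
    vehicle_id    int          NOT NULL,
    reading_time  time(0)      NOT NULL,
    latitude      decimal(9,6) NOT NULL,
    longitude     decimal(9,6) NOT NULL,
    CONSTRAINT PK_GpsReading
        PRIMARY KEY CLUSTERED (reading_date, vehicle_id, reading_time)
);

-- Daily purge of rows more than 5 years old: with the date column
-- leading the clustered key, this is a range at one end of the index.
DELETE FROM dbo.GpsReading
WHERE reading_date < DATEADD(year, -5, CAST(GETDATE() AS date));
```

The same leading date column serves both the one-vehicle-one-day query and the daily purge.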
The clustered index does matter; it's also desirable for its key to be short, since it's incorporated into every nonclustered index, so if there isn't a need for the clustered index to organise the meaningful data then a clustered index on an int IDENTITY(1, 1) column can be pretty good. /Then/ you make your primary key nonclustered (it does still have to be unique and not NULL). Including a non-meaningful IDENTITY in the primary key wouldn't be right, but an IDENTITY column is right if it is, in fact, the key of your table in relationships with other data.

The point is, if your data is not unique in the date, time, and vehicle ID fields, then don't get sidetracked into worrying about a primary key that you can't have. Alternatively, add 1 second to one of the duplicate key values, and it'll be unique.

A primary key is also a constraint, which must have a unique object name (PK_nameoftable, probably), whereas you can - and I do - name other indexes IX_1, IX_2, IX_3 on every table in your database. This doesn't help other developers understand the database, but other developers don't often help me, so if they want to know what my indexes are, they can look 'em up. I probably have 500 nearly identical tables created at the same time. (Arguably I should have one table with 500 times as many rows and a couple of extra columns, but that isn't how we do things here.) So I don't want to invent 500 meaningful index names.

By the way, this project sounds important enough that you'll be thinking carefully about data backup and recovering from disaster or downtime, if you don't already have that covered thoroughly for your platform. So your thinking about backup and recovery may also bear on your database design.
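As a sketch of that alternative, again with invented names: a short surrogate key carries the clustered index, and the real date, vehicle, time key is enforced as a nonclustered primary key. Note the order of operations - the clustered index goes on first, otherwise the PRIMARY KEY would claim the clustered slot by default:

```sql
CREATE TABLE dbo.GpsReading (
    gps_id        int IDENTITY(1, 1) NOT NULL,  -- surrogate, clustering only
    reading_date  date               NOT NULL,
    vehicle_id    int                NOT NULL,
    reading_time  time(0)            NOT NULL,
    latitude      decimal(9,6)       NOT NULL,
    longitude     decimal(9,6)       NOT NULL
);

-- Short, ever-increasing clustering key: inserts go at the end of the
-- table, and every nonclustered index carries only this 4-byte key.
CREATE UNIQUE CLUSTERED INDEX IX_1
    ON dbo.GpsReading (gps_id);

-- The meaningful key, nonclustered.
ALTER TABLE dbo.GpsReading
    ADD CONSTRAINT PK_GpsReading
    PRIMARY KEY NONCLUSTERED (reading_date, vehicle_id, reading_time);
```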
For instance, do you want that big, fat transaction log that holds everything that happened in your database between the last backup and the failure - or do you have files of the last 7 days' GPS data that you can re-load as part of your disaster recovery process, so that it doesn't matter if the database is blown away except for the twice-weekly, taken-off-site backup?

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)