... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.databases.ms-sqlserver
Notorious Rube Goldberg contraption
19,505 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 18,748 of 19,505
rja.carnegie@gmail.com to All
Re: table design for massive GPS data
14 Sep 12 05:05:05
   Off the top of a semi-qualified head:   
      
   If the table is indexed well, then query performance will depend on the   
   volume of data in a query, but not so much on the size of a table and the   
   rows that you aren't looking at.   
      
   If the typical query is all the records for one vehicle on one calendar day   
   (one working day), then a primary key of date, vehicle id, time, looks   
   possible.  If you want to count work after midnight for the previous day,   
   that could be messier - you could think about using datetimeoffset.   
      
   I expect such a design also to be efficient when you are deleting   
   records that are 5 years old every day.   
      
   If you use datetime order, then vehicle, then the server has to scan   
   records of 1000 vehicles to get the data that you actually want to see.   
   On the other hand, if you want the query for 1000 vehicles, then   
   it's probably okay if they're mixed together.   
      
   If data arrives in real time, then a clustered index of date, vehicle, time,   
   may slow down updates, while favouring the query.  Usually you want it   
   that way around.  The clustered index makes SQL Server physically store   
   neighbouring records together.  So you'll have 1000 distinct points in   
   the data file where each vehicle's records are being added.  You'd also   
   see data filling pages that are then split into two, a lot.  But I expect   
   the server still to perform well in that case.  There also may be an   
   implication of the growth of the transaction log, but if data is inserted   
   one row at a time then it won't make much of a difference - the log ray be   
   pretty big whatever you do.   
      
   Your primary key is of course unique and doesn't allow NULL and by   
   default is a clustered index, but none of this is necessary,   
   including even having a primary key as such.  Indexes that aren't   
   the primary key will work just as well.  The clustered index does   
   matter; it's also desirable for its key to be short, since it's   
   incorporated into any nonclustered indexes, so if there isn't a need   
   to use the clustered index to reorganise the data then a clustered   
   index on an IDENTITY(int, 1, 1) column can be pretty good.   
   /Then/ you set your nonclustered primary key (which does have   
   to be unique and not NULL).  Including a non-meaningful IDENTITY   
   in a primary key wouldn't be right, but it's right if it is,   
   in fact, the key of your table in relationships with other data.   
      
   The point is, if your data is not unique in the date, time, and   
   vehicle ID fields, then don't get sidetracked into worrying about   
   a primary key that you can't have.  Alternatively, add 1 second   
   to one of the duplicate key values, and it'll be unique.   
      
   Primary key also is a constraint that must have a unique object name   
   (PK_nameoftable probably), whereas you can and I do name other indexes   
   IX_1, IX_2, IX_3, on every table in your database.  This doesn't   
   help other developers understand the database, but other developers   
   don't often help me, so if they want to know what my indexes are,   
   they can look 'em up.  I probably have 500 nearly identical tables   
   created at the same time.  (Arguably I should have one table with   
   500 times as many rows and a couple of extra columns, but that isn't   
   how we do things here.)  So I don't want to invent 500 meaningful   
   index names.   
      
   By the way, this project sounds important enough that you'll be   
   thinking carefully about data backup and recovering from disaster   
   or downtime, if you don't already have that covered thoroughly   
   for your platform.  So, your thinking about backup and recovery   
   also may bear on your database design.  For instance, do you   
   want that big, fat transaction log that holds everything that   
   happened in your database between the last backup and the failure -   
   or do you have files of the last 7 days' GPS data that you can   
   re-load as part of your disaster recovery process, so that it   
   doesn't matter if the database is blown away except for the   
   twice-weekly taken-off-site backup?   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]