... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.databases.ms-sqlserver
Notorious Rube Goldberg contraption
19,505 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 18,750 of 19,505
migurus to rja.ca...@gmail.com
Re: table design for massive GPS data
14 Sep 12 10:25:00
   From: migurus@yahoo.com   
      
   On Friday, September 14, 2012 5:05:05 AM UTC-7, rja.ca...@gmail.com wrote:   
   > Off the top of a semi-qualified head:   
   >   
   >   
   >   
   > If the table is indexed well, then query performance will depend on the   
   >   
   > volume of data in a query, but not so much on the size of a table and the   
   >   
   > rows that you aren't looking at.   
   >   
   >   
      
   Thanks, I had an impression that hundreds of millions of small records is not   
   a sound design.   
      
   >   
   > If the typical query is all the records for one vehicle on one calendar day   
   >   
   > (one working day), then a primary key of date, vehicle id, time, looks   
   >   
   > possible.  If you want to count work after midnight for the previous day,   
   >   
   > that could be messier - you could think about using datetimeoffset.   
   >   
   >   
   >   
   > I expect such a design also to be efficient when you are deleting   
   >   
   > records that are 5 years old every day.   
   >   
   >   
   >   
   > If you use datetime order, then vehicle, then the server has to scan   
   >   
   > records of 1000 vehicles to get the data that you actually want to see.   
   >   
   > On the other hand, if you want the query for 1000 vehicles, then   
   >   
   > it's probably okay if they're mixed together.   
   >   
      
   One vehicle at a time queries required.   
      
   >   
   >   
   > If data arrives in real time, then a clustered index of date, vehicle, time,   
   >   
   > may slow down updates, while favouring the query.  Usually you want it   
   >   
   > that way around.  The clustered index makes SQL Server physically store   
   >   
   > neighbouring records together.  So you'll have 1000 distinct points in   
   >   
   > the data file where each vehicle's records are being added.  You'd also   
   >   
   > see data filling pages that are then split into two, a lot.  But I expect   
   >   
   > the server still to perform well in that case.  There also may be an   
   >   
   > implication of the growth of the transaction log, but if data is inserted   
   >   
   > one row at a time then it won't make much of a difference - the log ray be   
   >   
   > pretty big whatever you do.   
   >   
      
   There is no updates, as I mentioned. Inserts only.   
   I love your idea of building three part index: date, vehicle_id, time.   
   This is right to the point and not a big overhead space wise.   
      
   >   
   >   
   > Your primary key is of course unique and doesn't allow NULL and by   
   >   
   > default is a clustered index, but none of this is necessary,   
   >   
   > including even having a primary key as such.  Indexes that aren't   
   >   
   > the primary key will work just as well.  The clustered index does   
   >   
   > matter; it's also desirable for its key to be short, since it's   
   >   
   > incorporated into any nonclustered indexes, so if there isn't a need   
   >   
   > to use the clustered index to reorganise the data then a clustered   
   >   
   > index on an IDENTITY(int, 1, 1) column can be pretty good.   
   >   
   > /Then/ you set your nonclustered primary key (which does have   
   >   
   > to be unique and not NULL).  Including a non-meaningful IDENTITY   
   >   
   > in a primary key wouldn't be right, but it's right if it is,   
   >   
   > in fact, the key of your table in relationships with other data.   
   >   
   >   
   >   
   > The point is, if your data is not unique in the date, time, and   
   >   
   > vehicle ID fields, then don't get sidetracked into worrying about   
   >   
   > a primary key that you can't have.  Alternatively, add 1 second   
   >   
   > to one of the duplicate key values, and it'll be unique.   
   >   
   >   
   >   
   > Primary key also is a constraint that must have a unique object name   
   >   
   > (PK_nameoftable probably), whereas you can and I do name other indexes   
   >   
   > IX_1, IX_2, IX_3, on every table in your database.  This doesn't   
   >   
   > help other developers understand the database, but other developers   
   >   
   > don't often help me, so if they want to know what my indexes are,   
   >   
   > they can look 'em up.  I probably have 500 nearly identical tables   
   >   
   > created at the same time.  (Arguably I should have one table with   
   >   
   > 500 times as many rows and a couple of extra columns, but that isn't   
   >   
   > how we do things here.)  So I don't want to invent 500 meaningful   
   >   
   > index names.   
   >   
   >   
   >   
   > By the way, this project sounds important enough that you'll be   
   >   
   > thinking carefully about data backup and recovering from disaster   
   >   
   > or downtime, if you don't already have that covered thoroughly   
   >   
   > for your platform.  So, your thinking about backup and recovery   
   >   
   > also may bear on your database design.  For instance, do you   
   >   
   > want that big, fat transaction log that holds everything that   
   >   
   > happened in your database between the last backup and the failure -   
   >   
   > or do you have files of the last 7 days' GPS data that you can   
   >   
   > re-load as part of your disaster recovery process, so that it   
   >   
   > doesn't matter if the database is blown away except for the   
   >   
   > twice-weekly taken-off-site backup?   
      
   I was thinking about an hourly log shipping and daily or weekly full backups.   
   Would really appreciate a comment on this idea.   
      
   Thanks for your thoughts.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]