
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.databases.ms-sqlserver      Notorious Rube Goldberg contraption      19,505 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 18,745 of 19,505   
   migurus to All   
   table design for massive GPS data   
   13 Sep 12 14:47:42   
   
   From: migurus@yahoo.com   
      
   Windows Server/SQL Server 2008: we need to store GPS readings from roughly
   1000 vehicles, ~300 readings per vehicle per day. The data arrives in
   real time, and records are never updated after insert. That gives us
   roughly 120 million records per year, and we need to keep the last 5
   years of records.
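   Just to sanity-check the arithmetic (Python used purely as a calculator;
   the per-column sizes are the nominal SQL Server ones, ignoring per-row and
   page overhead):

```python
# Back-of-envelope volume check for the figures above.
vehicles = 1000
readings_per_vehicle_per_day = 300
rows_per_year = vehicles * readings_per_vehicle_per_day * 365
print(rows_per_year)  # 109500000 -- close to the rough 120M estimate

# Approximate raw row size for the normalized layout:
# int(4) + real(4) + real(4) + datetime(8) + char(1)(1) + int(4)
row_bytes = 4 + 4 + 4 + 8 + 1 + 4
raw_gb_per_year = rows_per_year * row_bytes / 1024**3
print(round(raw_gb_per_year, 1))  # 2.5 -- GB/year before index and page overhead
```

   So on the order of 12-13 GB of raw data over 5 years, plus indexes.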
      
   The stored data is used to query vehicle positions and event attributes
   over an entire day, for a given vehicle and date. Geospatial features and
   functions are not really needed for the reporting.
      
   I'd like to discuss table design for these requirements.   
      
   Here is a preliminary record content:   
   - vehicle id   
   - gps reading X and Y   
   - gps reading time   
   - one-letter event attribute   
   - event id (nullable)   
      
   First idea is to keep the data normalized in one table, one record per GPS
   reading. It looks very simple to implement, but I doubt the server will be
   happy with this high number of small records. If we were to do this, what
   would be a good candidate for a primary key here?
   Just for illustration, here is the table def for idea #1:
   create table idea1 (   
   	VEHICLE_ID	int	not null   
   ,	GPS_X		real	not null   
   ,	GPS_Y		real	not null   
   ,	GPS_TIME	datetime not null   
   ,	EVENT_LETTER	char(1)	not null
   ,	EVENT_ID	int	null   
   );   
   I don't show indexes, as I am not sure what to use.
   The typical query would look like this:
   select   
   	GPS_X	   
   ,	GPS_Y	   
   ,	GPS_TIME   
   ,	EVENT_LETTER   
   ,	EVENT_ID   
   from   
   	idea1   
   where   
   	VEHICLE_ID=@id   
   and	GPS_TIME between @from and @to   
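   For the primary key question, one candidate I've been leaning toward
   (treat it as an untested sketch, not something we've benchmarked):

```sql
-- Cluster on the WHERE-clause columns, so the typical query becomes a
-- single range seek over physically adjacent rows. This assumes no two
-- readings for one vehicle ever share the same GPS_TIME tick; otherwise
-- add a tiebreaker column or use a non-unique clustered index instead.
alter table idea1
	add constraint PK_idea1
	primary key clustered (VEHICLE_ID, GPS_TIME);
```

   The tradeoff is that inserts land at a different point per vehicle rather
   than at the end of the table, so some page splitting is to be expected.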
      
   	   
   Second idea is to keep a small table where records are inserted as they
   arrive, in the structure shown above, and then aggregate it on a daily
   basis, storing one record per vehicle per day in another table, which will
   be used for the queries. By doing this we shrink the number of records from
   120 million to ~350 thousand per year. This looks much more manageable.
   The con here is the additional effort to come up with a way of storing
   and retrieving the GPS readings as a blob.
   So, the table would look:   
   create table idea2 (   
   	VEHICLE_ID	int	not null   
   ,	RUN_DATE	date	not null   
   ,	GPS_READINGS	varbinary(max)	not null
   -- or whatever is appropriate to keep a daily portion of readings;
   -- we might need a reading_count int column here to help extract them from the blob
   );   
   And typical query would look like:   
   select   
   	GPS_READINGS   
   from   
   	idea2   
   where   
   	VEHICLE_ID=@id   
   and	RUN_DATE=@date   
   The resulting blob will need to be 'expanded'; we might use a stored
   procedure to do it.
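   For the blob format, here is a minimal sketch of the pack/unpack we have
   in mind (Python just for illustration; the function names and the choice
   of encoding GPS_TIME as seconds since midnight are made up, and -1 stands
   in for a NULL event id):

```python
import struct

# Fixed-width record per reading: X (real), Y (real), seconds since
# midnight (int), event letter (1 byte), event id (int). 17 bytes each,
# little-endian, no padding.
READING = struct.Struct("<ffici")

def pack_readings(readings):
    """readings: list of (x, y, secs, letter, event_id) -> one daily blob."""
    return b"".join(
        READING.pack(x, y, secs, letter.encode("ascii"),
                     -1 if eid is None else eid)
        for x, y, secs, letter, eid in readings
    )

def unpack_readings(blob):
    """Inverse of pack_readings; restores None for the NULL sentinel."""
    return [
        (x, y, secs, letter.decode("ascii"), None if eid == -1 else eid)
        for x, y, secs, letter, eid in READING.iter_unpack(blob)
    ]

day = [(37.5, 55.75, 8 * 3600, "S", None),
       (37.625, 55.875, 8 * 3600 + 30, "M", 42)]
blob = pack_readings(day)
print(len(blob))  # 34 -- two 17-byte readings
```

   Since the records are fixed-width, the same layout can also be walked
   server-side with SUBSTRING over the varbinary, which is where a
   reading_count column would help.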
      
   Any comments are welcome, critique is appreciated.   
   Thanks in advance.	   
   migurus.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca