   From: reb01501@NOyahooSPAM.com   
      
   pbd22 wrote:   
   > On Jan 31, 5:03 pm, Erland Sommarskog wrote:   
   >> pbd22 (dush...@gmail.com) writes:   
   >>> On Jan 31, 9:57 am, pbd22 wrote:   
   >>>> I am working in SQL Server 2005 and want to break up a table of 1M   
   >>>> rows into distinct results of 80,000.   
   >>   
   > Thanks a bundle to you both for your responses. Looks like there is   
   > life on planet Google Groups after all, much appreciated!   
   >   
   > The specifics: we are doing email deployments, but Google is moving   
   > all of the email sent to Gmail users to their spam boxes. As a   
   > result, we have to "chunk" the Gmail users out of the total amount   
   > and send in manageable batches. We have figured that 80,000 per   
   > batch out of the total Gmail users in the table is possible.   
   >   
   > And, Erland, to answer your question, I would prefer to turn this   
   > into a stored procedure we can use for the purpose of "chunking"   
   > email addresses. Accordingly, the number per batch should be a   
   > variable parameter.   
   >   
   > Bob, the table we are querying against is pretty simple.   
   > Essentially, it has one column - "email_address", which is a   
   > varchar. Its data is about 1 million email addresses (but that   
   > number changes often). The result set table(s) should only have two   
   > columns, the count (INT) and the email_address (varchar). Please   
   > see below.   
   >   
   > The query I am trying to write is supposed to dump each result set   
   > (batch) to a text file in some folder on the hard drive. Each   
   > result set should have a count and the email addresses as columns.   
   > Something like this:   
   >   
   > RESULT 1:   
   >   
   > [COUNT] [EMAIL_ADDRESS]   
   > 1 name@gmail.com   
   > 2 name@gmail.com   
      
   > And so on up to the total amount of the Gmail addresses out of the   
   > original table.   
   >   
   > So, the statement should read something like this (pseudo code):   
   >   
   > select all distinct users   
      
   You keep making a point of saying "distinct". Does that imply that there are   
   duplicate email addresses in that 1-million row table?   
      
      
   > from the master table   
   > where email_address like '%gmail.com'   
   > return in batches of N (such as 80,000)   
   > and write each batch to a text file on the   
   > hard drive.   
      
   I assume you can handle this part, correct? It's only batching the data that   
   you need help with?   
      
   >   
   > I hope I have explained myself well. Let me know if anything is   
   > unclear.   
   >   
   > Thanks a bundle for your help.   
   SQL 2008 has some paging functionality built in, but you're using SQL 2005.   
   I can think of a couple of approaches. Here's one:   
   1. Create a temp table (#batches) with an identity column (identcol) and an   
   email column (email). Insert the distinct email addresses into it:   
       
   insert #batches (email)   
   select distinct email_address from master_table   
   where email_address like '%gmail.com'   
      
   Then use a WHILE loop to retrieve the batches, using a variable to keep   
   track of where you are:   
       
   declare @batchsize int --convert this to a parameter for your sproc   
   set @batchsize=80000   
       
   declare @lastrec int, @endrec int   
   set @lastrec=(select max(identcol) from #batches)   
   set @endrec=@batchsize   
   WHILE @endrec-@batchsize<=@lastrec   
   BEGIN   
    select email from #batches where identcol<=@endrec   
    --process the batch   
    delete #batches where identcol<=@endrec   
    set @endrec=@endrec + @batchsize   
   END   
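       
   Since you'd like this as a stored procedure with the batch size as a   
   parameter, the pieces could be wrapped up roughly as below. This is only a   
   sketch: the procedure name, the varchar(320) length, and the per-batch   
   count expression are my assumptions, and actually writing each batch to a   
   text file (e.g. with bcp or an SSIS task) still belongs where the   
   "process the batch" comment sits:   
       
   create procedure usp_ChunkGmailAddresses   
       @batchsize int = 80000   
   as   
   begin   
       set nocount on   
       
       -- identity column gives each distinct address a row number   
       create table #batches (   
           identcol int identity(1,1) primary key,   
           email varchar(320) not null   
       )   
       
       insert #batches (email)   
       select distinct email_address   
       from master_table   
       where email_address like '%gmail.com'   
       
       declare @lastrec int, @endrec int   
       set @lastrec = (select max(identcol) from #batches)   
       set @endrec = @batchsize   
       
       while @endrec - @batchsize <= @lastrec   
       begin   
           -- count restarts at 1 within each batch, per the requested layout   
           select identcol - (@endrec - @batchsize) as [count], email   
           from #batches   
           where identcol <= @endrec   
           --process the batch (e.g. hand it to bcp/SSIS to write the file)   
           delete #batches where identcol <= @endrec   
           set @endrec = @endrec + @batchsize   
       end   
   end   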
      