home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.databases.ms-sqlserver      Notorious Rube Goldberg contraption      19,505 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 17,612 of 19,505   
   Erland Sommarskog to Dave   
   Re: foreign key references only part of    
   10 Aug 09 21:32:19   
   
   2d3c8d92   
   From: esquel@sommarskog.se   
      
   Dave (djohannsen2@gmail.com) writes:   
   > ok here you go, lots of details.  i don't have access to   
   > ABC_Staging.dbo.Stage_ABC_ContactKey until tomorrow but i can recall   
   > it only had one index on it, its PK on NSO_IndividualID, and C.NSO_ID.   
   >   
   > you can see the offending tables ABC_Person and ABC_Organization each   
   > only have 1/2 of the fields of the Stage_ABC_ContactKey PK.   
   >   
   > SELECT DISTINCT   
   >        B.ABC_PersonID,   
   >        C.ABC_OrganizationID,   
   >        GETDATE() AS ABC_UploadedDate,   
   >        NULL AS ABC_ChangeDate,   
   >        1 AS IsActive   
   >   FROM ABC_Staging.dbo.Stage_ABC_ContactKey AS A WITH (NOLOCK)   
   >   JOIN dbo.ABC_Person AS B WITH (NOLOCK)   
   >     ON A.NSO_IndividualID = B.NSO_IndividualID   
   >   JOIN dbo.ABC_Organization AS C WITH (NOLOCK)   
   >     ON C.NSO_ID = A.NSO_ID;   
      
   The query was a bit like I suspected, just the other way round:   
   you don't retrieve any columns from Stage_ABC_ContactKey.   
      
   Before I go on, I will make the assumption that ABC_Person and   
   ABC_Organization are fairly small tables.   
      
   Let's first look at this part:   
      
     SELECT DISTINCT   
            B.ABC_PersonID,   
   --         C.ABC_OrganizationID,   
            GETDATE() AS ABC_UploadedDate,   
            NULL AS ABC_ChangeDate,   
            1 AS IsActive   
       FROM ABC_Staging.dbo.Stage_ABC_ContactKey AS A   
       JOIN dbo.ABC_Person AS B   
         ON A.NSO_IndividualID = B.NSO_IndividualID   
      
   This query is good so far that the join is over the first column of   
   the index of the big table. But you are retrieving all rows for the   
   persons, when only one per person would do and then you use DISTINCT   
   to weed out the duplicates. So if you wrote this as:   
      
     SELECT B.ABC_PersonID,   
   --         C.ABC_OrganizationID,   
            GETDATE() AS ABC_UploadedDate,   
            NULL AS ABC_ChangeDate,   
            1 AS IsActive   
     FROM   dbo.ABC_Person AS B   
     WHERE  EXISTS (SELECT *   
                    FROM   ABC_Staging.dbo.Stage_ABC_ContactKey AS A   
                    WHERE  A.NSO_IndividualID = B.NSO_IndividualID)   
      
   The query would a run a lot faster, and the PK index could be used   
   effectively. (Although depending on the cardinality of ABC_Persons   
   a non-clustered index on NSO_IndividualID alone could be needed.)   
      
   The same rewrite could be applied to the part of the query that involves   
   organisations:   
      
     SELECT --B.ABC_PersonID,   
            C.ABC_OrganizationID,   
            GETDATE() AS ABC_UploadedDate,   
            NULL AS ABC_ChangeDate,   
            1 AS IsActive   
     FROM   dbo.ABC_Organization AS C   
     WHERE  EXISTS (SELECT *   
                    FROM   ABC_Staging.dbo.Stage_ABC_ContactKey AS A   
                    WHERE  A.NSO_ID = C.NSO_ID)   
      
   But since there is no index with NSO_ID with a leading column, you   
   need to add one, and the best choice is probably a non-clustered index   
   on NSO_ID alone.)   
      
   Now you need to bring this together, and here is an interesting   
   observation: the query does in fact have a cartesian join between   
   persons and organisations. Thus:   
      
     SELECT B.ABC_PersonID,   
            C.ABC_OrganizationID,   
            GETDATE() AS ABC_UploadedDate,   
            NULL AS ABC_ChangeDate,   
            1 AS IsActive   
     FROM   dbo.ABC_Person AS B   
     CROSS  JOIN dbo.ABC_Organization AS C   
     WHERE  EXISTS (SELECT *   
                    FROM   ABC_Staging.dbo.Stage_ABC_ContactKey AS A   
                    WHERE  A.NSO_IndividualID = B.NSO_IndividualID)   
       AND  EXISTS (SELECT *   
                    FROM   ABC_Staging.dbo.Stage_ABC_ContactKey AS A   
                    WHERE  A.NSO_ID = C.NSO_ID)   
      
   Now, there is still the question whether this query actually produces   
   any useful and correct result. This is something I cannot determine,   
   sinec I know nothing of your business. The query certainly looks funny,   
   but it is not unrealistic. It seems that the query spans some person-   
   organisation mapping.   
      
   I think there is all reason to investigate whether the query is correct.   
   But if the query is correct, I don't think you should add any columns   
   or tables, only the indexes on the staging tables I suggested above.   
   And it is imperative that you get rid of the DISTINCT in favour of   
   using EXISTS.   
      
      
   --   
   Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se   
      
   Links for SQL Server Books Online:   
   SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx   
   SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx   
   SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca