
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.c++.moderated      Moderated discussion of C++ superhackery      33,346 messages   


   Message 32,469 of 33,346   
   George Neuner to All   
   Re: Does calling an element of an array    
   10 Jul 12 01:28:22   
   
   From: gneuner2@comcast.net   
      
   On Sat,  7 Jul 2012 13:00:28 -0700 (PDT), sizil.krishna23@gmail.com   
   wrote:   
      
   >I am working on code to multiply matrices of large dimensions, e.g.
   >10000x10000.
   >The code works, but I want to reduce the elapsed time as much as
   >possible.
   >
   >So I used Cannon's matrix multiplication algorithm (in this algorithm
   >the matrix is divided into partitions of smaller size: 5, 10, 20, 100,
   >or whatever we want).  My doubt is: whenever I access any partition
   >(element) of that 2d array, is the whole row or column of that 2d
   >array fetched into memory or the CPU, or only that specific
   >partition (element)?
      
   The granularity of caching is the "line".  The number of cache levels
   and the size of their lines are CPU dependent, but lines typically
   hold 32 to 128 bytes; the 64-byte lines common on current x86 chips
   hold 8 doubles.  If your chip has SIMD registers, its line size is
   usually a small multiple of the SIMD register width.
      
   When you reference an element which is not already in cache, the base
   address of its covering cache line is computed from the element's
   address, and the contents of that entire line are fetched from
   memory (or from a higher level cache if possible).
      
      
   >If the whole row or column is fetched, suggest some ways to fetch only
   >that specific partition (element).  Can vectors do that?  Or do I have
   >to create N smaller 2d matrices (N = number of partitions) and copy
   >the matrix's data into the smaller matrices?
      
   Rearranging matrix processing to be cache aware is called "tiling"
   (or "blocking"), and there are numerous papers on the subject (Google
   is your friend).
   The trick is to size and place the tiles into memory so that tiles   
   which are used together can be simultaneously loaded into the L1   
   cache.  How to do this, of course, is CPU dependent so you will have   
   to study the architecture of your target chip.   
      
   Regardless of tiling, you can speed up processing by prefetching
   values before you need them.  Most compilers offer cache control
   intrinsics.  Prefetching can also be done semi-portably by doing a
   (useless) read of any element belonging to a line that you will
   shortly want to use.
      
   Hope this helps.   
   George   
      
      
   --   
         [ See http://www.gotw.ca/resources/clcm.htm for info about ]   
         [ comp.lang.c++.moderated.    First time posters: Do this! ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca