Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.c++.moderated    |    Moderated discussion of C++ superhackery    |    33,346 messages    |
|    Message 32,469 of 33,346    |
|    George Neuner to All    |
|    Re: Does calling an element of an array     |
|    10 Jul 12 01:28:22    |
From: gneuner2@comcast.net

On Sat, 7 Jul 2012 13:00:28 -0700 (PDT), sizil.krishna23@gmail.com wrote:

>I am working on code to multiply matrices of higher dimensions, e.g.
>10000x10000. The code is working, but I want to reduce the elapsed
>time as much as possible.
>
>So I used Cannon's matrix multiplication algorithm, in which the
>matrix is divided into partitions of smaller size (5, 10, 20, 100, or
>whatever we want). My doubt is: whenever I access any partition
>(element) of that 2D array, is the whole row or column of the array
>fetched into memory or the CPU, or only that specific partition
>(element)?

The granularity of caching is the "line". The number of cache levels
and the size of their lines is CPU dependent, but L1 lines typically
are sized to hold 2 or 4 doubles on a 32-bit chip, 4 or 8 doubles on a
64-bit chip. If your chip has SIMD registers, its L1 line size is
likely to be the same as the SIMD register width.

When you reference an element which is not already in cache, the base
address of its covering cache line is determined from the element's
address, and the contents of that entire line will be fetched from
memory (or from a higher-level cache if possible).

>If the whole row or column is fetched, suggest some ways to fetch
>only that specific partition (element). Can vectors do that? Or do I
>have to create N smaller 2D matrices (N = number of partitions) and
>copy the matrix data into the smaller matrices?

Rearranging matrices for cache-aware processing is called "tiling",
and there are numerous papers on the subject (Google is your friend).
The trick is to size and place the tiles in memory so that tiles
which are used together can be simultaneously loaded into the L1
cache.
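[Editor's note: to make the tiling idea concrete, here is a minimal sketch of a blocked matrix multiply. It is not from George's post; the tile size of 64 and the function name `matmul_tiled` are illustrative, and the tile size would be tuned so three tiles of doubles fit in the target chip's L1 cache.]

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile size is an assumption -- tune it so that three TILE x TILE
// blocks of doubles fit comfortably in your L1 cache.
constexpr std::size_t TILE = 64;

// Tiled (blocked) matrix multiply: C += A * B, all N x N, row-major.
// The three outer loops walk tile-sized blocks; the inner loops work
// entirely within one block triple, which stays cache-hot.
void matmul_tiled(const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C, std::size_t N)
{
    for (std::size_t ii = 0; ii < N; ii += TILE)
        for (std::size_t kk = 0; kk < N; kk += TILE)
            for (std::size_t jj = 0; jj < N; jj += TILE)
                for (std::size_t i = ii; i < std::min(ii + TILE, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + TILE, N); ++k) {
                        const double a = A[i * N + k];  // reused across j
                        for (std::size_t j = jj; j < std::min(jj + TILE, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

The loop order (i, k, j innermost) keeps the inner loop streaming along contiguous rows of B and C, so each fetched cache line is fully used before it is evicted.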
How to do this, of course, is CPU dependent, so you will have to study
the architecture of your target chip.

Regardless of tiling, you can speed up processing by prefetching
values before you need them. Most compilers offer cache-control
intrinsics. Prefetching can also be done semi-portably by doing a
(useless) read of any element belonging to a line group that you
shortly will want to use.

Hope this helps.
George

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
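[Editor's note: a sketch of the two prefetch techniques George describes. `__builtin_prefetch` is the GCC/Clang cache-control intrinsic; the prefetch distance of 64 elements is a made-up tuning parameter, not a value from the post.]

```cpp
#include <cstddef>
#include <vector>

// Assumed tuning distance: how far ahead to prefetch, in elements
// (roughly 8 cache lines of doubles on a 64-byte-line CPU).
constexpr std::size_t PREFETCH_AHEAD = 64;

double sum_with_prefetch(const std::vector<double>& v)
{
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
#if defined(__GNUC__)
        // Compiler intrinsic: hint that this address will be read soon.
        if (i + PREFETCH_AHEAD < v.size())
            __builtin_prefetch(&v[i + PREFETCH_AHEAD], /*rw=*/0, /*locality=*/1);
#else
        // Semi-portable fallback: a (useless) volatile read of an element
        // in a line group we will shortly use pulls that line into cache.
        if (i + PREFETCH_AHEAD < v.size())
            (void)*static_cast<const volatile double*>(&v[i + PREFETCH_AHEAD]);
#endif
        total += v[i];
    }
    return total;
}
```

The computed result is identical either way; only the timing of the memory traffic changes, which is why the dummy read must be volatile -- otherwise the compiler deletes it as dead code.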
(c) 1994, bbs@darkrealms.ca