

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 241,875 of 243,242   
   Michael S to David Brown   
   Re: 16:32 far pointers in OpenWatcom C/C   
   09 Nov 25 14:40:31   
   
   From: already5chosen@yahoo.com   
      
   On Sun, 9 Nov 2025 12:29:32 +0100   
   David Brown  wrote:   
      
   > On 09/11/2025 10:46, Michael S wrote:   
   > > On Sat, 8 Nov 2025 00:00:06 -0000 (UTC)   
   > > cross@spitfire.i.gajendra.net (Dan Cross) wrote:   
   > >   
   > >>   
   > >>> I'd say, if you (SOC designer) absolutely have to play these   
   > >>> games, just use Cortex-M4.   
   > >>   
   > >> Sometimes you really do need an M7 class part.   
   > >>   
   > >> 	- Dan C.   
   > >>   
   > >   
   > > Somehow I suspect that [at the same clock frequency] M4 could access   
   > > uncached memory faster than M7. Maybe even significantly faster.   
   > >   
   >   
   > I suspect you would be wrong.  The M7 can do more per clock than the   
   > M4, has wider buses, and has support for direct data and instruction   
   > memories with their own dedicated buses.   
      
   If I am not mistaken, caches aside, the M4 and M7 have the same three   
   "fast" 32-bit buses: I + D + AHB. Plus some slower auxiliary   
   stuff.   
      
   >  I can appreciate the gut   
   > feeling that because there is the option of caching accesses, that   
   > extra functionality may slow down accesses when the cache is not   
   > used, but I don't believe that happens on the M7.  And everything   
   > other than the accesses themselves (the loads, stores, address   
   > increments, looping, etc.) can be quite a lot faster at the same   
   > clock speed.   
      
   Except that every branch mispredict is more than twice as slow. I'd   
   guess that the latency of a cache/TCM *hit* is also 1 clock longer   
   than the latency of an internal SRAM access on the M4, but the absence   
   of docs prevents me from proving it.   
   As to a cache miss, I am pretty sure that it completely stalls the M7   
   pipeline. In the case of the M4, I think that after an external load   
   the pipeline makes one more step before it stalls. And, of course, the   
   stall itself is less expensive.   
   Once again, I can't prove it because of the absence of docs.   
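Absent documentation, guesses like these can only be checked by measurement. What follows is a minimal sketch of a dependent-load timing loop, not a validated benchmark. The names `build_chain`, `time_chain`, `cycle_reader`, `host_now`, `dwt_enable` and `dwt_now` are made up for illustration. On an ARMv7-M target it assumes the DWT cycle counter is implemented and accessible (some parts need a debug unlock first), and the buffer would have to be placed in the memory region under test. The elapsed count includes loop overhead, so only differences between runs are meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: returns a free-running cycle count. */
typedef uint32_t (*cycle_reader)(void);

/* Fill buf so that each element holds the index of the next one to visit.
 * With stride coprime to n, the chain visits every element once per lap. */
static void build_chain(volatile uint32_t *buf, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)((i + stride) % n);
}

/* Perform `steps` loads where each address depends on the previous result,
 * so the loads cannot overlap. Returns elapsed cycles, loop overhead
 * included. The final index is stored through `last` so the chain stays
 * observable. */
static uint32_t time_chain(volatile uint32_t *buf, size_t steps,
                           cycle_reader now, uint32_t *last)
{
    uint32_t idx = 0;
    uint32_t t0 = now();
    for (size_t i = 0; i < steps; i++)
        idx = buf[idx];
    uint32_t t1 = now();
    *last = idx;
    return t1 - t0;   /* unsigned subtraction survives one counter wrap */
}

/* Host-side stand-in so the sketch can run off-target: it just advances
 * by 50 on every call and measures nothing real. */
static uint32_t host_ticks;
static uint32_t host_now(void)
{
    host_ticks += 50;
    return host_ticks;
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* On-target reader: DWT cycle counter at its architectural addresses. */
static void dwt_enable(void)
{
    *(volatile uint32_t *)0xE000EDFCu |= 1u << 24;  /* DEMCR.TRCENA */
    *(volatile uint32_t *)0xE0001004u = 0;          /* DWT_CYCCNT */
    *(volatile uint32_t *)0xE0001000u |= 1u;        /* DWT_CTRL.CYCCNTENA */
}

static uint32_t dwt_now(void)
{
    return *(volatile uint32_t *)0xE0001004u;
}
#endif
```

On a target one would call `dwt_enable()` once, then pass `dwt_now` instead of `host_now`, and compare the same chain placed in TCM, in cached RAM and in uncached RAM on both cores.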
      
   >   
   > But as you say, public data on timings is limited -   
      
   In the case of the M7, public data is not "limited", it is absent.   
   AFAIK, that is not so for any other Cortex-M core. Back when the M7 was   
   new, Arm claimed that the data was not made available because the core   
   is more complicated than the rest of the Cortex-M line. As silly as it   
   sounds, they could continue to claim it with a sort of straight face for   
   as long as the other Cortex-M cores were, indeed, simpler. Which has not   
   been the case since 2022, because the Cortex-M85 is no less complicated   
   than the M7 and arguably even a little more so. Despite that, there   
   exists an M85 Software Optimization Guide that contains instruction   
   tables with latency and throughput data. Yes, it has a few omissions,   
   but it proves that there is nothing impossible in documenting cores of   
   this level of complexity, even if you are as lazy as the Cortex-M   
   documentation team appears to be (relative, for example, to the   
   Cortex-A/Neoverse side of the company).   
      
   > and even when the   
   > data on the core is available, timings can be very dependent on   
   > details of the implementation and connections outside the core.   
   >   
   > We could always appeal to authority - Scott's company knows what they   
   > are doing, have access to far more detailed information and technical   
   > assistance from ARM than we do, and have picked an M7 rather than an   
   > M4. But speculation is more fun :-)   
   >   
   > > Unfortunately, info about M7 instructions timing does not appear to   
   > > be public.   
   > >   
   > > If one needs something like DP floating point, or when uncached   
   > > accesses are only a small part of the job and the rest of the load is   
   > > compute-intensive, then I can see how M7 could look attractive vs M4.   
   > > But personally in such a case I'd start to look for a non-Cortex-M   
   > > solution. Maybe R4, although I don't like it. Maybe A5. In huge   
   > > SoCs of the sort Scott is working on, A34 or even 510. Plus another   
   > > M4 to handle more typical MCU tasks.   
   > >   
   > >   
   > >   
   > >   
   >   
      


