From: kegs@provalid.com   
      
   In article <6d32f8c2-9de2-49f2-86c9-96e76d2ff491n@googlegroups.com>,   
   Anthony Ortiz wrote:   
   >> The 65C832 as proposed is basically a 32 bit version of the 65C816. In   
   >> order to implement it you’ll need to make some decisions that WDC never got
   >> around to:
   >> * opcode and byte count for XFE to switch between bit modes;   
   >> * how to handle XBA in 32 bit mode (swap the top and bottom 16 bit groups,   
   >> or bytes 1 and 0 like in 16 bit mode)   
   >> * whether to clear or preserve the top 16 bits of the A, X, and Y   
   >> registers when switching between 32 bit and 16 bit modes;   
   >> * register transfer ops in 32 bit mode (TDC, TSC, TXA, TYA clear top 16   
   >> bits of C?); and   
   >> * probably other things I haven’t thought of.   
   >>   
   >>   
   >> In choosing to emulate a 65C832 you limit yourself to   
   >> * 8 bit data bus (4 memory cycles to load a 32 bit register)   
   >> * 24 bit program address space (16Mb limit)   
   >> * 24 bit data address space (unless you pretend it is an ASIC version with   
   >> 32 bit data address space)   
   >>   
   >> You’ll also need to develop a new software development tool chain for this   
   >> ‘preliminary’ processor.   
   >>   
   >> By comparison, if you chose an ARM coprocessor you’d have the 32 bit   
   >> address space and tool chain ready to go.   
   >   
   >This is what I don't understand... I'm talking about a spiritual   
   >successor to the 65C816 that looks like a duck, walks like a duck, and   
   >quacks like a duck... unlike what the 65C832 would be to the 65C816 as   
   >that is to the 65C02 as that is to the 6502, the ARM has no resemblance   
   >whatsoever to the 6502 line despite it having been the inspiration for   
   >the ARM; you might as well put an Intel inside and program a new GS/OS   
   >in x86 and run it and claim it's an Apple IIgs, but it's not, you can't   
   >leverage any existing software, not even a single instruction, so it   
   >doesn't make any sense in an Apple II. With the 65C832 you'd be able to   
   >leverage what's already out there, and any assemblers and compilers   
   >would simply need to be extended, not replaced. What I'm saying is that   
   >I think we're at the point where we can create a much faster Apple II   
   >accelerator (via FPGA or emulation as I'm doing on my Pi) so we can   
   >achieve that 1ghz GS/OS , and while we're at it maybe we can add some   
   >things that we've always wanted in the process, like 32-bitness or some   
   >badly-needed instructions.   
   >   
   >Also I'm not stuck on the 65C832, right now this is all just talk, just   
   >trying to see what the veterans here think the successor should look   
   >like if one had been made for the 32-bit world, just a bunch of   
   >locker-room talk for now. I'll be happy just to get this 1ghz 6502   
   >going, lol!   
      
   As for what to shoot for: it will not be easy to make an FPGA processor   
   which is faster than software emulation. Software can emulate a 65816   
   at an effective speed of 1GHz already, which is actually much faster   
   than the speed of a real 65816 running at 1GHz. This works out to about   
   300 million instructions per second (since 65816 instructions average a
   little over 3 clocks each). FPGAs at reasonable prices are basically   
   limited to around 300-350MHz clock speeds. A complex FPGA design which   
   executed one 65816 instruction every clock cycle would just about match   
   the speed of emulation on today's CPUs. But since accessing all memory   
   couldn't sustain that 350MHz speed, its effective rate will be lower
   (think caches and cache misses).   
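
   To put rough numbers on that comparison, here is a back-of-the-envelope
   sketch in Python. The 5% stall rate and 20-clock miss penalty are
   made-up illustration values, not measurements; the clock speeds and the
   3-and-a-bit clocks per instruction are the figures quoted above.

```python
# Back-of-the-envelope throughput comparison (all inputs are rough figures).

def mips(clock_mhz, clocks_per_instruction):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_mhz / clocks_per_instruction

def effective_cpi(base_cpi, miss_rate, miss_penalty_clocks):
    """Average clocks per instruction once memory stalls are included."""
    return base_cpi + miss_rate * miss_penalty_clocks

# Emulation at an effective 1 GHz, a little over 3 clocks per instruction.
emulated = mips(1000, 3.3)

# Ideal FPGA core: one 65816 instruction per clock at 350 MHz.
fpga_ideal = mips(350, 1.0)

# Same core, ASSUMING 5% of instructions stall 20 clocks on a cache miss.
fpga_real = mips(350, effective_cpi(1.0, 0.05, 20))

print(f"emulation            : {emulated:6.0f} MIPS")
print(f"FPGA, no stalls      : {fpga_ideal:6.0f} MIPS")
print(f"FPGA, assumed stalls : {fpga_real:6.0f} MIPS")
```

   With those assumed stall numbers the FPGA core drops to about half its
   ideal rate, which is the point about memory not keeping up.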
      
   So if not the fastest experience, what do you want?   
      
   Theorizing about CPU designs can be fun, but a 65832 has a lot of   
   headwinds against it. A lot of software is needed to get anywhere   
   (assemblers, compilers, disassemblers, etc. etc.). There are many ways   
   to add 32-bit support, so there are a lot of choices to be made, and the
   easy choices tend to conflict directly with making it run fast. To have any
   kind of speed, it will need to run one instruction per cycle (or more!),   
   which means a new mode (since 6502/65816 compatibility needs to keep the   
   byte fetches). One approach is to make WDM a prefix for existing
   instructions that changes how they work--WDM STA could always write 32
   bits, for example, and WDM BNE could use 32-bit (or 16-bit)   
   displacements. But what should WDM CLC do? This is where new   
   operations can be added. The 65816 makes some mistakes (like SEP #$20;   
   STA; REP #$30 to do a store of one byte), which would be nice to fix in   
   some way. Another approach is to have WDM REP #$30 enter 32-bit mode, and then
   you just widen all the existing instructions to work on 32-bit data.   
   But this can be harder to make fast. So, if you create a new   
   instruction set, then you've got to write a lot of software to support this
   (compilers, assemblers), plus then write software which takes advantage   
   of it. I think that's what the previous poster was saying: if WDM was a   
   switch to an ARM instruction set, then you get a whole lot of support   
   for the software needed.   
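
   As an illustration of the prefix idea, here is a minimal sketch of how
   an emulator's decode step might treat WDM as a prefix. The prefixed
   behavior is purely an assumption for illustration: the 65816 only
   reserves opcode $42 and defines no such semantics. The cpu dictionary
   and the step function are hypothetical names, only STA absolute is
   handled, and bank registers are ignored to keep it short.

```python
WDM = 0x42      # reserved opcode on the 65816
STA_ABS = 0x8D  # STA absolute

def step(cpu, mem):
    """Execute one instruction and return the number of bytes stored."""
    opcode = mem[cpu["pc"]]
    cpu["pc"] += 1

    # Hypothetical extension: WDM acts as a prefix on the next opcode.
    prefixed = opcode == WDM
    if prefixed:
        opcode = mem[cpu["pc"]]
        cpu["pc"] += 1

    if opcode == STA_ABS:
        addr = mem[cpu["pc"]] | (mem[cpu["pc"] + 1] << 8)
        cpu["pc"] += 2
        if prefixed:
            width = 4                        # assumed: WDM STA always stores 32 bits
        else:
            width = 1 if cpu["m8"] else 2    # normal 65816 M-flag behavior
        for i in range(width):               # little-endian store
            mem[addr + i] = (cpu["a"] >> (8 * i)) & 0xFF
        return width

    raise NotImplementedError(f"opcode ${opcode:02X} not in this sketch")

# Demo: WDM STA $2000 followed by a plain STA $2010 with 8-bit A selected.
mem = bytearray(0x10000)
mem[0:7] = bytes([WDM, STA_ABS, 0x00, 0x20, STA_ABS, 0x10, 0x20])
cpu = {"pc": 0, "a": 0x11223344, "m8": True}

first = step(cpu, mem)    # prefixed store
second = step(cpu, mem)   # unprefixed store
print(first, second, mem[0x2000:0x2004].hex(), mem[0x2010:0x2012].hex())
```

   Note that the prefixed form stores a full 32 bits without touching the
   M flag, so no SEP/REP pair is needed around it. The cost is the extra
   prefix byte fetch on every widened instruction.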
      
   Kent   
      