In his talk http://www.youtube.com/watch?v=HW9AWBFH1sA#t=3m01s Michael Steil cla...

pvg · on Jan 4, 2011

I'll take a look at that, thanks. The typical Z80 ran at 4 times the clock rate of the 6502. Both CPUs had their pluses and minuses and I think for many practical purposes were roughly comparable. The design differences and choices are certainly interesting. I don't think the OP was right to claim the 6502 'ran rings' around the Z80, and that remains the case after all the comments I got.

rbanffy · on Jan 4, 2011

> while the Z80 had more registers

To somewhat counter that, the 6502 could read and write to the first 256 bytes of memory with shorter instructions. The 65816 expanded that idea to allow you to do that to any place in memory.

leoc · on Jan 4, 2011

The Western Design Center actually describes the 65xx as an addressable register architecture http://www.westerndesigncenter.com/wdc/Presentation_Artwork/... [zipped PPT], and by that I assume they're describing the zero-page addresses as registers.

mgedmin · on Jan 4, 2011

Specifically, the talk mentions that the 6502 was pipelined and could do many instructions in 2 clock cycles, while the Z80 needed at least 4.

This translates to 2x the speed at the same clock rate.
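As a back-of-the-envelope sketch of that arithmetic: the helper name and the clock figures below are illustrative assumptions, not measurements, and the comparison only covers the shortest instruction on each chip, not real workloads.

```python
def min_instruction_time_us(clock_mhz: float, min_cycles: int) -> float:
    """Time in microseconds for the shortest instruction at a given clock rate."""
    return min_cycles / clock_mhz


# Same clock for both chips (1 MHz is chosen only for illustration).
t_6502 = min_instruction_time_us(1.0, 2)  # 2-cycle minimum, per the talk
t_z80 = min_instruction_time_us(1.0, 4)   # 4-cycle minimum, per the talk
same_clock_ratio = t_z80 / t_6502

# Z80 clocked 4x faster, the figure suggested earlier in the thread.
t_z80_fast = min_instruction_time_us(4.0, 4)
fast_clock_ratio = t_6502 / t_z80_fast

print(same_clock_ratio, fast_clock_ratio)
```

At equal clocks the 6502's shortest instruction finishes in half the time; with the Z80 clocked 4x higher, the relation flips by the same factor, which is roughly why the two end up being called comparable.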

Flow · on Jan 4, 2011

The 6510 (a 6502 plus I/O ports) could do no such thing. The fastest instructions used 2 cycles and a 1-byte opcode.

rbanffy · on Jan 4, 2011

You use two cycles, but the 6502 could execute the instruction while fetching the next one.

Flow · on Jan 4, 2011

That's not the pattern I see when looking at the op-code/cycle chart. I recently implemented part of a C64-emulator in JavaScript and it seems very much like every step takes a cycle.

For example, the instructions NOP (or CLI, SEI, INX etc.) are 1 byte and 2 cycles: 1 cycle for fetching the instruction and 1 for executing it.

LDA addr,x seems to be pipelined a bit though. It's "BD lo hi" in memory and takes 4 cycles unless lo+x > 255, in which case it takes 5 cycles. The lo+x calculation seems to occur while hi is being read.
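A minimal sketch of how an emulator might compute that cycle count, assuming only the rule described above (the function name is hypothetical, and this is not the code from my emulator):

```python
def lda_abs_x_cycles(lo: int, x: int) -> int:
    """Cycles for LDA addr,X: 4 normally, 5 when lo + x carries past 255."""
    base_cycles = 4
    # A carry out of the low address byte means the high byte must be
    # fixed up, which costs one extra cycle.
    page_crossed = (lo + x) > 0xFF
    return base_cycles + (1 if page_crossed else 0)


print(lda_abs_x_cycles(0x10, 0x20))
print(lda_abs_x_cycles(0xF0, 0x20))
```

The first call stays within the page, the second carries out of the low byte and pays the extra cycle.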

rbanffy · on Jan 4, 2011

I will have to dig up my 6502 documentation, but, IIRC, by the time the processor executed the NOP (CLI, INX etc.) it had already fetched the next instruction, so, if it's another NOP, it will complete in one cycle instead of two. Unless you crossed a page boundary, which implies a one-cycle penalty.

Flow · on Jan 4, 2011

I see, but that's not how it worked on the C64 at least. I did some raster-programming and counted cycles a lot.

rbanffy · on Jan 4, 2011

Since I never wrote timing-critical code for the 6502 (apart from "make it as fast as possible") I cannot recall many specifics. Since you did, you certainly have a better understanding of how it worked.

I am restoring a 65c02-based //e clone, so, I may be able to properly measure instruction timings, but I won't hold my breath.

leoc · on Jan 4, 2011

It seems that all the mysteries of 6502 timing have been revealed thanks to the Visual 6502 project http://www.youtube.com/watch?v=H_15RtVbqGU#t=5m33s http://www.visual6502.org/ .

Flow · on Jan 4, 2011

Yeah, well, it could be C64-specific quirks since it shared the bus with the graphics hardware.

Sounds like a fun project.

exception · on Jan 4, 2011

Yes, you are right, the instruction timings were very exact as far as I remember. The only case where the timing varied was a branch, depending on whether it was taken or not.
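A rough sketch of the branch case, for anyone counting cycles. The taken/not-taken difference is the one mentioned above; the extra cycle for a taken branch that lands in a different page comes from the commonly published 6502 timing tables rather than from this thread, and the function and parameter names are hypothetical.

```python
def branch_cycles(pc_after_branch: int, offset: int, taken: bool) -> int:
    """Cycles for a 6502 relative branch (BNE, BEQ, ...).

    pc_after_branch: address of the instruction following the branch.
    offset: signed displacement, -128..127.
    """
    if not taken:
        return 2
    target = (pc_after_branch + offset) & 0xFFFF
    # Assumption from published timing tables: one more cycle when the
    # target is in a different 256-byte page than the next instruction.
    crossed = (target & 0xFF00) != (pc_after_branch & 0xFF00)
    return 4 if crossed else 3


print(branch_cycles(0x1002, 0x10, taken=False))
print(branch_cycles(0x1002, 0x10, taken=True))
print(branch_cycles(0x10F0, 0x20, taken=True))
```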