MEMC and RISC iX

A place for discussing anything RISC iX, whether emulated or on original hardware!

MEMC and RISC iX

Post by paulb »

Previously, there was some discussion of the MEMC's virtual memory management architecture, mixed into a thread about Motorola where it was only tangentially relevant. I was thinking about this again recently, considering the remarks made by reviewers about the 32K page size and the suboptimal performance of the R-series with regard to paging.

I looked up the memory map and memory mapping sections of the MEMC datasheet, and one thing that I had either forgotten or never paid any attention to was the possibility of 4K pages with MEMC. I started out with a 1MB A3000 which used 8K pages, but I imagine that A305 users would have been aware of 4K pages with the right software (like the RISC OS Task Manager, if anyone managed to use RISC OS on a 512K system) making it apparent.

Of course, the page size is proportional to the total amount of physical memory accessible to each MEMC: the 128 translation entries divide that memory into 128 pages, so the 4M maximum per MEMC (imposed by the 20-bit RAM addressing) yields the 32K page size. And since the maximum amount of addressable physical memory is 16M, this seems to introduce the limit of four coupled MEMCs to access that amount of memory.
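
To make the arithmetic concrete, here's a trivial sketch of that relationship; the constants follow the datasheet figures mentioned above, and the names are mine, for illustration only.

#include <stdio.h>

#define MEMC_CAM_ENTRIES 128  /* one translation cell per physical page */

int main(void)
{
    /* The four page sizes MEMC supports. */
    unsigned long page_sizes[] = { 4096, 8192, 16384, 32768 };

    for (int i = 0; i < 4; i++) {
        unsigned long page = page_sizes[i];
        /* With all 128 cells in use, physical memory = cells x page size;
         * machines with less RAM simply leave some cells unused. */
        printf("%2luK pages x %d cells = %4luK of physical memory\n",
               page / 1024, MEMC_CAM_ENTRIES,
               page * MEMC_CAM_ENTRIES / 1024);
    }
    return 0;
}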

I did wonder whether it might have been possible to have some kind of base page for the page table, in such a way that instead of the 128-entry page table covering the entirety of physical memory, there would have been a window into a region of physical memory employing a smaller page size. So, a task or process might have been able to occupy up to, say, 512K using 128 pages of 4K, constrained to a particular 512K area of the total memory. This might have been inflexible and introduced all sorts of page allocation issues, but it might have offered a workaround for the underlying issues of the MEMC architecture.
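
As a thought experiment only, the idea might look something like this in software. Nothing like it exists in the real MEMC; the structure, the names and the window geometry are all invented here, just to pin the idea down:

#include <stdint.h>
#include <stdbool.h>

#define WINDOW_SIZE (512 * 1024)        /* region covered by the 128 entries */
#define SMALL_PAGE  4096                /* page size within the window */
#define NUM_ENTRIES (WINDOW_SIZE / SMALL_PAGE)   /* = 128 */
#define UNMAPPED    0xFFFFFFFFu

struct hypothetical_memc {
    uint32_t window_base;               /* 512K-aligned physical base of the window */
    uint32_t logical_page[NUM_ENTRIES]; /* logical page held by each cell, or UNMAPPED */
};

/* Translate a logical address, or return false to signal an address
 * exception.  The linear search stands in for the parallel match that
 * the CAM performs in hardware. */
static bool translate(const struct hypothetical_memc *m, uint32_t laddr,
                      uint32_t *paddr)
{
    uint32_t lpage = laddr / SMALL_PAGE;
    for (uint32_t i = 0; i < NUM_ENTRIES; i++) {
        if (m->logical_page[i] == lpage) {
            *paddr = m->window_base + i * SMALL_PAGE + laddr % SMALL_PAGE;
            return true;
        }
    }
    return false;
}

Moving window_base would relocate the whole 512K region at once, which is where the page allocation headaches I mentioned would come from.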

The VLSI Technology datasheet for the ARM chipset notes the following:
MEMC provides a descriptor entry for every page of physical memory which eliminates descriptor thrashing (address translation misses) from degrading system performance.
At this point, I wanted to look at some source code for a multitasking operating system to see what implementers actually do with MEMC. Although RISC iX sources may be out there, I remembered that NetBSD had an acorn26 port that worked on the A-series and R-series machines. In NetBSD 7.2, the usr/src/sys/arch/acorn26/acorn26/pmap.c file has a useful remark:
The page tables in the MEMC are implemented using content-addressable memory, with one cell for each physical page. This means that it's impossible to have one physical page mapped in two locations. For this reason, we try to treat the MEMC as a TLB, and to keep enough information around to quickly re-load mappings without troubling uvm_fault().
Previously, we wondered what advantages the MEMC approach might have for implementers over a simpler approach involving a TLB and a page fault handler (familiar from MIPS, Am29000, and so on). Ignoring remarks about "UNIX hackers... learning to love it", it seems like we have some kind of answer.
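
To make that concrete, here is a rough sketch of the strategy the pmap.c comment describes. This is not the actual NetBSD code: the types are simplified, and memc_set_entry() is a hypothetical helper standing in for however the kernel actually programs a CAM cell.

#include <stdint.h>
#include <stddef.h>

struct soft_mapping {                   /* software copy of one mapping */
    uint32_t lpage;                     /* logical (virtual) page number */
    uint32_t ppage;                     /* physical page number, 0..127 */
    uint8_t  ppl;                       /* MEMC page protection level */
    uint8_t  valid;
};

/* Hypothetical helper: program one CAM cell.  On the real hardware this
 * amounts to a store to a special address that encodes the physical page,
 * logical page and protection level in the address bits. */
extern void memc_set_entry(uint32_t ppage, uint32_t lpage, uint8_t ppl);

/* Fast path of the fault handler: if the faulting page is in the shadow
 * table, push the mapping back into the CAM and retry, without troubling
 * the full fault machinery (uvm_fault() in NetBSD's case). */
int reload_mapping(struct soft_mapping *table, size_t n, uint32_t fault_addr)
{
    uint32_t lpage = fault_addr >> 15;  /* 32K pages on a 4M machine */
    for (size_t i = 0; i < n; i++) {
        if (table[i].valid && table[i].lpage == lpage) {
            memc_set_entry(table[i].ppage, lpage, table[i].ppl);
            return 1;                   /* handled cheaply */
        }
    }
    return 0;                           /* genuine fault: take the slow path */
}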

Re: MEMC and RISC iX

Post by arg »

paulb wrote: Thu Dec 28, 2023 11:00 pm Previously, there was some discussion of the MEMC's virtual memory management architecture, mixed into a thread about Motorola where it was only tangentially relevant. I was thinking about this again recently, considering the remarks made by reviewers about the 32K page size and the suboptimal performance of the R-series with regard to paging.
I think that the focus on 32K page size as a topic of debate was a bit of a red herring. The real problem with the R140 was not that the pages were too big, it was that there weren't enough of them. Sure, dividing the same memory into smaller pages gives you more pages, but memory was cheap enough by then that the 4Mbyte limit itself was also too small. The R260, still with 32K pages but more of them, worked well.

It is however curious that 4K has remained the page size of choice in most architectures right up to the present day (with superpages being more about short-cutting the multi-level page tables needed in large memory systems than being a choice of page sizes). I'm not sure how much that is historical vs 4K genuinely being considered the all-time optimal size for all workloads.

The aspects of the ARM/MEMC architecture more deserving debate are the fact that it's virtually mapped (performance benefit at cost of context-switch overhead), and the 'CAM' structure (as your references say, bad for Unix as you can't map the same page in two places).

Re: MEMC and RISC iX

Post by paulb »

arg wrote: Fri Dec 29, 2023 1:59 pm I think that the focus on 32K page size as a topic of debate was a bit of a red herring. The real problem with the R140 was not that the pages were too big, it was that there weren't enough of them. Sure, dividing the same memory into smaller pages gives you more pages, but memory was cheap enough by then that the 4Mbyte limit itself was also too small. The R260, still with 32K pages but more of them, worked well.
4M was rather small for a Unix workstation, really, and I feel that the R140 was almost more of a statement of intent than a competitive product. Having the R260 arrive about a year later, running four or five times faster and offering twice the memory plus SCSI for the same price (even though Acorn had planned to release it at a higher price), at least brought a substantial value-for-money improvement for anyone wanting a Unix workstation from Acorn. I don't think any of the A-series machines were ever superseded in such spectacular fashion.

If Acorn had been able to get the ARM3 ready by 1989, launching the R260 instead of the R140, they would have had a more credible competitor, with performance more in line with similarly priced DEC and Sun machines. Had that brought about more customer interest, it might have driven FPA development somewhat, and one can envisage (or dream about) the FPA being ready by 1990. At that point, they would have had a product that was merely bringing up the rear of the pack, due to the other architectural limitations, as opposed to falling off the back of it and consequently seeing diminishing interest from potential customers.

(I also considered what the effect on morale might have been had the R-series been more competitive, particularly amongst those doing the actual work on the product line. Bringing something to market that isn't particularly competitive can still motivate engineers and developers if they feel that the successor will be better received and that more support within the organisation might be forthcoming, but I can imagine that, with the FPA dragging on, no new chipset upgrades likely, and a lot of the action happening elsewhere, it might have been hard to retain Unix specialists at Acorn.)
arg wrote: Fri Dec 29, 2023 1:59 pm It is however curious that 4K has remained the page size of choice in most architectures right up to the present day (with superpages being more about short-cutting the multi-level page tables needed in large memory systems than being a choice of page sizes). I'm not sure how much that is historical vs 4K genuinely being considered the all-time optimal size for all workloads.
I'm sure there is plenty of literature discussing such matters.
arg wrote: Fri Dec 29, 2023 1:59 pm The aspects of the ARM/MEMC architecture more deserving debate are the fact that it's virtually mapped (performance benefit at cost of context-switch overhead), and the 'CAM' structure (as your references say, bad for Unix as you can't map the same page in two places).
With the CAM, it seems that you can't really leave entries around when switching between (unprivileged) processes, because that would allow a process to accidentally see memory belonging to another: there is no address space identifier to qualify entries, only the page protection level, which distinguishes between user, supervisor and operating system modes. So unprivileged entries in the CAM have to be flushed upon a context switch and repopulated for the process being resumed, either eagerly to avoid page faults or lazily to adapt to the actual demands of the process.
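
Sketching that, with the same hypothetical memc_set_entry() helper as in my earlier post, and with the protection encoding and the 'parking' addresses assumed rather than checked against what RISC iX actually does:

#include <stdint.h>

#define MEMC_PAGES    128
#define PPL_NO_USER   2                 /* assumed: a level with no user access */
#define SCRATCH_BASE  0x380u            /* assumed: reserved run of logical pages */

extern void memc_set_entry(uint32_t ppage, uint32_t lpage, uint8_t ppl);

struct mapping { uint32_t lpage, ppage; uint8_t ppl, valid; };
struct process { struct mapping map[MEMC_PAGES]; };

void memc_context_switch(const struct process *next)
{
    /* The CAM has no invalid bit, so 'flushing' means parking each cell
     * at a harmless logical page with no user access; each cell gets its
     * own parking address so that no two cells claim the same page. */
    for (uint32_t p = 0; p < MEMC_PAGES; p++)
        memc_set_entry(p, SCRATCH_BASE + p, PPL_NO_USER);

    /* Eager repopulation; the lazy variant would stop here and let the
     * fault handler reload entries on demand instead. */
    for (uint32_t p = 0; p < MEMC_PAGES; p++)
        if (next->map[p].valid)
            memc_set_entry(next->map[p].ppage, next->map[p].lpage,
                           next->map[p].ppl);
}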

Of course, there might be physical pages that are deliberately shared between processes at the same virtual address, and obviously these need not be flushed. In principle, you can have the same physical page available at different virtual addresses in different processes, but such mappings cannot exist in the CAM at the same time due to its structure, as you note. However, without any address space annotations, you wouldn't really want multiple mappings of this nature even if the CAM did support it.

Perhaps a more onerous restriction applies to cases where one might want to map a physical page to multiple virtual pages within the same process. That might sound like an esoteric need, but I can imagine situations where one might allocate a placeholder page (containing zeros, but not being a view onto /dev/zero or anything like that, or maybe having predefined contents) which then appears at various virtual addresses until such time as the page is updated. This kind of thing could be awkward with the CAM, perhaps necessitating needless allocation and rapidly using up those rather large pages.
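
For example, where a conventional MMU would just point several page table entries at one zero page, the CAM seems to force one physical page per appearance, something like this (alloc_phys_page() and phys_page_addr() are invented for the sketch):

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE (32 * 1024)           /* 32K pages on a fully populated MEMC */

extern int   alloc_phys_page(void);     /* hypothetical: returns a free cell, or -1 */
extern void *phys_page_addr(int ppage); /* hypothetical: kernel address of the page */
extern void  memc_set_entry(uint32_t ppage, uint32_t lpage, uint8_t ppl);

/* Place a zero-filled placeholder at each of n logical pages.  A TLB-based
 * MMU could back all of them with a single physical page; here every
 * appearance costs a whole 32K page of real memory. */
int map_placeholders(const uint32_t *lpages, int n, uint8_t ppl)
{
    for (int i = 0; i < n; i++) {
        int p = alloc_phys_page();
        if (p < 0)
            return -1;                  /* out of (rather large) pages */
        memset(phys_page_addr(p), 0, PAGE_SIZE);
        memc_set_entry((uint32_t)p, lpages[i], ppl);
    }
    return 0;
}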

Re: MEMC and RISC iX

Post by paulb »

paulb wrote: Thu Dec 28, 2023 11:00 pm In NetBSD 7.2, the usr/src/sys/arch/acorn26/acorn26/pmap.c file has a useful remark:
The page tables in the MEMC are implemented using content-addressable memory, with one cell for each physical page. This means that it's impossible to have one physical page mapped in two locations. For this reason, we try to treat the MEMC as a TLB, and to keep enough information around to quickly re-load mappings without troubling uvm_fault().
And here's a newsgroup thread with a post from Mark Taunton in a similar vein:
In fact, if you look at it in a certain way, a TLB is all that MEMC1(/1a) does have! The address translation for each logical page accessed is looked up in an on-chip CAM, which is in effect a TLB (Translation Lookaside Buffer). Now the term TLB in a conventional page table system derives from the fact that it is a cache of recently used page address translations (i.e. page table entries) which gets checked on each memory access; the MMU "looks sideways" at this buffer in parallel with preparing to load up the appropriate PTE from the main page table in memory somewhere. If the translation (matched on the logical page number requested) is found in the TLB, the main memory fetch is abandoned and the cached entry gets used instead.
Plus remarks about the CAM being the page table and the MEMC limitations causing a large page size, as we already know.

Re: MEMC and RISC iX

Post by SarahWalker »

paulb wrote: Thu Dec 28, 2023 11:00 pm I looked up the memory map and memory mapping sections of the MEMC datasheet, and one thing that I had either forgotten or never paid any attention to was the possibility of 4K pages with MEMC. I started out with a 1MB A3000 which used 8K pages, but I imagine that A305 users would have been aware of 4K pages with the right software (like the RISC OS Task Manager, if anyone managed to use RISC OS on a 512K system) making it apparent.
A minor off-topic note, but A305 users still had 8K pages; on that machine, only 64 physical pages were available.

Re: MEMC and RISC iX

Post by paulb »

SarahWalker wrote: Mon Jan 29, 2024 7:48 am A minor off-topic note, but A305 users still had 8K pages; on that machine, only 64 physical pages were available.
Yes, I must have read that a few times but then forgot about it when formulating my post!