RamseyConfig released!

SpeedGeek

RamseyConfig 1.1 ©SpeedGeek 2015

INTRODUCTION:
RamseyConfig allows you to view and change the config options of
the custom chip "Ramsey". This chip controls the Fast RAM in
A3000 & A4000 Amigas. The other programs "SetRamsey" and
"Ramsey" have been around for some time now, but with no bug
fixes, missing features, or support for only one version of Ramsey,
these programs have serious limitations.

FEATURES:
- Small, fast and pretty reliable¹
- Shows Ramsey version² and config settings
- Config settings can be changed
- Dynamically manages Page mode setting for both versions

REQUIREMENTS:
- Amiga 3000 or 4000
- OS2.0+ (maybe it works under OS1.3, but please no bug reports)

NOTES:
¹Reliable IMHO means the program should not crash or mess up
the Ramsey config register because of bugs that can be
prevented (e.g. Page mode is invalid on version $D). But it's
certainly OK for the user to enter a valid config option that
may not be supported on a certain system (see WARNINGS), thereby
possibly causing any or all of the above to happen.
RamseyConfig's argument list has the safest options at the
beginning of the list and the more risky options at the end of
the list so nobody should use them by accident!
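To illustrate the kind of preventable bug note ¹ describes, here is a minimal Python sketch of a guard that refuses Page mode when the version register reads $D. The version facts ($D has no valid Page mode, $F does) come from this thread; the function name, the "PAGE" option string and the choice to refuse unknown versions are hypothetical, not RamseyConfig's actual code.

```python
from typing import Optional

# Hypothetical sketch, not RamseyConfig's actual code: refuse a requested
# option when the version register says the chip can't do it.
PAGE_MODE_VERSIONS = {0x0F}      # per this thread, Page mode is valid on $F
KNOWN_VERSIONS = {0x0D, 0x0F}    # the two Ramsey versions discussed here


def check_option(version: int, option: str) -> Optional[str]:
    """Return an error message if `option` must be refused, else None."""
    if version not in KNOWN_VERSIONS:
        # Assumption: an unexpected version value is refused, not guessed at.
        return f"unknown Ramsey version ${version:X}"
    if option == "PAGE" and version not in PAGE_MODE_VERSIONS:
        return f"Page mode is invalid on version ${version:X}"
    return None


print(check_option(0x0D, "PAGE"))  # Page mode is invalid on version $D
print(check_option(0x0F, "PAGE"))  # None
```

Checking against the raw version value keeps the guard consistent with note ², since the program never translates the register into an "official" version number.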

²RamseyConfig simply reports the version register verbatim, so if
this doesn't match the expected, official or chip-printed
version number, please don't send me version-related bug reports!

Here is the link:

http://eab.abime.net/showthread.php?t=78703
 
I haven't messed with the Ramsey at all but this looks like a nice program for those who do. :)

Heather
 
** NEWS UPDATE **

v1.1 Released!

Some useful improvements
- Added code to parse multiple arguments
- Simplified argument message code

UPDATE INFO:
The new multiple argument parsing code (since v1.1) has the
following limitations:
- Delimiters must be either spaces or commas
- The maximum number of arguments accepted is 4
- The first invalid argument terminates the argument search
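The three v1.1 parsing rules above can be sketched in a few lines of Python. This is only an illustration of the stated limits: the keyword set is hypothetical (the real argument names aren't listed in this thread), matching is exact-case by assumption, and v1.2 replaced this approach with ReadArgs() anyway.

```python
import re

# Hypothetical keyword set for illustration only.
VALID_ARGS = {"PAGE", "BURST", "WRAP", "R154", "R238", "R380"}
MAX_ARGS = 4


def parse_args(line: str) -> list:
    """Mimic the v1.1 rules: spaces/commas as delimiters, at most 4
    arguments, and the first invalid argument ends the search."""
    accepted = []
    for token in re.split(r"[ ,]+", line.strip()):
        if not token:
            continue                      # empty string from a bare line
        if len(accepted) == MAX_ARGS:
            break                         # limit of 4 arguments reached
        if token not in VALID_ARGS:
            break                         # first invalid argument stops parsing
        accepted.append(token)
    return accepted


print(parse_args("PAGE,BURST WRAP"))     # ['PAGE', 'BURST', 'WRAP']
print(parse_args("PAGE BOGUS BURST"))    # ['PAGE']
```

Stopping at the first bad token (rather than skipping it) means a typo can silently drop every option after it, which is one reason a standard ReadArgs() template is friendlier.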
 
** 2ND NEWS UPDATE **

v1.2 Released!
RamseyConfig now works more like a "Standard" Shell tool!
- Replaced custom argument parsing code with ReadArgs()
- Replaced paired SuperState()/UserState() calls with single
Supervisor() calls

UPDATE INFO:
The new ReadArgs() parsing code (since v1.2) now accepts up to 4 arguments, but you must use the full argument name. Also, delimiters must be spaces.
 
Feedback:

OS: Fresh 3.1 (okay, updated 2.1 to 3.1) on KS 40.68 ROMs, A3000/25, 2MB/16MB DRAM, SCSI2SD v6 (latest firmware).

Setpatch CACHE BURST FASTROM
RamseyConfig reports: Ver: $D, Page/Burst/Wrap: 0, Refresh: 238 (note: 8x pieces of 60ns DRAM on replacement RAM board, no ZIPs)
RSCP Read to FastRAM: 2197K/sec, DhryIdle: 4794, DhryBusy: 3794
bustest (v0.19) to FastRAM
Read (w/l/m): 8.0, 12.1, 13.2/13.1
Write (w/l/m): 8.0, 16.0, 18.1

Pushing Refresh to R380 with RamseyConfig:
RSCP Read to FastRAM: 2206K/sec, DhryIdle: 4809, DhryBusy: 3982
bustest (v0.19) to FastRAM
Read (w/l/m): 8.1, 12.4, 12.9*/13.0
Write (w/l/m): 8.1, 16.2, 18.1/18.2
*- slightly slower than stock
All others marginal improvement.
Note: Running Bustest more than once sometimes produced about .1 variance in the reported numbers.
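Given the ~0.1 run-to-run variance noted above, one quick way to separate real changes from noise is to compare each Bustest delta against that figure. This Python sketch uses the first value of each stock (R238) and R380 result from this post; treating 0.1 as a hard noise threshold, and the MB/s unit, are assumptions.

```python
NOISE = 0.1  # run-to-run variance reported above (assumed MB/s)

# First value of each Bustest result in this post: (w)ord, (l)ong, (m)ove
stock = {"read.w": 8.0, "read.l": 12.1, "read.m": 13.2,
         "write.w": 8.0, "write.l": 16.0, "write.m": 18.1}
r380 = {"read.w": 8.1, "read.l": 12.4, "read.m": 12.9,
        "write.w": 8.1, "write.l": 16.2, "write.m": 18.1}


def classify(before: dict, after: dict, noise: float = NOISE) -> dict:
    """Label each delta as 'faster', 'slower' or 'noise'."""
    result = {}
    for key in before:
        delta = round(after[key] - before[key], 2)  # tame float error
        if abs(delta) <= noise:
            result[key] = "noise"
        else:
            result[key] = "faster" if delta > 0 else "slower"
    return result


print(classify(stock, r380))
```

Only the long read/write gains and the move-read drop clear the threshold, which matches the "marginal improvement" reading.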
Curious note: I pushed RamseyConfig Burst on the DRAM, and of course it hung the system, but the warm reboot did not clear the refresh value; it was still 380...

RamseyConfig Wrap = 1 (R380)
RSCP Read to FastRAM: 2202K/sec, DhryIdle: 4780/4810, DhryBusy: 3999/4022
bustest (v0.19) to FastRAM
Read (w/l/m): 8.1, 12.3, 12.9
Write (w/l/m): 8.1, 16.2, 18.1/18.2

Pushing the above settings, but with R154:

RSCP Read to FastRAM: 2202K/sec, DhryIdle: 4794, DhryBusy: 3739
bustest (v0.19) to FastRAM
Read (w/l/m): 8.2, 12.0, 13.0
Write (w/l/m): 8.0, 16.0, 17.8

Net Results: A subtle improvement on the CPU performance with the lower refresh on a DRAM system. RSCP transfer differences are minimal, and may represent only a minor driver overhead improvement.
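To put the "subtle improvement" into numbers, this Python sketch computes each DhryBusy result as a percentage change from the stock R238 figure. The values are the single runs posted above, so differences within run-to-run variance shouldn't be over-read.

```python
# DhryBusy results from the posts above, keyed by Ramsey refresh count
dhry_busy = {154: 3739, 238: 3794, 380: 3982}


def percent_vs(baseline_key: int, results: dict) -> dict:
    """Percent change of each result relative to results[baseline_key]."""
    base = results[baseline_key]
    return {k: round(100.0 * (v - base) / base, 1) for k, v in results.items()}


print(percent_vs(238, dhry_busy))  # {154: -1.4, 238: 0.0, 380: 5.0}
```

The same function works for the DhryIdle or RSCP columns.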

- - - Updated - - -

I also have an A3000 with 16MB SCRAM in it, but for some reason, with WB3.1, it Gurus as soon as it hits SetPatch, where the instruction cache and burst are enabled. I have confirmed that the Burst bit in Ramsey is already set at power-on/reboot. If I call RamseyConfig to disable the Burst bit, it's happy. I will review all of the ZIP memory in the unit just in case, but I generally know it's 80ns Toshiba Static Column RAM. Tinkering with Refresh from R157 to R380 had no effect on the crash behavior. I saw similar numbers to the 60ns DRAM machine, so I'm not posting them again.

- - - Updated - - -

And upon quick inspection, I seem to have gotten a mix of TC514402Z 60ns and 80ns chips; methinks that's not good.

- - - Updated - - -

And one more edit...

Grabbing SCSIPrefs and turning on Sync bumps the SCSI into >3.5MB/sec land. RSCP:

With R238:
Rate: 3580K
Idle: 4851
Busy: 3518

With R380:
Rate: 3580K
Idle: 4911
Busy: 3491

All above is with a full Workbench boot.
 
Are some of your DRAMs Fast Page instead of Static Column? I think you need all SC; otherwise enabling burst will crash the system.

Isn't the SCSI2SD a bottleneck?
 
I'm positive all are SC (TC514402AZ), but one effective bank (the first) is 80ns and the rest are 60ns. I've read that some may have mixed 60s and 70s, or 70s with 80s, but 60ns-80ns is a wider gap. Due to the nature of the pins on ZIP packages, I'm not taking any out until I have a full matching set, in a few weeks.

The SCSI2SD v6 does Sync, but I hadn't tinkered with SCSIPrefs when I first started, hence the initially lower rate. I also have an AzTek MonsterCF and a MonsterSATA with an 80GB SSD, both of which I know do Sync and are (potentially) rated higher than the v6 SCSI2SD; their performance on an up-clocked A2000 with a G-Force 040/33 and GuruROM is similar, at 3.6MB/sec. I am currently lacking passive terminator packs for the motherboard, and the Monster adapter boards don't seem to be happy on the short 4" cable I have, even with their terminators active. I didn't get around to trying a longer cable and two devices that are terminated.

My main intent was to see whether the Ramsey settings changed CPU and/or motherboard DMA read performance by a definitive amount. Until I can stabilize the 68030+SCRAM system, which should do Burst, I won't have any info on that. The DRAM system is a fraction better with the lower refresh rate, so if you have the option to lower it, it's a tweak worth doing. The other options don't matter for DRAM (and of course it won't do Burst). As for disk module performance, with such a small difference it's hard to tell whether the minor OS performance bump makes the driver handle hardware requests a tiny bit better, or the SDMAC is getting a fraction more transfer windows.

One may need to hack down to basic hardware I/O and try timing individual DMA reads and DMA writes, without the OS running, to see if the better refresh timing actually impacts the SDMAC that much. Maybe something like Ralph Babel's RAWSCSISpeed would do (known not to be compatible with the flash adapters; it needs a device with a real-world buffer on the drive). 'Worst case' RAM read is at least twice the bandwidth of the above SCSI transfer, and memory is 4-5 times faster in the 'best case'. The SDMAC/33C93A will top out just under the theoretical 5.0MB/sec Sync max (with the 33C93A clocked at ~14.xxxMHz, more like 4.8MB/sec).
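The "twice" and "4-5 times" estimates can be checked against the numbers posted earlier. This Python sketch divides the stock (R238) Bustest figures by the 3580K RSCP Sync rate; reading "3580K" as 3.58 MB/s and assuming both tools use the same unit base are assumptions.

```python
SCSI_MB_S = 3.58  # RSCP "Rate: 3580K" read as 3.58 MB/s (unit base assumed)

# Stock R238 Bustest figures from the first results post (assumed MB/s)
BUSTEST = {"word read (lowest)": 8.0, "long write": 16.0,
           "move write (highest)": 18.1}


def ratio_to_scsi(mb_s: float, scsi: float = SCSI_MB_S) -> float:
    """How many times the SCSI transfer rate a memory figure represents."""
    return round(mb_s / scsi, 1)


for label, mb_s in BUSTEST.items():
    print(f"{label}: {ratio_to_scsi(mb_s)}x the SCSI rate")
```

This gives roughly 2.2x, 4.5x and 5.1x, in line with the estimates above.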
 
@thebajaguy

Thanks for the feedback. FYI Burst mode enabled can only improve CPU performance on a (68030*) system with SC RAM. It won't in any way affect DMA performance since neither the A3000 SDMAC, A2091 DMAC, nor GVP DPRC support Burst. So for RSCP the Dhrystone benchmark may improve a little and that's all.

Increasing the Refresh mode count decreases the Refresh frequency, hence memory read/write performance will improve a little which in turn improves CPU performance. Unfortunately, Bustest by default leaves the data cache enabled so you get an (unreliable) mix of cache + memory results. Use [CPU NODATACACHE] to see more (reliable) memory test results. ;)
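Why a larger Refresh count helps can be shown with simple arithmetic: refresh time per second is the cost of one refresh divided by the interval between refreshes, so the overhead scales with 1/count and the per-refresh cost cancels out of any before/after ratio. In this Python sketch, treating the count as cycles of a 25 MHz clock (the A3000/25 in this thread) is purely an illustrative assumption, not documented Ramsey behavior.

```python
def refresh_interval_us(count: int, clock_mhz: float) -> float:
    """Microseconds between refreshes, ASSUMING the count is in clock cycles."""
    return count / clock_mhz


def overhead_reduction(old_count: int, new_count: int) -> float:
    """Fractional cut in refresh bus time when the count goes old -> new.

    Overhead = (cost per refresh) / interval, and the interval scales with
    the count, so the per-refresh cost cancels out of this ratio."""
    return 1.0 - old_count / new_count


for count in (154, 238, 380):
    # Assumed 25 MHz count clock, for illustration only
    print(f"R{count}: {refresh_interval_us(count, 25.0):.2f} us between refreshes")

print(f"R154 -> R238: {overhead_reduction(154, 238):.1%} less refresh time")
print(f"R238 -> R380: {overhead_reduction(238, 380):.1%} less refresh time")
```

The relative figures (about 35% and 37% less refresh time) hold whatever the real clock is; only the absolute intervals depend on the assumption, and whether a longer interval is safe for a given DRAM is a datasheet question.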

@Thread

Beware of Chinese "Fake" SC RAMs (especially when they offer SC RAMs for the same price as FP RAMs)! :roll:


*Unfortunately, most 040-060 accelerator cards either don't support Burst or only support it for "Local Bus" Fast RAM.
 
@Speedgeek

Pondering.... Assuming the data cache loads the code and the benchmark code then cache-hits every time (for the purposes of the code timing loop), if you then turn off the data cache, you now don't get any data burst read or write (like flipping Ramsey's Burst bit off), so therefore there's no point in running the benchmark, or having SC RAM, as the potential performance difference should be the same as page mode. The only measure that possibly gets tested here is Inst Burst's ability to re-load the benchmark code when something else in the system (multitasking on) bumps that code out. I'm just not seeing the purpose of turning that cache off just to negate the only real thing we are trying to 'test'.

BusTest, running with a 256K buffer, is fairly good for testing best-case memory-bus access, as the cache constantly needs to load sequentially from memory. I recognize that actual program data won't hit like this all the time. We are of course looking to keep the CPU fed with a fairly good stream (of instructions), so real-world instruction burst reads/caching will benefit fairly well from code that is well designed and doesn't branch too often. We hope the data burst reads and writes are able to use ~2 (or more) of the 4x 32-bit longwords pulled in, so that the clock penalty of a burst read when only a single byte or 32-bit longword is needed is infrequent. As I see it, BusTest in full sequential mode, where access is most efficient, shows the best case, while real-world code and data will hopefully still bring several % improvement in that more random environment.
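The "use ~2 of the 4 longwords" hope can be quantified for a given access pattern. This Python sketch counts what fraction of burst-fetched longwords a trace actually touches, using the 68030's 16-byte (four-longword) cache line; the model is a deliberate simplification that charges one burst per first touch of a line and never evicts anything.

```python
LINE_BYTES = 16                 # 68030 cache line: four 32-bit longwords
LONGS_PER_LINE = LINE_BYTES // 4


def burst_utilization(addresses) -> float:
    """Fraction of burst-fetched longwords that the trace actually uses.

    Simplified model: the first touch of a line costs one burst fill of
    all four longwords, and nothing is ever evicted."""
    lines = {}
    for addr in addresses:
        line = addr // LINE_BYTES
        lines.setdefault(line, set()).add((addr % LINE_BYTES) // 4)
    if not lines:
        return 0.0
    used = sum(len(longs) for longs in lines.values())
    return used / (len(lines) * LONGS_PER_LINE)


sequential = range(0, 256 * 1024, 4)   # BusTest-style sweep of a 256K buffer
scattered = range(0, 256 * 1024, 64)   # hypothetical: one longword per 64-byte record
print(burst_utilization(sequential), burst_utilization(scattered))
```

The sequential sweep uses every fetched longword (1.0), while touching one field per 64-byte record uses only one in four (0.25), which is the gap between BusTest and fragmented real-world data.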

Thinking aloud - as I have thought that BusTest is a bit due for a modernization for some time:

I foresee a need for one benchmark tool that has a lot of instructions in it, to test just the instruction-load efficiency of accelerator caches. In some places a cache with 32-bit RAM is good, and burst reads can make an additional difference, but a cache alone on a poor memory subsystem will show more of the real-world performance (like 68K's with caches, and 68030's without 32-bit RAM), and the native A3640 without the wait-state removals would show in the benchmark what we know happens in the original and enhanced state machines. It would use fairly common instructions, but the code would be built so that one can flip a decision variable (one of several, held in a few registers) and cause planned branching at known intervals. The code has to be several times the size of the 68040/68060 caches to give the cache loading and the predictors some work to do. This would test burst performance on the instruction side well, and the impact of data I/O would be minimal, as little (if anything) would load from RAM on the data side. A register or two could keep track of how many instructions, or instruction blocks, have been processed, and the system timers could be checked at code intervals or maybe via timer interrupts. It would be a lot of copy-paste once the basic logic was hammered out, but 128K or 256K of code to chew on might be nice.
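The copy-paste could be left to a generator. This hypothetical Python sketch emits Motorola-syntax 68k source made of short register-only blocks, each ending in a branch on a flag bit in d7, and estimates the code size against the 68030 (256 bytes), 68040 (4 KB) and 68060 (8 KB) instruction caches. The register choices, labels and filler instructions are illustrative assumptions, and the output hasn't been run through a real assembler.

```python
# Hypothetical generator for the instruction-fetch benchmark idea above.
FILLER = ["add.l d1,d2", "eor.l d2,d3", "lsl.l #1,d4", "sub.l d3,d5"]
FILLER_BYTES = 2   # each register-to-register form above is one opcode word
BRANCH_BYTES = 8   # btst #n,d7 (4 bytes) + bne.w label (4 bytes)
ICACHE_BYTES = {"68030": 256, "68040": 4096, "68060": 8192}


def generate(blocks: int, per_block: int, flag_bit: int = 0):
    """Return (source lines, estimated code size in bytes).

    Each block ends with a branch on bit `flag_bit` of d7. Both paths land
    on the next block, so the code footprint is identical whichever way
    the flag is set; only the taken/not-taken pattern changes."""
    if not 0 <= flag_bit <= 31:
        raise ValueError("btst on a data register takes bit numbers 0-31")
    lines = []
    for b in range(blocks):
        lines.append(f"blk{b}:")
        for i in range(per_block):
            lines.append("    " + FILLER[i % len(FILLER)])
        lines.append(f"    btst #{flag_bit},d7")
        lines.append(f"    bne.w blk{b}_skip")
        lines.append(f"blk{b}_skip:")
    size = blocks * (per_block * FILLER_BYTES + BRANCH_BYTES)
    return lines, size


src, size = generate(2048, 28)  # 64-byte blocks, about 128K of code
print(size, {cpu: size // c for cpu, c in ICACHE_BYTES.items()})
# 131072 {'68030': 512, '68040': 32, '68060': 16}
```

At 128K the code is 16x the 68060 instruction cache, comfortably "several times" the cache size.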


For a benchmark on the data cache/memory access side, I'd make sure several different buffers were available, both aligned and unaligned; maybe go with several chunks of 64K and switch between them, vs. a single large aligned 256K buffer. Give the caches, memory controllers, and branch predictors both an easy and a harder time. Something like this might turn out to be a good memory controller test tool, as catching unaligned accesses (and their penalty, or mishandling) is probably not in most HW engineers' toolboxes.
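When choosing unaligned start offsets for such buffers, it helps to know how many accesses will straddle a cache line and therefore need two lines. This Python sketch counts them for a sequential longword walk, assuming the 16-byte line size of the 68030/68040; the 64K chunk size follows the suggestion above, and the offsets are hypothetical.

```python
LINE_BYTES = 16  # 68030/68040 cache line size


def line_crossings(base: int, count: int, size: int = 4) -> int:
    """Count accesses of `size` bytes at base, base+size, ... that span
    two cache lines (each such access needs both lines)."""
    crossings = 0
    for i in range(count):
        start = base + i * size
        end = start + size - 1
        if start // LINE_BYTES != end // LINE_BYTES:
            crossings += 1
    return crossings


# Longword walks over a 64K chunk from a few hypothetical start offsets
for offset in (0, 1, 2, 3):
    print(offset, line_crossings(offset, 64 * 1024 // 4))
```

An aligned chunk never crosses a line, while any of the three unaligned offsets makes one longword in four straddle two lines, a quarter of all accesses.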


I would have put in my A3640 with Adapter+68060 and run these all again, but my current A3640 board is v3.1 with none of the wait-state enhancements (which makes it worse than a 68030/25), and it's also one of those that isn't happy with A3000 DMA (DMAC-02/Ramsey 04), plus I want to burn GALs for the motherboard (the SC system only has the GALs update). I have all of the parts to update all of them; it's just the time needed.

Noted on the sourced RAM, and the Burst by 040/060 accelerators. Now that I've looked at the purchase history, the 80ns was apparently the original memory, and the 60ns is what I added. I may pull out the 80ns, move the 8x bank-3 60ns chips to bank 0, and test again. I have other priorities, including a work trip covering Fri-Thurs again, so I may revisit some things in a week or two.
 
@thebajaguy

Well, of course, if the only thing you are interested in is testing Burst performance with the data cache, then you need to have it enabled, since Burst is a cache-mode performance feature.

But if you want to test Refresh mode or Page mode (Ramsey version $F) performance then you have a good reason to disable the data cache since mixing in cache mode performance will provide unreliable results.

Regarding the data cache, having burst mode enabled doesn't often give any worthwhile performance benefit. That's probably why Setpatch only enables (030) instruction Burst mode. The problem is Burst mode offers both "Best" case and "Worst" case performance results.

The best case is the next 3 sequential long words result in a cache "Hit". The worst case is when the next 3 sequential long words result in a cache "Miss". If you disable "Wrap" mode it can help reduce the worst case results but data structures are often fragmented or non-sequential in memory and this often shows in typical "Benchmark" results but not in sequential "Bustest" results.
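The best/worst-case trade-off can be written as a small cost model: without burst, each longword that is actually needed is a separate single read; with burst, the fill always pays for all four longwords. In this Python sketch the cycle counts are placeholders chosen only to show the shape of the trade-off, NOT measured Ramsey/68030 timings, and the model assumes the CPU waits for the whole fill.

```python
def burst_gain(extra_used: int, single: int, first: int, beat: int) -> int:
    """Cycles saved (positive) or lost (negative) by one burst line fill.

    extra_used: how many of the 3 extra longwords are later read (0..3).
    single:     cycles for one non-burst longword read.
    first/beat: cycles for the first burst beat and each following beat."""
    if not 0 <= extra_used <= 3:
        raise ValueError("a 4-longword line has only 3 extra longwords")
    without_burst = (1 + extra_used) * single
    with_burst = first + 3 * beat
    return without_burst - with_burst


# Placeholder timings only, NOT measured Ramsey/68030 figures
for used in range(4):
    print(used, burst_gain(used, single=4, first=4, beat=1))
```

With these placeholder numbers, the worst case (no extra longwords used) loses 3 cycles per fill and the best case (all three used) saves 9, so burst only pays off when at least one extra longword is used, which sequential access does and fragmented data often doesn't.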
 