FastCache040+ Released!

SpeedGeek

FastCache040+ 1.0 ©SpeedGeek 2017

INTRODUCTION:
FastCache040+ is a patch to replace the CachePreDMA() and
CachePostDMA() functions of most 68040/060 libraries. While
the old functions are adequate they are far from optimal.
These old functions have up to 3x more code than the new ones
provided with this patch!

Also, the new functions implement a much more efficient method
of managing the Copyback cache for DMA. While every system
will have some CPU performance loss under DMA conditions, the
new functions keep this performance loss to a bare minimum.

FEATURES:
- Replaces CachePreDMA() and CachePostDMA() with smaller
and more efficient code
- Replaces complex MMU code with simple and fast DTTR code
- Temporarily changes Copyback mode to Write Through for DMA
(but only when required!)
- Never flushes the ATC!
- Never flushes the DC for Chip RAM DMA!
- Uses 68040/060 library detection code
- Will not patch itself
- 100% Assembler code

CODE SIZE COMPARISONS:
- FastCache040+ 1.0 (NewFunc 132 bytes)
- 68060.library 46.7 (OldFunc 304 bytes)
- 68040.library 44.2 (OldFunc 414 bytes)
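
As a quick check of the byte counts listed above, here is a minimal C sketch that turns them into size ratios. The function name is made up for illustration and is not part of the patch.

```c
/* Byte counts quoted in the list above. */
enum { NEWFUNC_BYTES = 132, OLD060_BYTES = 304, OLD040_BYTES = 414 };

/* Ratio of an old function size to the new one, in hundredths
   (for example 230 means 2.30x). Integer division truncates. */
unsigned size_ratio_x100(unsigned old_bytes, unsigned new_bytes)
{
    return (old_bytes * 100u) / new_bytes;
}
```

With these numbers the 68060.library functions come out at about 2.3x the size of the new code and the 68040.library functions at about 3.1x.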

REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- 68040.library or 68060.library

WARNING:
Do NOT use this patch with GigaMEM, VMM or any similar
virtual memory software! Do NOT use this patch with any
code which uses the MMU to write protect or remap modified
data structures!

NOTES:
Remapping a mirror image of the Kickstart ROM with the MMU
is OK! The new functions still have one thing in common with
the old functions. They do NOT translate virtual addresses
as specified in the Amiga RKRM! For more info on the old
functions see the Enforcer.guide by Michael Sinz.

HISTORY:
v1.0 - First release

Here is the link:

http://eab.abime.net/showthread.php?...90#post1189690

 
** NEWS UPDATE **

Sorry, there was a bug in v1.0 with the patch install code. :Doh:

v1.1 - Fixed a bug which prevented the patch from installing
- Added code to use OldCachePreDMA for MEMF_24BIT
transfers (I don't know why errors occurred here)
 
** 2ND NEWS UPDATE **

v1.2 released (updated patch size info)
- Added code to use OldCachePostDMA for MEMF_24BIT
transfers (So MMU Pages can be restored to original)

EDIT:
OK, I believe I have found a solution to the MEMF_24BIT transfer
error problem without OldPre/OldPost calls. Unfortunately, the cache mode would have to be changed to NoCache.

This would make the NewFunc code a little smaller but could reduce CPU performance a little for MEMF_24BIT transfers.

So it's a trade off situation... will give it some more thought!
:D
 
** 3RD NEWS UPDATE **

v1.3 Released!
- Added code to change MEMF_24BIT transfers to NoCache.
This eliminated all OldFunc calls. MEMF_24BIT transfers may have
some CPU performance loss but the NewFunc code performance
benefits should still justify this.

NOTES: v1.2 will still be available for download for users if they
believe using OldFunc calls is still justified. The v1.2 NewFuncSrc
for lbC00004E should read as follows:
CINVA NC ;Support 060, 040 not sure?

EDIT:
v1.4 Released!
- Removed MEMF_24BIT code from PreDMA/PostDMA for the
case of 16 byte aligned transfers. This will allow
some MEMF_24BIT transfers to be cache enabled!
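
A rough C sketch of the kind of test described in the v1.3 and v1.4 notes. It is not the patch source. It assumes "16 byte aligned" means both the start address and the length are multiples of the 16 byte cache line, and that a MEMF_24BIT transfer is one that lies entirely below the 16MB boundary. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 16u          /* 68040/68060 cache line size */
#define ADDR_24BIT_LIMIT 0x01000000u  /* first address above the 24-bit (16MB) space */

/* True when both the start address and the length are multiples of 16. */
bool is_line_aligned(uint32_t addr, uint32_t len)
{
    return ((addr | len) & (CACHE_LINE_BYTES - 1u)) == 0u;
}

/* True when the whole buffer lies inside the 24-bit address space. */
bool is_24bit_transfer(uint32_t addr, uint32_t len)
{
    return len != 0u && addr < ADDR_24BIT_LIMIT && len <= ADDR_24BIT_LIMIT - addr;
}

/* Hypothetical decision helper: only unaligned 24-bit transfers
   would need the NoCache treatment; aligned ones stay cached. */
bool wants_nocache(uint32_t addr, uint32_t len)
{
    return is_24bit_transfer(addr, len) && !is_line_aligned(addr, len);
}
```

For example, a 512 byte transfer at $00200000 would stay cache enabled under this reading, while the same transfer at $00200004 would not.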

EDIT2:
The v1.4 NewFuncSrc for lbC000080 should read as follows:
ORI.W #$8000,D1 ;Cache WT mode + User FC
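
For readers unfamiliar with the DTTR format, here is a small C sketch of how a value like the one built by that ORI.W could be put together and matched against an address. The field layout is my reading of the Motorola 68040 manual (base in bits 31-24, mask in bits 23-16, enable in bit 15, S field in bits 14-13, cache mode in bits 6-5), so treat it as an assumption. The helper names are hypothetical and not from the patch.

```c
#include <stdint.h>

#define TTR_ENABLE      0x8000u  /* E bit: register is active */
#define TTR_S_USER      0x0000u  /* S field 00: match user mode accesses */
#define TTR_CM_WT       0x0000u  /* cache mode 00: cacheable, write-through */
#define TTR_CM_COPYBACK 0x0020u  /* cache mode 01: cacheable, copyback */

/* Build a TTR value: the top 8 address bits select a 16MB block,
   mask bits set to 1 are ignored when matching. */
uint32_t make_ttr(uint32_t addr, uint8_t mask, uint16_t low_word)
{
    return (addr & 0xFF000000u) | ((uint32_t)mask << 16) | low_word;
}

/* Extract the 2-bit cache mode field. */
unsigned ttr_cache_mode(uint32_t ttr)
{
    return (unsigned)((ttr >> 5) & 3u);
}

/* Return 1 if an enabled TTR would match the given address, else 0. */
int ttr_matches(uint32_t ttr, uint32_t addr)
{
    uint32_t base = ttr >> 24;
    uint32_t mask = (ttr >> 16) & 0xFFu;
    if ((ttr & TTR_ENABLE) == 0u)
        return 0;
    return (((addr >> 24) ^ base) & ~mask & 0xFFu) == 0u;
}
```

With a base of $08000000, a mask of $01 and a low word of $8000 (enable, user FC, write-through), the value is $08018000 and it covers $08000000-$09FFFFFF.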
 
Ok guys, now it's your turn to post your compatibility results!

Please provide information on 68040.library or 68060.library vendor and version. Also, accelerator card type and vendor is requested too. Thank you! :)
 
** 4TH NEWS UPDATE **

There was another stupid version bug in v1.4 which has now been fixed (it was just a fully functional v1.4 reporting itself as v1.3).

I now have a simple benchmark tool called "CacheDMAmips" (see attached image). I will probably release it when I am satisfied with the compatibility results. ;)

EDIT: CacheDMAmips was removed for providing bogus results. Obviously, programs compiled on an old "Pile of Crap" C compiler and using v34 timer.device functions are not so reliable. Mips benchmark results are generally bogus anyway! Thus a new improved benchmark tool is called for! :D
 
Ok, here are images of the new improved benchmark tool. Sadly, only 1 user has provided compatibility results so far? :roll:
 

Attachments: CACHEDMABENCH040.PNG (4.2 KB), CACHEDMABENCH060.PNG (4.2 KB)
This was the first time I found this section.

I will try to drop things into my systems this weekend. First is an A4000T, but I need to hook up a SCSI drive (currently using an IDE DOM). It has a GVP 68060/60 (ultrasound) board with 64MB, 256MB in Z3 space, the full 16MB on the mobo, an X-Surf, and a GVP Spectrum 28/24. I also have 2 A3000s I can equip with a 3640/060 at stock speed but with wait state removal (one with static column mobo RAM, one without), and a modded PP&S 040->060/28MHz I can equip with either a GVP Series II with RAM or an A2091/2MB for different 16-bit memory DMA controller behavior. I'll see if the RSCP benchmark shows any differences.
 
System: (default)
A2000 Rev 6.2, KS 6.x modified for 68060. Setpatch 44.2, Workbench 3.5, ECS 2MB Agnus/ECS Denise, in standard resolution, PAL clock rate.
PP&S 68040 28MHz upgraded to a 68060, 32MB 32-bit RAM (high), GVP HC8/2MB (WhichAmiga reports 55.5MHz)
RSCP 1.1 745K/sec on gvpscsi.device 4.15, Dhry idle 53121, Dhry busy 35481, 66.7% (512K setting)
68040.library 46.5 (stub), 68060.library 46.16, exec.library 45.20, - all in high 32-bit RAM, MMU in use (using FastExec with fastexp fastvbr, fastmem, fastssp, and the 32-bit RAM added in) and ksremap
I have MuFastROM On Protect, MuFastZero On, MuFastChip On, MuLockLib
SysInfo reports 41497 Dhry, 43.31 MIPS, 31.05 MFlops, 59.36 over an A2000, and 759,104 on the disk speed.
HD is an Aztek CF Monster that is capable of over 2.7MB/sec if I had an ideal system (just GuruROM and stock 68K with GVP-HC8) but is subject to ZII DMA/CPU copy up, and GVP Async SCSI (which I'd normally see about 2.2MB/sec in a stock 68K).
CPU reports Cache, Burst (memory supports it), and Copyback.

With FastCache040+ installed:
RSCP (two runs): 744K/sec
Dhry Idle: 53500 / 53118
Dhry Busy: 35188 / 35338, 65.7% / 66.5%
Same numbers on SysInfo (as expected)
Bustest 0.19
read speed (32-bit Fast) spread is 27.4-30.8MB/sec, and writes are 18.0-18.2MB/sec
read speed (16-bit Fast) spread is 2.8MB/sec, and writes are 1.4MB/sec

No stability issues.
The system has 5 FFS partitions, 3 are 1GB, the other two are split.
System 32-bit RAM sits at $08000000-09FFFFFF

A different system to follow hopefully soon.
 
Ok, next system - the beast.

A4000T, TekMagic 68060/60MHz/64MB RAM, boot from IDE DOM, testing DMA off the SCSI with the same Aztek MonsterCF disk module used on the HC8.


RSCP 1.1 4248K/sec on scsi.device (), Dhry idle is 57411 / 58075, Dhry busy is 56494 / 56261, 98.9% / 96.8% (512K setting)
68040.library v1.2 (stub) (26.04.98) Ralph Babel (modified by Boyd Edmonson), 68060.library 2.4 (13.05.98) Ralph Babel (modified by Boyd Edmonson)
Sysinfo reports 44521 Dhry, 46.47 MIPS, 33.31 MFlops, 63.69 over an A2000, and 4183148 for the SCSI. exec.library 45.20 (Kickstart 45.57), scsi.device (IDE) 40.20, 2nd.scsi.device (SCSI) 43.45

With FastCache040+
RSCP (Two Runs): 4248K/sec
Dhry Idle: 57958 / 57954
Dhry Busy: 56386 / 56396, 97.2% / 97.3%
Sysinfo has 4205518 on the SCSI disk.

Bustest 0.19
read speed (32-bit Fast/060 board 08000000) spread is 42.4-51.8MB/sec, and writes are 38.5-38.7MB/sec
read speed (32-bit Chip) spread is 2.3-4.5MB/sec, and writes are 3.5-7.0MB/sec

System remains stable.
I've got a 256MB Z3 card, a Z3 Spectrum 28/24 (nothing running on it at the moment, but hw/Picasso96 sw installed), and an X-Surf 100 with drivers not active.

Loading the AmiTCP v3 stack and mounting my 5.5TB NAS SMB share doesn't affect stability or the RSCP numbers.

I need to get the Monster SCSI2SATA and the 80GB SSD connected up to see if I can get some higher numbers on the SCSI. I know it can do more, and I'd be curious as the speed goes up how it affects the cache routines.
 
** 5TH NEWS UPDATE **

The new benchmark tool has now been released! The lamers who failed to provide compatibility feedback owe a BIG THANKS to the users who did. A very special Thanks to thebajaguy for providing feedback on multiple systems! :)

BTW, these benchmark results were easily predictable. It's a No-Brainer!
 
** 6TH NEWS UPDATE **

v1.5 - Found an occasional Recoverable Alert bug which could
possibly result in a crash but only on 060 systems!
The simple fix was to move "CINVA NC" in PostDMA to the
end of the code.
- Removed the "+" character from the executable name due
to an unknown "Feature" of the Amiga Shell causing script
execution and version command problems.


EDIT: [CPU060 NOWRITEBUFFER] with the Phase5 46.7 68060.library seems to be a more reliable solution than the v1.5 update. Some more testing is required.
 
** 7TH NEWS UPDATE **

v1.6 - Added code to PostDMA to Flush the cache conditionally
(if the Store buffer and cache are enabled). Added NOPs
to sync the pipelines before RTE (CINVA is now obsolete)

UPDATE:
68040 users can use v1.4 or v1.5 if they like since they will
be a little faster than v1.6 but 68060 users should use v1.6!
68060 users will now have a performance trade off to consider
in deciding whether to enable the store buffer.
 
** 8TH NEWS UPDATE **

v1.6P5 Removed code to allow PostDMA cache Flush for the case of
16 byte aligned transfers. Added code to skip PostDMA
cache Flush for the case of cache disabled MEMF_24BIT
transfers.

UPDATE:
v1.6P5 is my last attempt to solve compatibility problems with
the Phase5 68060.library and Store buffer enabled. This
library is unstable and buggy WITH or WITHOUT FastCache040+
so either disable the Store buffer or expect the problems to
continue with only a MINIMAL improvement provided by this
patch!

v1.7 - Removed all v1.6P5 PostDMA cache flush code so most users
(except Phase5 68060.library users) can run at full speed!

UPDATE:
Phase5 68060.library users should use v1.6P5. All other users
can (probably) use v1.4, v1.5 or v1.7 without any problems.
 
** 9TH NEWS UPDATE **

FastCache040+ v1.6P5 has been removed. Phase5 68060.library users should use FixMapP5 before using this patch.

FixMapP5 1.2 ©SpeedGeek 2018 (MMU Handler ©Michael Sinz 2001)

INTRODUCTION:
FixMapP5 is a tool to modify some of the default MMU mapping of
the Phase5 68040 and 68060 libraries. This can improve stability
and prevent crashing under the following condition:

- Hardware or software interrupts which occur during a Chip RAM
access by the 68060 (In particular when Store buffer is enabled).

Software bugs which allow illegal writes to the $F80000 Standard
Kickstart ROM can cause a debugging problem in Copyback mode so this patch corrects that problem as well.

FEATURES:
- Changes Chip RAM mode to Precise (68060 only)
- Changes Standard ROM cache to Writethrough (68040 or 68060)
- Uses 68040/060 library detection code
- 100% Assembler code

REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- Phase5 68040.library or 68060.library

WARNING:
This tool was developed ONLY for use with the Phase5 libraries but
it does NOT actually verify such usage. So it can and probably
will mess up the mapping of ANY other libraries!

CREDITS:
Thanks to Michael Sinz for his freely distributable MMU handler.

HISTORY:
v1.0 - First release
v1.1 - Added code to skip mapping $F00000 space (which included
$F80000 space) for CyberstormPPC, CyberstormMK3 and
BlizzardPPC
v1.2 - Replaced FindName() with FindResident() since v1.1 wasn't working at all. Also, fixed a typo on module names.
 
** 10TH NEWS UPDATE **

v1.8 Released!
- Reworked the code to eliminate a serious (but seldom noticed) data transfer corruption bug for the case of multiple DMA drivers in the same system. Special Thanks to Ralph Babel for his excellent knowledge on this topic.
 
** 11TH NEWS UPDATE **

v1.9 Released!
- Fixed "D2 Register Not Preserved" coding bug in PreDMA.
Most DMA drivers don't seem to need it preserved but
Thanks to Cosmos for reporting it anyway. Moved PostDMA
Nest count code to user section of code. This eliminates
any calls to Supervisor when the count is more than 1.
v1.9BR Added new "Experimental" code which should allow only
DMA targeted 16MB blocks of Fast RAM to change to Write
Through mode. This "In Theory" allows the other 16MB
blocks to remain in Copyback mode. This can only benefit
"Big RAM" systems with 32MB+ of Fast RAM and ONLY when
these systems run apps which use the extra Fast RAM.
WARNING: Use at your own risk!
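
The nest count mentioned in the v1.9 notes can be pictured with a small C model. This is only a sketch of the general idea, not the patch source: the struct, the function names and the switch counter are all hypothetical, and a real implementation would have to protect the counter against interrupts.

```c
/* Hypothetical model of a DMA nest counter. */
typedef struct {
    unsigned nest;          /* DMA transfers currently in flight */
    unsigned mode_switches; /* how many cache mode changes were made */
    int write_through;      /* 1 while DMA memory is in Write Through mode */
} DmaCacheState;

/* Shared state, all fields start at zero (Copyback, nothing in flight). */
DmaCacheState dma_state = { 0u, 0u, 0 };

/* First transfer in: switch Copyback -> Write Through. */
void pre_dma(DmaCacheState *s)
{
    if (s->nest++ == 0u) {
        s->write_through = 1;
        s->mode_switches++;
    }
}

/* Last transfer out: restore Copyback. While other transfers are
   still pending, only the counter changes and no mode switch occurs. */
void post_dma(DmaCacheState *s)
{
    if (s->nest == 0u)
        return;             /* unbalanced call: nothing to undo */
    if (--s->nest == 0u) {
        s->write_through = 0;
        s->mode_switches++;
    }
}
```

In this model two overlapping transfers cost two mode changes in total instead of four, which is the kind of saving the note about avoiding Supervisor calls points at.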

CACHEDMABENCH:
v1.0 - First release
v1.1 - Fixed address and size bugs in FC loop code which
could have affected the results.
 
Just had an opportunity to try v1.9 since the v1.4 which I last tested.

TekMagic 060/60/64MB on A4KT native SCSI, Babel 68060.library, full boot.
Without FastCache040: 208080
With FastCache040 after Setpatch: 139390

I didn't try out the BR variant (yet). The two 256M memory boards I have are not in that system at the moment.

FWIW: RSCP_020 has shown me 7870-8000K/sec (Sync/Fast operational) off an Intel SATA SSD w/SCSI2SATA bridge and >95% CPU, although invoking FastCache040 later (and only on rare occasion) has had a low >25% CPU amount. I'm guessing alignment of various code matters. Running it early after SetPatch I didn't notice any issues like that.

A2000 G-Force 68060/33 64MB, no startup.

CacheDMABench: 1943415
FastCache040, CacheDMABench: 534205
68040.library v44.2

CacheDMABench: 1967955
FastCache040, CacheDMABench: 538614
68040.library v37.30
 
** 12TH NEWS UPDATE **

FixMapP5 1.3 released

v1.3 - Swapped order of 68040/060 library test. Some OS 3.1
systems use a dummy 68040.library (which does not expunge)
and prevented the chip RAM change to precise. Thanks to
Northway for reporting this bug.
 
** 13TH NEWS UPDATE **

FixMapP5 1.4 released

v1.4 - Added code to determine the Chip RAM start address from the
system memory list. Hopefully, this solves the problem with
Kickstart versions which configure the Chip RAM differently.
 