Thread: FastCache040+ Released!

  1. #1
    VIP
    Amibayer!
    SpeedGeek's Avatar
    Join Date
    Jan 2011
    Country
    USA
    Region:
    Wisconsin
    Age
    54
    Posts
    852
    Feedback
    21 (100%)

    FastCache040+ Released!

    FastCache040+ 1.0 ©SpeedGeek 2017

    INTRODUCTION:
    FastCache040+ is a patch to replace the CachePreDMA() and
    CachePostDMA() functions of most 68040/060 libraries. While
    the old functions are adequate, they are far from optimal.
    These old functions have up to 3x more code than the new ones
    provided with this patch!

    Also, the new functions implement a much more efficient method
    of managing the Copyback cache for DMA. While every system
    will have some CPU performance loss under DMA conditions, the
    new functions keep this performance loss to a bare minimum.

    FEATURES:
    - Replaces CachePreDMA() and CachePostDMA() with smaller
    and more efficient code
    - Replaces complex MMU code with simple and fast DTTR code
    - Temporarily changes Copyback mode to Write Through for DMA
    (but only when required!)
    - Never flushes the ATC!
    - Never flushes the DC for Chip RAM DMA!
    - Uses 68040/060 library detection code
    - Will not patch itself
    - 100% Assembler code

    CODE SIZE COMPARISONS:
    - FastCache040+ 1.0 (NewFunc 132 bytes)
    - 68060.library 46.7 (OldFunc 304 bytes)
    - 68040.library 44.2 (OldFunc 414 bytes)
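As a quick check on the size claim, the ratios can be worked out from the byte counts listed above. This is only arithmetic on the figures in this post; the names `NEW_FUNC_BYTES`, `OLD_FUNC_BYTES` and `size_ratio` are made up for the illustration.

```python
# Byte counts copied from the CODE SIZE COMPARISONS list above.
NEW_FUNC_BYTES = 132  # FastCache040+ 1.0

OLD_FUNC_BYTES = {
    "68060.library 46.7": 304,
    "68040.library 44.2": 414,
}


def size_ratio(old_bytes, new_bytes=NEW_FUNC_BYTES):
    """Return how many times larger the old code is than the new code."""
    return old_bytes / new_bytes


if __name__ == "__main__":
    for name, old_bytes in OLD_FUNC_BYTES.items():
        print(f"{name}: {size_ratio(old_bytes):.2f}x the size of NewFunc")
```

The two libraries come out at different ratios, so the "3x" figure is best read as the upper end of the range.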

    REQUIREMENTS:
    - Amiga with 68040 or 68060 CPU and MMU
    - 68040.library or 68060.library

    WARNING:
    Do NOT use this patch with GigaMEM, VMM or any similar
    virtual memory software! Do NOT use this patch with any
    code which uses the MMU to write protect or remap modified
    data structures!

    NOTES:
    Remapping a mirror image of the Kickstart ROM with the MMU
    is OK! The new functions still have one thing in common with
    the old functions. They do NOT translate virtual addresses
    as specified in the Amiga RKRM! For more info on the old
    functions see the Enforcer.guide by Michael Sinz.
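To illustrate why the warning above matters, here is a toy model (not the patch's code) of the difference between a CachePreDMA() that returns the address it was given and one that actually translates it. The 4K page size, the `page_map` dictionary and both function names are assumptions made up for this sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


def identity_pre_dma(vaddr, length):
    """Model of a CachePreDMA() that does no address translation."""
    return vaddr, length


def translating_pre_dma(vaddr, length, page_map):
    """Model of a CachePreDMA() that translates virtual addresses.

    page_map maps virtual page numbers to physical page numbers;
    pages not listed are treated as identity mapped. Returns the
    physical start address and the number of bytes from there that
    are physically contiguous.
    """
    def phys(addr):
        page, offset = divmod(addr, PAGE_SIZE)
        return page_map.get(page, page) * PAGE_SIZE + offset

    start = phys(vaddr)
    # Bytes available up to the end of the first page.
    done = min(length, PAGE_SIZE - vaddr % PAGE_SIZE)
    # Extend page by page while the physical addresses stay contiguous.
    while done < length and phys(vaddr + done) == start + done:
        done += min(length - done, PAGE_SIZE)
    return start, done


if __name__ == "__main__":
    remap = {0x8001: 0x9005}  # hypothetical remapped page
    print([hex(x) for x in identity_pre_dma(0x08001000, 0x2000)])
    print([hex(x) for x in translating_pre_dma(0x08001000, 0x2000, remap)])
```

With no remapping the two models agree. As soon as a page is remapped, the identity version hands the DMA device an address that no longer matches where the data really lives, which is the situation virtual memory software creates.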

    HISTORY:
    v1.0 - First release

    Here is the link:

    http://eab.abime.net/showthread.php?...90#post1189690

    Last edited by SpeedGeek; 11th October 2017 at 14:40.

  2. #2
    SpeedGeek

    ** NEWS UPDATE **

    Sorry, there was a bug in v1.0 with the patch install code.

    v1.1 - Fixed a bug which prevented the patch from installing
    - Added code to use OldCachePreDMA for MEMF_24BIT
    transfers (I don't know why errors occurred here)
    Last edited by SpeedGeek; 6th October 2017 at 08:31.

  3. #3
    SpeedGeek

    ** 2ND NEWS UPDATE **

    v1.2 released (updated patch size info)
    - Added code to use OldCachePostDMA for MEMF_24BIT
    transfers (So MMU Pages can be restored to original)

    EDIT:
    OK, I believe I have found a solution to the MEMF_24BIT transfer
    error problem without OldPre/OldPost calls. Unfortunately, the cache mode would have to be changed to NoCache.

    This would make the NewFunc code a little smaller but could reduce CPU performance a little for MEMF_24BIT transfers.

    So it's a trade-off... will give it some more thought!
    Last edited by SpeedGeek; 6th October 2017 at 17:38.

  4. #4
    SpeedGeek

    ** 3RD NEWS UPDATE **

    v1.3 Released!
    - Added code to change MEMF_24BIT transfers to NoCache.
    This eliminated all OldFunc calls. MEMF_24BIT transfers may have
    some CPU performance loss but the NewFunc code performance
    benefits should still justify this.

    NOTES: v1.2 will still be available for download for users who
    believe using OldFunc calls is still justified. The v1.2 NewFuncSrc
    for lbC00004E should read as follows:
    CINVA NC ;Support 060, 040 not sure?

    EDIT:
    v1.4 Released!
    - Removed MEMF_24BIT code from PreDMA/PostDMA for the
    case of 16 byte aligned transfers. This will allow
    some MEMF_24BIT transfers to be cache enabled!
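For readers wondering why 16 bytes is the magic number: the 68040 and 68060 both use 16-byte cache lines, so a buffer that starts and ends on a 16-byte boundary never shares a cache line with unrelated data. A minimal sketch of such a check follows. Whether the patch tests the start address, the length, or both is an assumption here, and `is_line_aligned` is a made-up name.

```python
CACHE_LINE = 16  # bytes per cache line on the 68040 and 68060


def is_line_aligned(addr, length):
    """True when the buffer starts and ends on cache-line boundaries."""
    return addr % CACHE_LINE == 0 and length % CACHE_LINE == 0


if __name__ == "__main__":
    # Hypothetical buffers in 24-bit address space.
    for addr, length in [(0x00200000, 512), (0x00200004, 512), (0x00200000, 500)]:
        print(hex(addr), length, is_line_aligned(addr, length))
```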

    EDIT2:
    The v1.4 NewFuncSrc for lbC000080 should read as follows:
    ORI.W #$8000,D1 ;Cache WT mode + User FC
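To show what the $8000 in that line means, here is a small decoder for the low word of a 68040 transparent translation register. The bit layout (E in bit 15, S in bits 14-13, CM in bits 6-5, W in bit 2) is taken from my reading of the 68040 user's manual, and I am assuming D1 holds a value headed for a DTTR. The function name is made up, and the 68060 names the two noncacheable modes differently.

```python
def decode_ttr_low_word(word):
    """Split the low 16 bits of a 68040 TTR value into named fields."""
    cache_modes = {
        0: "cacheable, write-through",
        1: "cacheable, copyback",
        2: "noncacheable, serialized",
        3: "noncacheable",
    }
    s_field = (word >> 13) & 0x3
    if s_field == 0:
        fc_match = "user"
    elif s_field == 1:
        fc_match = "supervisor"
    else:
        fc_match = "ignore FC2"
    return {
        "enabled": bool(word & 0x8000),
        "fc_match": fc_match,
        "cache_mode": cache_modes[(word >> 5) & 0x3],
        "write_protect": bool(word & 0x0004),
    }


if __name__ == "__main__":
    print(decode_ttr_low_word(0x8000))
```

Note that ORI only sets bit 15. The mode and function code fields read as write-through and user only if those bits were already zero in D1, which matches the comment on the source line.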
    Last edited by SpeedGeek; 13th October 2017 at 17:32.

  5. #5
    SpeedGeek

    Ok guys, now it's your turn to post your compatibility results!

    Please provide information on 68040.library or 68060.library vendor and version. Accelerator card type and vendor are requested too. Thank you!

  6. #6
    SpeedGeek

    ** 4TH NEWS UPDATE **

    There was another stupid version bug in v1.4 which has now been fixed (it was just a fully functional v1.4 reporting itself as v1.3).

    I now have a simple benchmark tool called "CacheDMAmips" (see attached image). I will probably release it when I am satisfied with the compatibility results.

    EDIT: CacheDMAmips was removed for providing bogus results. Obviously, programs compiled with an old "Pile of Crap" C compiler and using v34 timer.device functions are not so reliable. MIPS benchmark results are generally bogus anyway! Thus a new, improved benchmark tool is called for!
    Last edited by SpeedGeek; 23rd October 2017 at 18:01.

  7. #7
    SpeedGeek

    Ok, here are images of the new improved benchmark tool. Sadly, only 1 user has provided compatibility results so far.
    Attached Thumbnails: CACHEDMABENCH040.PNG, CACHEDMABENCH060.PNG

  8. #8
    Jack of Many Trades, Semi-Master of Some thebajaguy's Avatar
    Join Date
    May 2017
    Country
    United States
    Region:
    Rhode Island
    Posts
    103
    Feedback
    15 (100%)

    Default

    This was the first time I found this section.

    I will try to drop this into my systems this weekend. The first is an A4000T with a GVP 68060/60 (ultrasound) board with 64MB, 256MB in Z3 space, a full 16MB on the motherboard, an X-Surf, and a GVP Spectrum 28/24; I still need to hook up a SCSI drive (it currently uses an IDE DOM). I also have two A3000s I can equip with a 3640/060 at stock speed but with wait state removal (one with static column motherboard RAM, one without), and a modded PP&S 040->060/28MHz that I can pair with either a GVP Series II with RAM or an A2091/2MB for different 16-bit memory DMA controller behavior. I'll see whether the RSCP benchmark shows any differences.
    A500/A1000/A1200/A2000(4x)/A3000D(2x)/A4000D(under repair)/A4000T loaded....Toys...Toys...Toys...
    Former GVP Tech Support 1989-93 - The beatings will continue until morale improves...

  9. #9
    thebajaguy

    System: (default)
    A2000 Rev 6.2, KS 6.x modified for 68060. Setpatch 44.2, Workbench 3.5, ECS 2MB Agnus/ECS Denise, in standard resolution, PAL clock rate.
    PP&S 68040 28MHz upgraded to a 68060, 32MB 32-bit RAM (high), GVP HC8/2MB (WhichAmiga reports 55.5MHz)
    RSCP 1.1 745K/sec on gvpscsi.device 4.15, Dhry idle 53121, Dhry busy 35481, 66.7% (512K setting)
    68040.library 46.5 (stub), 68060.library 46.16, exec.library 45.20, - all in high 32-bit RAM, MMU in use (using FastExec with fastexp fastvbr, fastmem, fastssp, and the 32-bit RAM added in) and ksremap
    I have MuFastROM On Protect, MuFastZero On, MuFastChip On, MuLockLib
    SysInfo reports 41497 Dhry, 43.31 MIPS, 31.05 MFlops, 59.36 over an A2000, and 759,104 on the disk speed.
    HD is an Aztek CF Monster that is capable of over 2.7MB/sec in an ideal system (just GuruROM and stock 68K with GVP-HC), but here it is subject to ZII DMA/CPU copy up and GVP Async SCSI (where I'd normally see about 2.2MB/sec on a stock 68K).
    CPU reports Cache, Burst (memory supports it), and Copyback.

    With FastCache040+ installed:
    RSCP (two runs): 744K/sec
    Dhry Idle: 53500 / 53118
    Dhry Busy: 35188 / 35338, 65.7% / 66.5%
    Same numbers on SysInfo (as expected)
    Bustest 0.19
    read speed (32-bit Fast) spread is 27.4-30.8MB/sec, and writes are 18.0-18.2MB/sec
    read speed (16-bit Fast) spread is 2.8MB/sec, and writes are 1.4MB/sec
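For anyone comparing their own runs, the busy percentage is just the Dhrystone score under DMA load divided by the idle score. The sketch below recomputes it from the numbers in this post; `busy_percent` and the two lists are names made up for the illustration, and the last digit may differ from the figures quoted above depending on rounding.

```python
def busy_percent(idle, busy):
    """Dhrystone score under DMA load as a percentage of the idle score."""
    return 100.0 * busy / idle


# (idle, busy) pairs copied from this post.
BASELINE_RUNS = [(53121, 35481)]
PATCHED_RUNS = [(53500, 35188), (53118, 35338)]

if __name__ == "__main__":
    for label, runs in (("baseline", BASELINE_RUNS), ("patched", PATCHED_RUNS)):
        for idle, busy in runs:
            print(f"{label}: {busy_percent(idle, busy):.2f}%")
```

On this Zorro II setup the patched and unpatched percentages land within about one point of each other.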

    No stability issues.
    The system has 5 FFS partitions, 3 are 1GB, the other two are split.
    System 32-bit RAM sits at $08000000-09FFFFFF

    A different system to follow hopefully soon.
    Last edited by thebajaguy; 28th October 2017 at 03:05.

  10. #10
    thebajaguy

    Ok, next system - the beast.

    A4000T, TekMagic 68060/60MHz/64MB RAM, boot from IDE DOM, testing DMA off the SCSI with the same Aztek MonsterCF disk module used on the HC8.


    RSCP 1.1 4248K/sec on scsi.device (), Dhry idle is 57411 / 58075, Dhry busy is 56494 / 56261, 98.9% / 96.8% (512K setting)
    68040.library v1.2 (stub) (26.04.9 Ralph Babel (modified by Boyd Edmonson), 68060.library 2.4 (13.05.9 Ralph Babel (modified by Boyd Edmonson)
    Sysinfo reports 44521 Dhry, 46.47 MIPS, 33.31 MFlops, 63.69 over an A2000, and 4183148 for the SCSI. exec.library 45.20 (Kickstart 45.57), scsi.device (IDE) 40.20, 2nd.scsi.device (SCSI) 43.45

    With FastCache040+
    RSCP (Two Runs): 4248K/sec
    Dhry Idle: 57958 / 57954
    Dhry Busy: 56386 / 56396, 97.2% / 97.3%
    Sysinfo has 4205518 on the SCSI disk.
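To put the two SysInfo SCSI figures side by side, the relative change can be computed directly. This is plain arithmetic on the numbers in this post, and `percent_change` is a made-up helper name.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # SysInfo SCSI figures from this post: without and with FastCache040+.
    print(f"{percent_change(4183148, 4205518):+.2f}%")
```

The difference comes out at roughly half a percent, so a few more runs would be needed before reading anything into it.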

    Bustest 0.19
    read speed (32-bit Fast/060 board 08000000) spread is 42.4-51.8MB/sec, and writes are 38.5-38.7MB/sec
    read speed (32-bit Chip) spread is 2.3-4.5MB/sec, and writes are 3.5-7.0MB/sec

    System remains stable.
    I've got a 256MB Z3 card, a Z3 Spectrum 28/24 (nothing running on it at the moment, but hw/Picasso96 sw installed), and an X-Surf 100 with drivers not active.

    Loading the AmiTCP v3 stack and mounting my 5.5TB NAS SMB share doesn't affect stability or the RSCP numbers.

    I need to get the Monster SCSI2SATA and the 80GB SSD connected up to see if I can get some higher numbers on the SCSI. I know it can do more, and I'd be curious as the speed goes up how it affects the cache routines.
