
FastCache040+ Released!


  • FastCache040+ Released!

    FastCache040+ 1.0 ©SpeedGeek 2017

    FastCache040+ is a patch to replace the CachePreDMA() and
    CachePostDMA() functions of most 68040/060 libraries. While
    the old functions are adequate they are far from optimal.
    These old functions have 3x more code than the new ones
    provided with this patch!

    Also, the new functions implement a much more efficient method
    of managing the Copyback cache for DMA. While every system
    will have some CPU performance loss under DMA conditions, the
    new functions keep this performance loss to a bare minimum.

    - Replaces CachePreDMA() and CachePostDMA() with smaller
    and more efficient code
    - Replaces complex MMU code with simple and fast DTTR code
    - Temporarily changes Copyback mode to Write Through for DMA
    (but only when required!)
    - Never flushes the ATC!
    - Never flushes the DC for Chip RAM DMA!
    - Uses 68040/060 library detection code
    - Will not patch itself
    - 100% Assembler code

    - FastCache040+ 1.0 (NewFunc 132 bytes)
    - 68060.library 46.7 (OldFunc 304 bytes)
    - 68040.library 44.2 (OldFunc 414 bytes)
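
As a quick check of the size claim, the byte counts listed above can be turned into ratios. This is only a small arithmetic sketch in C; the helper name `size_ratio` is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Byte counts quoted in the post for the v1.0 release. */
enum {
    NEWFUNC_BYTES     = 132, /* FastCache040+ 1.0 */
    OLDFUNC_060_BYTES = 304, /* 68060.library 46.7 */
    OLDFUNC_040_BYTES = 414  /* 68040.library 44.2 */
};

/* How many times larger the old code is than the new code. */
static double size_ratio(int old_bytes, int new_bytes)
{
    assert(new_bytes > 0);
    return (double)old_bytes / (double)new_bytes;
}

/* Hypothetical helper: print both ratios with two decimals. */
static void print_size_ratios(void)
{
    printf("68060.library: %.2fx\n",
           size_ratio(OLDFUNC_060_BYTES, NEWFUNC_BYTES));
    printf("68040.library: %.2fx\n",
           size_ratio(OLDFUNC_040_BYTES, NEWFUNC_BYTES));
}
```

With these figures the 68040.library functions come out at a little over 3x the size of the new code, and the 68060.library functions at roughly 2.3x.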

    - Amiga with 68040 or 68060 CPU and MMU
    - 68040.library or 68060.library

    Do NOT use this patch with GigaMEM, VMM or any similar
    virtual memory software! Do NOT use this patch with any
    code which uses the MMU to write protect or remap modified
    data structures!

    Remapping a mirror image of the Kickstart ROM with the MMU
    is OK! The new functions still have one thing in common with
    the old functions. They do NOT translate virtual addresses
    as specified in the Amiga RKRM! For more info on the old
    functions see the by Michael Sinz.

    v1.0 - First release

    Here is the link:

    Last edited by SpeedGeek; 11 October 2017, 13:40.

  • #2
    ** NEWS UPDATE **

    Sorry, there was a bug in v1.0 with the patch install code.

    v1.1 - Fixed a bug which prevented the patch from installing
    - Added code to use OldCachePreDMA for MEMF_24BIT
    transfers (I don't know why errors occurred here)
    Last edited by SpeedGeek; 6 October 2017, 07:31.


    • #3
      ** 2ND NEWS UPDATE **

      v1.2 released (updated patch size info)
      - Added code to use OldCachePostDMA for MEMF_24BIT
      transfers (So MMU Pages can be restored to original)

      OK, I believe I have found a solution to the MEMF_24BIT transfer
      error problem without OldPre/OldPost calls. Unfortunately, the cache mode would have to be changed to NoCache.

      This would make the NewFunc code a little smaller but could reduce CPU performance a little for MEMF_24BIT transfers.

      So it's a trade-off situation... will give it some more thought!
      Last edited by SpeedGeek; 6 October 2017, 16:38.


      • #4
        ** 3RD NEWS UPDATE **

        v1.3 Released!
        - Added code to change MEMF_24BIT transfers to NoCache.
        This eliminated all OldFunc calls. MEMF_24BIT transfers may have
        some CPU performance loss but the NewFunc code performance
        benefits should still justify this.

        NOTES: v1.2 will still be available for download for users if they
        believe using OldFunc calls is still justified. The v1.2 NewFuncSrc
        for lbC00004E should read as follows:
        CINVA NC ;Support 060, 040 not sure?

        v1.4 Released!
        - Removed MEMF_24BIT code from PreDMA/PostDMA for the
        case of 16 byte aligned transfers. This will allow
        some MEMF_24BIT transfers to be cache enabled!
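
For readers wondering what "16 byte aligned" means in practice: the 68040 and 68060 use 16-byte cache lines, so a DMA buffer that starts and ends on a line boundary never shares a cache line with unrelated data. The check below is a minimal C sketch of that test. It assumes both the start address and the length must be multiples of 16; the function name is hypothetical and the actual patch is written in assembler.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 16u /* 68040/68060 cache line size */

/* Returns 1 when both the start address and the length are multiples
 * of the cache line size, 0 otherwise.
 * Assumption: "aligned transfer" covers start AND length. */
static int is_line_aligned(uint32_t address, uint32_t length)
{
    const uint32_t mask = CACHE_LINE_BYTES - 1u;

    /* The mask trick only works for a power-of-two line size. */
    assert((CACHE_LINE_BYTES & mask) == 0u);

    return ((address | length) & mask) == 0u;
}
```

For example, a 512-byte transfer starting at $00200000 passes the check, while the same transfer starting at $00200004 does not.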

        The v1.4 NewFuncSrc for lbC000080 should read as follows:
        ORI.W #$8000,D1 ;Cache WT mode + User FC
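
To show where the $8000 in that line comes from, here is a rough C model of the 68040/060 transparent translation register layout: address base in bits 31-24, address mask in bits 23-16, the enable bit at bit 15, the S field in bits 14-13 and the cache mode in bits 6-5. The function names and the $08/$01 base/mask pair are made up for illustration; the source only shows the ORI.W above, and the address match here ignores the function code check.

```c
#include <assert.h>
#include <stdint.h>

#define TTR_ENABLE   0x00008000u /* E bit (bit 15) */
#define TTR_S_USER   0x00000000u /* S = 00: match user-mode accesses */
#define TTR_S_SUPER  0x00002000u /* S = 01: match supervisor accesses */
#define TTR_S_IGNORE 0x00004000u /* S = 1x: ignore the mode */
#define TTR_CM_SHIFT 5           /* cache mode field, bits 6-5 */

enum ttr_cache_mode {
    TTR_CM_WRITETHROUGH = 0, /* cachable, write-through */
    TTR_CM_COPYBACK     = 1  /* cachable, copyback */
};

/* Build an enabled TTR value from its fields. */
static uint32_t ttr_make(uint8_t base, uint8_t mask,
                         uint32_t s_field, enum ttr_cache_mode cm)
{
    assert(cm == TTR_CM_WRITETHROUGH || cm == TTR_CM_COPYBACK);
    return ((uint32_t)base << 24) | ((uint32_t)mask << 16) |
           TTR_ENABLE | s_field | ((uint32_t)cm << TTR_CM_SHIFT);
}

/* Address-only match: a set mask bit means "ignore this address bit".
 * Simplification: the S field / function code is not checked here. */
static int ttr_matches(uint32_t ttr, uint32_t address)
{
    uint32_t base = ttr >> 24;
    uint32_t mask = (ttr >> 16) & 0xFFu;
    uint32_t top  = address >> 24;

    if ((ttr & TTR_ENABLE) == 0u)
        return 0;
    return ((top ^ base) & ~mask & 0xFFu) == 0u;
}
```

With the hypothetical base $08 and mask $01, `ttr_make` gives $08018000: the low word is exactly the $8000 from the source line (enabled, user mode, write-through), and the register covers $08000000-$09FFFFFF.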
        Last edited by SpeedGeek; 13 October 2017, 16:32.


        • #5
          Ok guys, now it's your turn to post your compatibility results!

          Please provide the vendor and version of your 68040.library or 68060.library, along with the accelerator card type and vendor. Thank you!


          • #6
            ** 4TH NEWS UPDATE **

            There was another stupid version bug in v1.4 which has now been fixed (it was just a fully functional v1.4 reporting itself as v1.3).

            I now have a simple benchmark tool called "CacheDMAmips" (see attached image). I will probably release it when I am satisfied with the compatibility results.

            EDIT: CacheDMAmips was removed for providing bogus results. Obviously, programs compiled on an old "Pile of Crap" C compiler and using v34 timer.device functions are not so reliable. Mips benchmark results are generally bogus anyway! Thus a new improved benchmark tool is called for!
            Last edited by SpeedGeek; 23 October 2017, 17:01.


            • #7
              Ok, here are images of the new improved benchmark tool. Sadly, only 1 user has provided compatibility results so far?
              Attached Files


              • #8
                This was the first time I found this section.

                I will try to drop things into my systems this weekend. A4000T, but need to hook up a SCSI drive (currently using IDE DOM), GVP 68060/60 (ultrasound) board with 64MB, 256MB on the Z3 space, full 16MB mobo, an X-Surf, and a GVP Spectrum 28/24 in the system. I also have 2 A3000's I can equip with a 3640/060 at stock speed but with wait state removal (one with static column mobo RAM, one without), and I have a modded PP&S 040->060/28MHz I can equip with either a GVP Series II with RAM, or an A2091/2MB for different 16-bit memory DMA controller behavior. I'll see if the RSCP benchmark might show any differences.
                A500(2x)/A1000/A1200/A2000(4x)/A3000D(2x)/A4000D/A4000T (all loaded....Toys...Toys...Toys...)
                Former GVP Tech Support 1989-93 - The beatings will continue until morale improves...


                • #9
                  System: (default)
                  A2000 Rev 6.2, KS 6.x modified for 68060. Setpatch 44.2, Workbench 3.5, ECS 2MB Agnus/ECS Denise, in standard resolution, PAL clock rate.
                  PP&S 68040 28MHz upgraded to a 68060, 32MB 32-bit RAM (high), GVP HC8/2MB (WhichAmiga reports 55.5MHz)
                  RSCP 1.1 745K/sec on gvpscsi.device 4.15, Dhry idle 53121, Dhry busy 35481, 66.7% (512K setting)
                  68040.library 46.5 (stub), 68060.library 46.16, exec.library 45.20, all in high 32-bit RAM, MMU in use (using FastExec with fastexp, fastvbr, fastmem, fastssp, and the 32-bit RAM added in) and ksremap
                  I have MuFastROM On Protect, MuFastZero On, MuFastChip On, MuLockLib
                  SysInfo reports 41497 Dhry, 43.31 MIPS, 31.05 MFlops, 59.36 over an A2000, and 759,104 on the disk speed.
                  HD is an Aztek CF Monster that is capable of over 2.7MB/sec if I had an ideal system (just GuruROM and stock 68K with GVP-HC but is subject to ZII DMA/CPU copy up, and GVP Async SCSI (which I'd normally see about 2.2MB/sec in a stock 68K).
                  CPU reports Cache, Burst (memory supports it), and Copyback.

                  With FastCache040+ installed:
                  RSCP (two runs): 744K/sec
                  Dhry Idle: 53500 / 53118
                  Dhry Busy: 35188 / 35338, 65.7% / 66.5%
                  Same numbers on SysInfo (as expected)
                  Bustest 0.19
                  read speed (32-bit Fast) spread is 27.4-30.8MB/sec, and writes are 18.0-18.2MB/sec
                  read speed (16-bit Fast) spread is 2.8MB/sec, and writes are 1.4MB/sec
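
The busy/idle percentages in these results can be reproduced with a one-line calculation. The C sketch below assumes the percentage is simply the busy Dhrystone score divided by the idle score; the function name is made up and this is not taken from RSCP itself.

```c
#include <assert.h>

/* Share of the idle Dhrystone score that is still achieved while
 * the disk transfer is running, in percent.
 * Assumption: the quoted percentage is busy / idle * 100. */
static double busy_percent(long dhry_busy, long dhry_idle)
{
    assert(dhry_idle > 0);
    return 100.0 * (double)dhry_busy / (double)dhry_idle;
}
```

Feeding in the two patched runs (35188 of 53500 and 35338 of 53118) gives values a little under 66% and a little over 66.5%, which lines up with the quoted figures and with the unpatched baseline to within about a point.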

                  No stability issues.
                  The system has 5 FFS partitions, 3 are 1GB, the other two are split.
                  System 32-bit RAM sits at $08000000-09FFFFFF

                  A different system to follow hopefully soon.
                  Last edited by thebajaguy; 28 October 2017, 02:05.


                  • #10
                    Ok, next system - the beast.

                    A4000T, TekMagic 68060/60MHz/64MB RAM, boot from IDE DOM, testing DMA off the SCSI with the same Aztek MonsterCF disk module used on the HC8.

                    RSCP 1.1 4248K/sec on scsi.device (), Dhry idle is 57411 / 58075, Dhry busy is 56494 / 56261, 98.9% / 96.8% (512K setting)
                    68040.library v1.2 (stub) (26.04.9 Ralph Babel (modified by Boyd Edmonson), 68060.library 2.4 (13.05.9 Ralph Babel (modified by Boyd Edmonson)
                    Sysinfo reports 44521 Dhry, 46.47 MIPS, 33.31 MFlops, 63.69 over an A2000, and 4183148 for the SCSI. exec.library 45.20 (Kickstart 45.57), scsi.device (IDE) 40.20, 2nd.scsi.device (SCSI) 43.45

                    With FastCache040+
                    RSCP (Two Runs): 4248K/sec
                    Dhry Idle: 57958 / 57954
                    Dhry Busy: 56386 / 56396, 97.2% / 97.3%
                    Sysinfo has 4205518 on the SCSI disk.

                    Bustest 0.19
                    read speed (32-bit Fast/060 board 08000000) spread is 42.4-51.8MB/sec, and writes are 38.5-38.7MB/sec
                    read speed (32-bit Chip) spread is 2.3-4.5MB/sec, and writes are 3.5-7.0MB/sec

                    System remains stable.
                    I've got a 256MB Z3 card, a Z3 Spectrum 28/24 (nothing running on it at the moment, but hw/Picasso96 sw installed), and an X-Surf 100 with drivers not active.

                    Loading the AmiTCP v3 stack and mounting my 5.5TB NAS SMB share doesn't affect stability or the RSCP numbers.

                    I need to get the Monster SCSI2SATA and the 80GB SSD connected up to see if I can get some higher numbers on the SCSI. I know it can do more, and I'd be curious as the speed goes up how it affects the cache routines.


                    • #11
                      ** 5TH NEWS UPDATE **

                      The new benchmark tool has now been released! The lamers who failed to provide compatibility feedback owe a BIG THANKS to the users who did. A very special Thanks to thebajaguy for providing feedback on multiple systems!

                      BTW, these benchmark results were easily predictable. It's a No-Brainer!


                      • #12
                        ** 6TH NEWS UPDATE **

                        v1.5 - Found an occasional Recoverable Alert bug which could
                        possibly result in a crash but only on 060 systems!
                        The simple fix was to move "CINVA NC" in PostDMA to the
                        end of the code.
                        - Removed the "+" character from the executable name due
                        to an unknown "Feature" of the Amiga Shell causing script
                        execution and version command problems.

                        EDIT: [CPU060 NOWRITEBUFFER] with the Phase5 46.7 68060.library seems to be a more reliable solution than the v1.5 update. Some more testing is required.
                        Last edited by SpeedGeek; 31 March 2018, 02:00.


                        • #13
                          ** 7TH NEWS UPDATE **

                          v1.6 - Added code to PostDMA to Flush the cache conditionally
                          (if the Store buffer and cache are enabled). Added NOPs
                          to sync the pipelines before RTE (CINVA is now obsolete)

                          68040 users can use v1.4 or v1.5 if they like since they will
                          be a little faster than v1.6 but 68060 users should use v1.6!
                          68060 users will now have a performance trade-off to consider
                          in deciding whether to enable the store buffer.
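
The "flush conditionally" idea can be pictured as a test on the 68060 cache control register. In the 68060 CACR, bit 31 (EDC) enables the data cache and bit 29 (ESB) enables the store buffer. The C sketch below only models the decision described in the v1.6 notes; the function name is hypothetical, and the real patch would read CACR with MOVEC in supervisor mode rather than take it as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define CACR060_EDC 0x80000000u /* bit 31: enable data cache */
#define CACR060_ESB 0x20000000u /* bit 29: enable store buffer */

/* Returns 1 when both the data cache and the store buffer are
 * enabled, which is the condition the v1.6 notes give for the
 * PostDMA cache flush. Illustration only, not the patch code. */
static int postdma_needs_flush(uint32_t cacr)
{
    const uint32_t both = CACR060_EDC | CACR060_ESB;

    /* Sanity check: the two flags must be distinct bits. */
    assert((CACR060_EDC & CACR060_ESB) == 0u);

    return (cacr & both) == both;
}
```

This also shows the trade-off mentioned above: with the store buffer disabled the condition is never true, so the flush (and its cost) is skipped.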


                          • #14
                            ** 8TH NEWS UPDATE **

                            v1.6P5 Removed code to allow PostDMA cache Flush for the case of
                            16 byte aligned transfers. Added code to skip PostDMA
                            cache Flush for the case of cache disabled MEMF_24BIT

                            v1.6P5 is my last attempt to solve compatibility problems with
                            the Phase5 68060.library and Store buffer enabled. This
                            library is unstable and buggy WITH or WITHOUT FastCache040+
                            so either disable the Store buffer or expect the problems to
                            continue with only a MINIMAL improvement provided by this

                            v1.7 - Removed all v1.6P5 PostDMA cache flush code so most users
                            (except Phase5 68060.library users) can run at full speed!

                            Phase5 68060.library users should use v1.6P5. All other users
                            can (probably) use v1.4, v1.5 or v1.7 without any problems.


                            • #15
                              ** 9TH NEWS UPDATE **

                              FastCache040+ v1.6P5 has been removed. Phase5 68060.library users should use FixMapP5 before using this patch.

                              FixMapP5 1.2 ©SpeedGeek 2018 (MMU Handler ©Michael Sinz 2001)

                              FixMapP5 is a tool to modify some of the default MMU mapping of
                              the Phase5 68040 and 68060 libraries. This can improve stability
                              and prevent crashing under the following condition:

                              - Hardware or software interrupts which occur during a Chip RAM
                              access by the 68060 (In particular when Store buffer is enabled).

                              Software bugs which allow illegal writes to the $F80000 Standard
                              Kickstart ROM can cause a debugging problem in Copyback mode so this patch corrects that problem as well.

                              - Changes Chip RAM mode to Precise (68060 only)
                              - Changes Standard ROM cache to Writethrough (68040 or 68060)
                              - Uses 68040/060 library detection code
                              - 100% Assembler code
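
To picture what "changing the cache mode" of a page means: in 68040/060 page descriptors the cache mode sits in bits 6-5, where 00 is write-through, 01 is copyback and 10 is the precise (serialized on the 68040) non-cachable mode. The C sketch below just rewrites that field in a descriptor value. The names are made up for illustration, and how FixMapP5 actually walks and patches the Phase5 MMU tables is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PD_CM_SHIFT 5
#define PD_CM_MASK  0x00000060u /* cache mode field, bits 6-5 */

enum pd_cache_mode {
    PD_CM_WRITETHROUGH = 0, /* cachable, write-through */
    PD_CM_COPYBACK     = 1, /* cachable, copyback */
    PD_CM_PRECISE      = 2, /* non-cachable, precise (060) / serialized (040) */
    PD_CM_IMPRECISE    = 3  /* non-cachable, imprecise (060) */
};

/* Return the descriptor with only its cache mode field replaced. */
static uint32_t pd_set_cache_mode(uint32_t descriptor, enum pd_cache_mode cm)
{
    assert((unsigned)cm <= 3u);
    return (descriptor & ~PD_CM_MASK) | ((uint32_t)cm << PD_CM_SHIFT);
}

/* Extract the cache mode field from a descriptor. */
static enum pd_cache_mode pd_get_cache_mode(uint32_t descriptor)
{
    return (enum pd_cache_mode)((descriptor & PD_CM_MASK) >> PD_CM_SHIFT);
}
```

In these terms, the two changes listed above would correspond to `PD_CM_PRECISE` for Chip RAM pages and `PD_CM_WRITETHROUGH` for the $F80000 ROM pages.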

                              - Amiga with 68040 or 68060 CPU and MMU
                              - Phase5 68040.library or 68060.library

                              This tool was developed ONLY for use with the Phase5 libraries but
                              it does NOT actually verify such usage. So it can and probably
                              will mess up the mapping of ANY other libraries!

                              Thanks to Michael Sinz for his freely distributable MMU handler.

                              v1.0 - First release
                              v1.1 - Added code to skip mapping $F00000 space (which included
                              $F80000 space) for CyberstormPPC, CyberstormMK3 and
                              v1.2 - Replaced FindName() with FindResident() since v1.1 wasn't working at all. Also, fixed a typo on module names.
                              Last edited by SpeedGeek; 28 April 2018, 00:19.

