CopyMem Quick & Small released!

  • CopyMem Quick & Small released!

    CopyMem Quick & Small v1.6
    Parts of patch install code by Dirk Busse 1999
    Enhanced patch code by SpeedGeek 2015

    CMQ&S is a much smaller CopyMem + CopyMemQuick patch. Its
    main goal is to give a good speed increase without all the bloated
    code found in the other CMQ patches. It does not attempt to be the
    fastest CMQ patch, but at 92 bytes for the current version, it's
    almost certainly one of the fastest in its class.

    - Installs one of the smallest CMQ patches for 68020+ Amigas
    - Safely exits if the patch is already installed (i.e. a good patch
    program should really avoid patching itself)

    - Amiga with 68020+

    Most other CMQ patches got bloated because their authors
    wanted their patch to be faster than any previously released
    patch. This means even bigger "unrolled" loops, redundant
    testing for small & medium size copies and extra code to
    handle misaligned transfers. Unfortunately, Testit from
    COPMQR28 makes a disproportionate number of these
    "Worst Case" tests and other coders have fallen into the trap.
    Motorola optimized its 68K CPUs to improve average case
    performance at the expense of worst case performance.
    Need I say any more?

    Here is the link for downloadable files:
    Last edited by SpeedGeek; 11 June 2021, 13:10.

  • #2
    Originally posted by SpeedGeek
    In the early days of CMQ040+ patch coding a couple of "Software Guys"
    spread some bogus info about "Problems" using Move16 with "Burst"
    access to Chip memory. Unfortunately, some of the "Hardware Guys"
    who knew about the bogus info really didn't make a good effort to
    explain away this nonsense and so it seems I must do it now:
    Hi SpeedGeek, you got me interested. Can you please point me to some sources where I can read up on the subject, or to some background information? Thank you!


    • #3
      Originally posted by EFTPOTRM
      Hi SpeedGeek, you got me interested. Can you please point me to some sources where I can read up on the subject, or to some background information? Thank you!
      Hi EFTPOTRM,

      The best source of information is the Motorola/Freescale 030, 040 and 060 manuals, since these are the only CPUs which support Burst. You can also look at various Amiga schematics and note the absence of any Burst control signals for Chip RAM.

      Since the 030 doesn't support Move16, it's probably more important to understand how the 040 and 060 handle non-burst cycles:

      5.4.6 Transfer Burst Inhibit (TBI)
      This input signal indicates to the processor that the accessed device cannot support burst mode accesses and that the requested line transfer should be divided into individual longword transfers. Asserting TBI with TA terminates the first data transfer of a line access, which causes the processor to terminate the burst and access the remaining data for the line as three successive long-word transfers. During alternate bus master accesses, the M68040 samples the TBI to detect completion of each bus transfer.
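
As a rough illustration of the TBI paragraph quoted above, here is a small bookkeeping sketch in Python. It is not a timing model, and the function name `line_transfer` and the cycle labels are made up for this example. It only shows that a 16-byte line still arrives in full when burst is inhibited, carried by four longword transfers instead of one burst:

```python
LINE_BYTES = 16       # one 040/060 cache line, the unit a line transfer moves
LONGWORD_BYTES = 4


def line_transfer(tbi_asserted):
    """Return the bus transfers used to move one 16-byte line.

    Each entry is (label, bytes). The labels are hypothetical names
    chosen for this sketch, not signal names from the manual.
    """
    if not tbi_asserted:
        # Device supports burst mode: one line access carries all 16 bytes.
        return [("burst_line", LINE_BYTES)]

    # TBI asserted with TA: the first longword transfer terminates the
    # burst, and the remaining data follows as three successive
    # longword transfers.
    remaining = LINE_BYTES // LONGWORD_BYTES - 1
    return [("first_longword", LONGWORD_BYTES)] + [
        ("longword", LONGWORD_BYTES) for _ in range(remaining)
    ]


if __name__ == "__main__":
    for tbi in (False, True):
        cycles = line_transfer(tbi)
        total = sum(size for _, size in cycles)
        print(f"TBI asserted={tbi}: {len(cycles)} transfer(s), {total} bytes")
```

Either way the same 16 bytes are moved. Only the number of bus transfers differs.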
      Last edited by SpeedGeek; 29 December 2014, 14:19.


      • #4
        ** NEWS UPDATE **

        Deleted post.
        Last edited by SpeedGeek; 4 July 2020, 20:48.


        • #5
          ** 2ND NEWS UPDATE **

          CMQ&S v1.6 released
          v1.6 minor change
          - fixed install code which could (but seldom did) trash a few bytes
          of memory past the end of the patch
          Last edited by SpeedGeek; 4 July 2020, 20:47.


          • #6
            ** 3RD NEWS UPDATE **

            Deleted post.
            Last edited by SpeedGeek; 4 July 2020, 20:48.


            • #7
              Hi SpeedGeek

              I gave v1.7 a quick go the other day with the 1024 block size, and a quick benchmark does show a quicker memory copy.

              Do you have a rough example of when a bigger block size could benefit the system? Or is it just a case of, say, playing a game, noting the average frame rate, and comparing that way to see which programs benefit from different block sizes?

              Once I get my chip programmer I am going to try some in-depth testing of different ways of speeding up the A4000,
              such as 64MB on the motherboard, your various mods for the A3640 (such as the state mod and the 040-060 adaptor) and different software patches.
              There doesn't seem to be a good guide anywhere. I know different computers might get different results, but I thought it would be handy to have some overall comparisons.


              • #8
                Hi KGC210,

                The 040's data cache is 4096 bytes and the 060's data cache is 8192 bytes. MoveL is faster than Move16 if it can write to the data cache. However, when the cache is full and must be flushed before any pending writes, Move16 is faster than MoveL. Move16 always writes to memory, so it never causes or has to wait for a cache flush.

                I suggest starting with a Block Size of 50% of the data cache and increasing/decreasing it as necessary. There are trade-offs between average system performance and application-specific performance (e.g. C2P code may benefit from the smallest block size, which could reduce pipeline stalls when the write to memory is slower than the read from memory, but SCSI DMA drivers may benefit from the largest block size due to frequent cache flush activity).
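
The "50% of the data cache" starting point works out as follows. This is a minimal sketch: the cache sizes come from the post above, while the helper names and the idea of stepping by factors of two (half, equal to, and double the starting value) are assumptions made only for illustration.

```python
# Data cache sizes in bytes, as stated in the post above.
DATA_CACHE_BYTES = {"68040": 4096, "68060": 8192}


def starting_block_size(cpu, fraction=0.5):
    """Suggested first Block Size: a fraction (default 50%) of the data cache."""
    return int(DATA_CACHE_BYTES[cpu] * fraction)


def candidate_block_sizes(cpu):
    """Hypothetical test plan: half, equal to, and double the starting size.

    Stepping by powers of two is an assumption of this sketch, not
    something the patch documentation is known to require.
    """
    start = starting_block_size(cpu)
    return [start // 2, start, start * 2]


if __name__ == "__main__":
    for cpu in DATA_CACHE_BYTES:
        print(cpu, starting_block_size(cpu), candidate_block_sizes(cpu))
```

For the 040 this gives a starting Block Size of 2048 bytes, and 4096 bytes for the 060. Which neighbouring value works best depends on the workload, as described above.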

                P.S. You may also want to compare your results against CMQ&B040. I suspect it will very soon become more popular than CMQ&S040 (at least until a good alternative to Testit is available).
                Last edited by SpeedGeek; 29 January 2015, 12:30.


                • #9
                  Thanks for the reply.

                  Don't know why, but silly me thought the 1024, 2048 etc. were in KB. Oops! That's why I was a little puzzled about how such huge figures would make much difference on a typical Amiga.

                  I'm not fussed about benchmark programs, as I know they can be very misleading, like when Nvidia and ATI (AMD) would make special benchmark drivers just to bodge the figures.

                  I just use them as a general guide as to whether or not they changed anything.

                  When I start making proper comparisons I want to look at more real-world situations, such as a time demo in Quake to compare FPS, whether a 50KB, 1MB or 10MB file transfers quicker, and whether the computer can decode or encode an audio file quicker.


                  • #10
                    ** 4TH NEWS UPDATE **

                    CMQ&S040 has been removed and all related posts deleted. CMQ&S was originally developed as "Proof of Concept" code for low memory systems. Since 040+ systems typically have much more memory than 020-030 systems it's really not practical for such systems. CMQ&B040 should be more practical in this case.

                    Here is the link:


