List of StrongARM instruction execution cycles!                       2000/06/05
-----------------------------------------------                          mrh/icb

Heyho!

Welcome to the first *valid* list of StrongARM instruction execution cycles!
This list was compiled entirely from test results - no information from
'official' ART or ARM announcements was used. And - gosh! - nearly all values
differ from the officially announced ones! In fact, most instructions execute
slower than stated by ART/ARM. So it seems to me that these official values
are meant to boost SA sales?! Well, I may be wrong...

                              tested and written by
                                  _ _  ____     __   __  _  _
                                 / ^ \/ - _>   /  \_/__</ \/ \
                                <__x__>__\__> <__/__>__/__<__< of iCEBiRD

                                         e-mail: bawa@thepentagon.com


  instruction syntax              type
  ADC<cc><S> Rd,Rn,Op2............1
  ADD<cc><S> Rd,Rn,Op2............1
  AND<cc><S> Rd,Rn,Op2............1
  B  <cc> address.................6
  BL <cc> address.................6
  BIC<cc><S> Rd,Rn,Op2............1
  CMN<cc><P> Rn,Op2...............1
  CMP<cc><P> Rn,Op2...............1
  EOR<cc><S> Rd,Rn,Op2............1
  LDM<cc><mode> Rn<!>,{Rlist}<^>..4
  LDR<cc><B|H|SB|SH> Rd,adr.......2
  MLA<cc><S> Rd,Rm,Rs,Rn..........8
  MOV<cc><S> Rd,Op2...............1
  MUL<cc><S> Rd,Rm,Rs.............8
  MRS<cc> Rd,<psr>................10
  MSR<cc> <psr{f}>,Rm.............10
  MVN<cc><S> Rd,Op2...............1
  ORR<cc><S> Rd,Rn,Op2............1
  RSB<cc><S> Rd,Rn,Op2............1
  RSC<cc><S> Rd,Rn,Op2............1
  SBC<cc><S> Rd,Rn,Op2............1
  SMLA<cc><S> Rl,Rh,Rm,Rn.........11
  SMUL<cc><S> Rl,Rh,Rm,Rn.........11
  STM<cc><mode> Rn<!>,{Rlist}<^>..5
  STR<cc><B|H> Rd,adr.............3
  SUB<cc><S> Rd,Rn,Op2............1
  SWI<cc> number..................7
  SWP<cc> Rd,Rn,[Rn]..............9
  TEQ<cc><P> Rn,Op2...............1
  TST<cc><P> Rn,Op2...............1
  UMLA<cc><S> Rl,Rh,Rm,Rn.........11
  UMUL<cc><S> Rl,Rh,Rm,Rn.........11

legend

 Rx    : plain register (R0-R14, PC) without shift
 Rd    : destination register for operations
 Rh    : high word of 64-bit MUL result
 Rl    : low word of 64-bit MUL result
 Rlist : register list of LDM/STM instructions
 Rs,Rn : second factor register in MUL instructions (-> MUL Rd,Rm,Rs)
 Op2   : immediate constant, plain register or shifted register


 - All execution cycles given are valid only for cached instructions and data.
 - All instructions with a 'false' condition code take 1 cycle.

type special cases                      examples              SA cycles   ARM250
--------------------------------------------------------------------------------
1  * s=1 if register-controlled shift, ADD R0,R0,R2,LSL #4   1+s            1+s
     s=0 otherwise

   * P condition used                   TEQP R0,#0            3+s            1

   * Rd=PC, S condition used            MOVS PC,R14           4+s            4+s
                                        ADDS PC,PC,R4,LSL #2

   * MOV PC,Rx (Rx=reg. without shift)  MOV PC,R14            2+p            4
     p=2: Rx changed in previous cycle
     p=1: Rx constant for 1 cycle
     p=0: Rx constant for >=2 cycles

   * Rd=PC and is calculated by this    MOV PC,R14,LSL #2     3+s            4
     instruction in some way            SUB PC,PC,#44
--------------------------------------------------------------------------------
2  * f=1 if Rd is needed in next instr. LDR R4,[R2,#32]!      1+f+e (cache)  4
     f=0 otherwise
     e=1 if LDRSB/LDRSH sign-extension
     e=0 otherwise

   * Rd=PC                              LDR PC,[R2,R4,LSL #2] 4    (cache)  7
--------------------------------------------------------------------------------
3  * -                                  STR R4,[R3,R2,LSL #2] 1 (writebuffer)4
--------------------------------------------------------------------------------
4  * n=number of registers in Rlist     LDMIA R0,{R0-R4}      f+n   (cache)  3+n
     f=1 if last register loaded is
         needed in next instruction
     f=0 otherwise

   * n=1 (only 1 register is loaded)    LDMDB R0,{R4}         2     (cache)  4

   * ^ condition for userbank register  LDMIA R0,{R13}^       2+n   (cache)  3+n
     load is used

   * Rlist includes PC                  LDMFD R13!,{R10,PC}^  3+n   (cache)  6+n
--------------------------------------------------------------------------------
5  * n=number of registers in Rlist     STMFD R13!,{R0-R1}    n+u (writebuffer) 3+n
     u=2 if ^ condition used for
         storing user bank registers
     u=0 otherwise

   * n=1 (only 1 register is stored)    STMDB R0,{R4}         2+u (writebuffer) 4
--------------------------------------------------------------------------------
6  * -                                  BL &80AC              2              4
--------------------------------------------------------------------------------
7  * -                                  SWI &42               ?              4
--------------------------------------------------------------------------------
8  * f=1 if Rd is needed in next instr. MLA R0,R1,R2,R0       x+f+s         1-17
            (exception: see note #1)
         or S condition is used
         or next instruction is any multiplication
     f=0 otherwise
     x=1 if ABS(Rs) in range &00000000-&000007FF
     x=2 if ABS(Rs) in range &00000800-&007FFFFF
     x=3 if ABS(Rs) in range &00800000-&7FFFFFFF
     s=2 if S condition used
     s=0 otherwise

   * Rd=PC                              MUL PC,R2,R2          4+x
--------------------------------------------------------------------------------
9  * SWP works, but does not use write- SWP R0,R1,[R3]        ? (>100)       5
     backbuffers, therefore it is
     extremely slow. :(
--------------------------------------------------------------------------------
10 * -                                  MRS R3,SPSR_all       1              -
--------------------------------------------------------------------------------
11 * f=1 if Rh is needed in next instr. UMUL R3,R4,R5,R6      1+x+f+s        -
            (exception: see note #1 and #2)
         or S condition is used
     f=0 otherwise
     x=1 if ABS(Rn) in range &00000000-&000007FF
     x=2 if ABS(Rn) in range &00000800-&007FFFFF
     x=3 if ABS(Rn) in range &00800000-&7FFFFFFF
     s=2 if S condition used
     s=0 otherwise
--------------------------------------------------------------------------------

 - All execution cycles given are valid only for cached instructions.
 - All instructions with a 'false' condition code take 1 cycle.
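
The SA cycle formulas for types 1 to 5 in the table above can be written down
as a few small Python functions. This is only a sketch for adding up cycles by
hand: all function and parameter names are made up for illustration, and the
order in which the LDM special cases are checked (PC first, then ^, then n=1)
is an assumption read off the table's examples, not something that was tested
separately.

```python
# Sketch of the SA cycle formulas for types 1-5 (cached code and data only).
# All names are hypothetical; they do not come from any existing tool.

def type1_cycles(reg_shift=False):
    """Data processing (ADD, MOV, CMP, ...), Rd != PC, no P: 1+s."""
    s = 1 if reg_shift else 0          # s=1 for a register-controlled shift
    return 1 + s

def type2_cycles(rd_needed_next=False, sign_extend=False):
    """LDR with Rd != PC: 1+f+e."""
    f = 1 if rd_needed_next else 0     # loaded Rd is used by the next instr.
    e = 1 if sign_extend else 0        # LDRSB / LDRSH
    return 1 + f + e

def type3_cycles():
    """STR: 1 cycle (write buffer)."""
    return 1

def type4_cycles(n, last_needed_next=False, user_bank=False, loads_pc=False):
    """LDM of n registers. Precedence of the special cases is an assumption."""
    if loads_pc:                       # Rlist includes PC
        return 3 + n
    if user_bank:                      # ^ used for user bank register load
        return 2 + n
    if n == 1:                         # only 1 register is loaded
        return 2
    f = 1 if last_needed_next else 0   # last loaded register is used next
    return f + n

def type5_cycles(n, user_bank=False):
    """STM of n registers: n+u, or 2+u if only 1 register is stored."""
    u = 2 if user_bank else 0
    return (2 if n == 1 else n) + u

if __name__ == "__main__":
    print("LDMIA R0,{R0-R4}     :", type4_cycles(5))
    print("LDMFD R13!,{R10,PC}^ :", type4_cycles(2, user_bank=True, loads_pc=True))
    print("STMFD R13!,{R0-R1}   :", type5_cycles(2))
```

For example, LDMIA R0,{R0-R4} comes out as 5 cycles as long as R4 is not used
by the following instruction, and 6 cycles if it is.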


note #1:If next instruction is a type 1 instruction with register controlled
        shift and Rd/Rh is not the shift-control register then f=0.
        Some code for better understanding:

        assuming ABS(R2) in range &0-&7FF -> fastest MUL execution

        case 1:
          MUL   R0,R1,R2        ; 2 cycles (x=1,f=1,s=0)
          MOV   R2,R2,LSL R0    ; 2 cycles

        case 2:
          MUL   R0,R1,R2        ; 1 cycle  (x=1,f=0,s=0)
          MOV   R2,R0,LSL R2    ; 2 cycles
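
        The two cases of note #1 can be added up with a small Python sketch
        of the type 8 formula x+f+s. All names are made up for illustration.
        Rs is passed as a signed Python integer, the one value outside the
        listed ranges (&80000000) is simply treated as x=3 (an assumption),
        and the caller decides whether Rd counts as 'needed' after applying
        note #1.

```python
# Sketch of the type 8 (MUL/MLA) cycle formula x+f+s. Names are hypothetical.

def mul_x(rs):
    """x term from ABS(Rs); rs is a signed integer value of the register."""
    mag = abs(rs)
    if mag <= 0x7FF:
        return 1
    if mag <= 0x7FFFFF:
        return 2
    return 3                           # also used for &80000000 (assumption)

def type8_cycles(rs, rd_needed_next=False, s_flag=False, next_is_mul=False):
    """MUL/MLA with Rd != PC: x+f+s."""
    f = 1 if (rd_needed_next or s_flag or next_is_mul) else 0
    s = 2 if s_flag else 0
    return mul_x(rs) + f + s

def type1_cycles(reg_shift=False):
    """Plain data processing instruction: 1+s."""
    return 1 + (1 if reg_shift else 0)

if __name__ == "__main__":
    r2 = 0x10                          # ABS(R2) in range &0-&7FF
    # case 1: R0 is the shift-control register of the MOV, so f=1
    case1 = type8_cycles(r2, rd_needed_next=True) + type1_cycles(reg_shift=True)
    # case 2: R0 is only the shifted operand, so note #1 gives f=0
    case2 = type8_cycles(r2, rd_needed_next=False) + type1_cycles(reg_shift=True)
    print("case 1:", case1, "cycles")
    print("case 2:", case2, "cycles")
```

        With these formulas case 1 adds up to 4 cycles and case 2 to 3 cycles
        for the MUL/MOV pair.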

note #2:If next instruction is a 64 bit multiplication (SMUL,UMUL,SMLA,UMLA)
        and Rh is involved as multiplier then f=0.
        
        assuming ABS(R2) in range &0-&7FF -> fastest MUL execution

          SMUL R0,R1,R2,R2      ; 2 cycles (x=1, f=0(!), s=0)
          SMLA R0,R1,R2,R2
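
        The type 11 formula 1+x+f+s can be sketched the same way in Python.
        The function and parameter names are made up for illustration, Rn is
        passed as a signed Python integer, and the caller sets rh_needed_next
        only after applying notes #1 and #2 (so for the SMUL above it stays
        False).

```python
# Sketch of the type 11 (64-bit multiply) cycle formula 1+x+f+s.
# Names are hypothetical; values above &7FFFFFFF are treated as x=3 (assumption).

def type11_cycles(rn, rh_needed_next=False, s_flag=False):
    """SMUL/UMUL/SMLA/UMLA: 1+x+f+s."""
    mag = abs(rn)
    if mag <= 0x7FF:
        x = 1
    elif mag <= 0x7FFFFF:
        x = 2
    else:
        x = 3
    f = 1 if (rh_needed_next or s_flag) else 0
    s = 2 if s_flag else 0
    return 1 + x + f + s

if __name__ == "__main__":
    # SMUL R0,R1,R2,R2 followed by SMLA R0,R1,R2,R2, ABS(R2) in &0-&7FF:
    print("SMUL with f=0:", type11_cycles(100))
    # the same SMUL if Rh were needed by a following non-multiply instruction:
    print("SMUL with f=1:", type11_cycles(100, rh_needed_next=True))
```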