List of StrongARM instruction execution cycles! 2000/06/05
----------------------------------------------- mrh/icb
Heyho!
Welcome to the first *valid* list of StrongARM instruction execution cycles!
This list was compiled entirely from test results - no information from
'official' ART or ARM announcements was used. And - gosh! - nearly all values
differ from the officially announced ones! In fact, most instructions execute
slower than stated by ART/ARM. So it seems to me that the official values
are meant to boost SA sales?! Well, I may be wrong...
tested and written by
_ _ ____ __ __ _ _
/ ^ \/ - _> / \_/__</ \/ \
<__x__>__\__> <__/__>__/__<__< of iCEBiRD
e-mail: bawa@thepentagon.com
instruction syntax              type
ADC<cc><S> Rd,Rn,Op2............1
ADD<cc><S> Rd,Rn,Op2............1
AND<cc><S> Rd,Rn,Op2............1
B <cc> address.................6
BL <cc> address.................6
BIC<cc><S> Rd,Rn,Op2............1
CMN<cc><P> Rn,Op2...............1
CMP<cc><P> Rn,Op2...............1
EOR<cc><S> Rd,Rn,Op2............1
LDM<cc><mode> Rn<!>,{Rlist}<^>..4
LDR<cc><B|H|SB|SH> Rd,adr.......2
MLA<cc><S> Rd,Rm,Rs,Rn..........8
MOV<cc><S> Rd,Op2...............1
MUL<cc><S> Rd,Rm,Rs.............8
MRS<cc> Rd,<psr>................10
MSR<cc> <psr{f}>,Rm.............10
MVN<cc><S> Rd,Op2...............1
ORR<cc><S> Rd,Rn,Op2............1
RSB<cc><S> Rd,Rn,Op2............1
RSC<cc><S> Rd,Rn,Op2............1
SBC<cc><S> Rd,Rn,Op2............1
SMLA<cc><S> Rl,Rh,Rm,Rn.........11
SMUL<cc><S> Rl,Rh,Rm,Rn.........11
STM<cc><mode> Rn<!>,{Rlist}<^>..5
STR<cc><B|H> Rd,adr.............3
SUB<cc><S> Rd,Rn,Op2............1
SWI<cc> number..................7
SWP<cc> Rd,Rn,[Rn]..............9
TEQ<cc><P> Rn,Op2...............1
TST<cc><P> Rn,Op2...............1
UMLA<cc><S> Rl,Rh,Rm,Rn.........11
UMUL<cc><S> Rl,Rh,Rm,Rn.........11
legend
Rx : plain register (R0-R14, PC) without shift
Rd : destination register for operations
Rh : high word of 64-bit MUL result
Rl : low word of 64-bit MUL result
Rlist : register list of LDM/STM instructions
Rs,Rn : second factor register in MUL instructions (-> MUL Rd,Rm,Rs)
Op2 : immediate constant, plain register or shifted register
- All execution cycles given are valid only for cached instructions and data.
- All instructions with a 'false' condition code take 1 cycle.
type special cases examples SA cycles ARM250
--------------------------------------------------------------------------------
1 * s=1 if register controlled shift, ADD R0,R0,R2,LSL #4 1+s 1+s
s=0 otherwise
* P condition used TEQP R0,#0 3+s 1
* Rd=PC, S condition used MOVS PC,R14 4+s 4+s
ADDS PC,PC,R4,LSL #2
* MOV PC,Rx (Rx=reg. without shift) MOV PC,R14 2+p 4
p=2: Rx changed in previous cycle
p=1: Rx unchanged for 1 cycle
p=0: Rx unchanged for >=2 cycles
* Rd=PC and is calculated by this MOV PC,R14,LSL #2 3+s 4
instruction in some way SUB PC,PC,#44
--------------------------------------------------------------------------------
2 * f=1 if Rd is needed in next instr. LDR R4,[R2,#32]! 1+f+e (cache) 4
f=0 otherwise
e=1 if LDRSB/LDRSH sign-extension
e=0 otherwise
* Rd=PC LDR PC,[R2,R4 LSL #2] 4 (cache) 7
--------------------------------------------------------------------------------
3 * - STR R4,[R3,R2,LSL #2] 1 (writebuffer) 4
--------------------------------------------------------------------------------
4 * n=number of registers in Rlist LDMIA R0,{R0-R4} f+n (cache) 3+n
f=1 if last register loaded is
needed in next instruction
f=0 otherwise
* n=1 (only 1 register is loaded) LDMDB R0,{R4} 2 (cache) 4
* ^ condition for userbank register LDMIA R0,{R13}^ 2+n (cache) 3+n
load is used
* Rlist includes PC LDMFD R13!,{R10,PC}^ 3+n (cache) 6+n
--------------------------------------------------------------------------------
5 * n=number of registers in Rlist STMFD R13!,{R0-R1} n+u (writebuffer) 3+n
u=2 if ^ condition used for
storing user bank registers
u=0 otherwise
* n=1 (only 1 register is stored) STMDB R0,{R4} 2+u (writebuffer) 4
--------------------------------------------------------------------------------
6 * - BL &80AC 2 4
--------------------------------------------------------------------------------
7 * - SWI &42 ? 4
--------------------------------------------------------------------------------
8 * f=1 if Rd is needed in next instr. MLA R0,R1,R2,R0 x+f+s 1-17
(exception: see note #1)
or S condition is used
or next instruction is any multiplication
f=0 otherwise
x=1 if ABS(Rs) in range &00000000-&000007FF
x=2 if ABS(Rs) in range &00000800-&007FFFFF
x=3 if ABS(Rs) in range &00800000-&7FFFFFFF
s=2 if S condition used
s=0 otherwise
* Rd=PC MUL PC,R2,R2 4+x
--------------------------------------------------------------------------------
9 * SWP works, but does not use SWP R0,R1,[R3] ? (>100) 5
write-back buffers, therefore it is
extremely slow. :(
--------------------------------------------------------------------------------
10 * - MRS R3,SPSR_all 1 -
--------------------------------------------------------------------------------
11 * f=1 if Rh is needed in next instr. UMUL R3,R4,R5,R6 1+x+f+s -
(exception: see note #1 and #2)
or S condition is used
f=0 otherwise
x=1 if ABS(Rn) in range &00000000-&000007FF
x=2 if ABS(Rn) in range &00000800-&007FFFFF
x=3 if ABS(Rn) in range &00800000-&7FFFFFFF
s=2 if S condition used
s=0 otherwise
--------------------------------------------------------------------------------
- All execution cycles given are valid only for cached instructions.
- All instructions with a 'false' condition code take 1 cycle.
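The SA-cycle formulas for types 1, 2, 4 and 5 in the table above can be tried
out with a small calculator. This is only a sketch in Python, not part of the
measurements: all function and parameter names are made up for illustration,
the order in which overlapping special cases are checked is inferred from the
table's examples (LDMFD R13!,{R10,PC}^ is listed under 'Rlist includes PC',
LDMIA R0,{R13}^ under the ^ case), and the mapping of p onto a count of
unchanged cycles is an assumption. All results apply to cached code and data
with a 'true' condition code.

```python
# Sketch of the SA-cycle formulas for types 1, 2, 4 and 5.
# All names are hypothetical; only the formulas come from the table.

def data_proc_cycles(reg_shift=False, p_suffix=False, rd_is_pc=False, s_suffix=False):
    """Type 1, except the plain 'MOV PC,Rx' case (see mov_pc_cycles)."""
    s = 1 if reg_shift else 0          # s=1 for a register controlled shift
    if p_suffix:                       # e.g. TEQP R0,#0
        return 3 + s
    if rd_is_pc and s_suffix:          # e.g. MOVS PC,R14
        return 4 + s
    if rd_is_pc:                       # e.g. SUB PC,PC,#44
        return 3 + s
    return 1 + s

def mov_pc_cycles(unchanged_cycles):
    """Type 1, 'MOV PC,Rx' with an unshifted Rx: 2+p.
    Assumption: p = 2, 1, 0 for Rx unchanged for 0, 1, >=2 cycles."""
    p = min(2, max(0, 2 - unchanged_cycles))
    return 2 + p

def ldr_cycles(rd_needed_next=False, sign_extend=False, rd_is_pc=False):
    """Type 2: 1+f+e, or 4 if Rd=PC."""
    if rd_is_pc:
        return 4
    f = 1 if rd_needed_next else 0
    e = 1 if sign_extend else 0        # LDRSB/LDRSH
    return 1 + f + e

def ldm_cycles(n, last_needed_next=False, user_bank=False, includes_pc=False):
    """Type 4. Check order (PC, then ^, then n=1) is inferred from the examples."""
    if includes_pc:
        return 3 + n
    if user_bank:
        return 2 + n
    if n == 1:
        return 2
    f = 1 if last_needed_next else 0
    return f + n

def stm_cycles(n, user_bank=False):
    """Type 5: n+u, or 2+u if only one register is stored."""
    u = 2 if user_bank else 0
    return (2 if n == 1 else n) + u
```

For example, ldm_cycles(5) gives 5 for LDMIA R0,{R0-R4}, and
ldm_cycles(2, user_bank=True, includes_pc=True) gives 5 for
LDMFD R13!,{R10,PC}^.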
note #1: If next instruction is a type 1 instruction with register controlled
shift and Rd/Rh is not the shift-control register then f=0.
Some code for better understanding:
assuming ABS(R2) in range &0-&7FF -> fastest MUL execution
case 1:
MUL R0,R1,R2 ; 2 cycles (x=1,f=1,s=0)
MOV R2,R2,LSL R0 ; 2 cycles
case 2:
MUL R0,R1,R2 ; 1 cycle (x=1,f=0,s=0)
MOV R2,R0,LSL R2 ; 2 cycles
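The two cases above can be reproduced with a rough Python sketch of the type 8
formula x+f+s. The function and parameter names are hypothetical, Rs is taken
as a signed 32-bit value, and magnitudes above &7FFFFFFF (which the table does
not list) are simply treated as x=3; that last point is an assumption.

```python
# Sketch of the type 8 (MUL/MLA) formula x+f+s, including the note #1 exception.
# All names are hypothetical; only the formula and ranges come from the list.

def mul_x(rs):
    """x term, chosen by the magnitude of the multiplier Rs."""
    mag = abs(rs)
    if mag <= 0x7FF:
        return 1
    if mag <= 0x7FFFFF:
        return 2
    return 3   # &00800000-&7FFFFFFF; larger magnitudes are an assumption

def mul_cycles(rs, rd_needed_next=False, s_suffix=False, next_is_mul=False,
               next_has_reg_shift=False, rd_is_shift_control=False):
    # note #1: if the next instruction is a type 1 instruction with a register
    # controlled shift, Rd only counts as 'needed' when it is the
    # shift-control register.
    exception = next_has_reg_shift and not rd_is_shift_control
    needed = rd_needed_next and not exception
    f = 1 if (needed or s_suffix or next_is_mul) else 0
    s = 2 if s_suffix else 0
    return mul_x(rs) + f + s

# case 1: MUL R0,R1,R2 followed by MOV R2,R2,LSL R0 (R0 is the shift control)
case1 = mul_cycles(5, rd_needed_next=True, next_has_reg_shift=True,
                   rd_is_shift_control=True)
# case 2: MUL R0,R1,R2 followed by MOV R2,R0,LSL R2 (R0 is only an operand)
case2 = mul_cycles(5, rd_needed_next=True, next_has_reg_shift=True,
                   rd_is_shift_control=False)
```

With a small multiplier (x=1) this yields 2 cycles for case 1 and 1 cycle for
case 2, matching the comments in the listing.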
note #2: If next instruction is a 64-bit multiplication (SMUL,UMUL,SMLA,UMLA)
and Rh is involved as multiplier then f=0.
assuming ABS(R2) in range &0-&7FF -> fastest MUL execution
SMUL R0,R1,R2,R2 ; 2 cycles (x=1, f=0(!), s=0)
SMLA R0,R1,R2,R2
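The type 11 formula 1+x+f+s can be sketched the same way. Again this is only
an illustration in Python with hypothetical names: whether note #1 or note #2
applies to a given instruction pair is left to the caller as a flag, since
deciding that from the operands is outside what this sketch tries to model.

```python
# Sketch of the type 11 (64-bit multiply) formula 1+x+f+s.
# All names are hypothetical; only the formula and ranges come from the list.

def long_mul_cycles(rn, rh_needed_next=False, s_suffix=False,
                    note1_applies=False, note2_applies=False):
    mag = abs(rn)                      # x is chosen by the magnitude of Rn
    if mag <= 0x7FF:
        x = 1
    elif mag <= 0x7FFFFF:
        x = 2
    else:
        x = 3                          # above &7FFFFFFF is an assumption
    # f=1 if Rh is needed in the next instruction, unless note #1 or
    # note #2 applies; the S condition always forces f=1.
    needed = rh_needed_next and not (note1_applies or note2_applies)
    f = 1 if (needed or s_suffix) else 0
    s = 2 if s_suffix else 0
    return 1 + x + f + s

# The SMUL from note #2: small multiplier, note #2 applies, so f=0.
smul_example = long_mul_cycles(5, rh_needed_next=True, note2_applies=True)
```

Here smul_example comes out as 2, in line with the '2 cycles' comment on the
SMUL line; without the note #2 exception the same call would give 3.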