- Mastering Delphi Programming: A Complete Reference Guide
- Primož Gabrijelčič
- 2021-06-24 12:33:37
Going the assembler way
Sometimes, when you absolutely have to squeeze every last bit of performance from the code, there seems to be only one solution: rewrite it in assembler. My response to any such idea is always the same: don't do it! Rewriting code in assembler is almost always much more trouble than it is worth.
I do admit that there are legitimate reasons for writing assembler code. I looked around and quickly found five areas where assembler is still significantly present: memory managers, graphical code, cryptography routines (encryption, hashing), compression, and interfacing with hardware.
Even in these areas, situations change quickly. I tested some small assembler routines from the graphical library GraphicEx and was quite surprised to find that they are not significantly faster than the equivalent Delphi code.
The biggest gain from assembler comes when you want to process a large buffer of data (such as a bitmap) and apply the same operation to every element. In such cases, you may be able to use SSE2 instructions, which run circles around the slow 386 instruction set that the Delphi compiler uses.
As assembler is not my game (I can read it, but I can't write good, optimized assembler code), my example is extremely simple. The code in the demo program, AsmCode, implements a four-dimensional vector (a record with four floating-point fields) and a function that multiplies two such vectors field by field:
type
  TVec4 = packed record
    X, Y, Z, W: Single;
  end;

function Multiply_PAS(const A, B: TVec4): TVec4;
begin
  Result.X := A.X * B.X;
  Result.Y := A.Y * B.Y;
  Result.Z := A.Z * B.Z;
  Result.W := A.W * B.W;
end;
As it turns out, this is exactly the kind of operation that can be implemented using SSE instructions (movups and mulps belong to the original SSE set, which SSE2 extends). In the code shown next, the first movups moves vector A into register xmm0. The second movups does the same for vector B and register xmm1. Then, the magical instruction mulps multiplies the four single-precision values in register xmm0 by the four single-precision values in register xmm1, leaving the products in xmm0. At the end, movups copies the result of the multiplication into the function result:
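function Multiply_ASM(const A, B: TVec4): TVec4;
asm
  movups xmm0, [A]       // load the four singles of A (unaligned load)
  movups xmm1, [B]       // load the four singles of B
  mulps  xmm0, xmm1      // multiply all four pairs at once
  movups [Result], xmm0  // store the four products in the function result
end;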
Running the test shows a clear winner. While Multiply_PAS needs 53 ms to multiply 10 million vectors, Multiply_ASM does that in less than half the time: 24 ms.
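The measurement code itself is not shown here, so the following is only a minimal sketch of how such a timing could be taken, assuming TStopwatch from System.Diagnostics and the 10 million iterations mentioned above. The program name, the TimeMultiplyPas helper, and the input values are hypothetical and not taken from the AsmCode demo.

```pascal
program TimeMultiplySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

type
  TVec4 = packed record
    X, Y, Z, W: Single;
  end;

function Multiply_PAS(const A, B: TVec4): TVec4;
begin
  Result.X := A.X * B.X;
  Result.Y := A.Y * B.Y;
  Result.Z := A.Z * B.Z;
  Result.W := A.W * B.W;
end;

// Hypothetical helper: multiplies the same two vectors Iterations times
// and returns the elapsed wall-clock time in milliseconds.
function TimeMultiplyPas(Iterations: Integer): Int64;
var
  A, B, R: TVec4;
  I: Integer;
  SW: TStopwatch;
begin
  A.X := 1; A.Y := 2; A.Z := 3; A.W := 4;
  B.X := 2; B.Y := 2; B.Z := 2; B.W := 2;
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
    R := Multiply_PAS(A, B);
  SW.Stop;
  Result := SW.ElapsedMilliseconds;
end;

begin
  Writeln(Format('Multiply_PAS: %d ms', [TimeMultiplyPas(10000000)]));
end.
```

Timing Multiply_ASM the same way only requires a second helper with the call swapped. Absolute numbers will differ between machines and compiler settings, so only the ratio between the two versions is meaningful.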
As you can see in the previous example, assembler instructions are introduced with the asm statement and ended with end. In the Win32 compiler, you can mix Pascal and assembler code inside one method. This is not allowed with the Win64 compiler. In 64-bit mode, a method can only be written in pure Pascal or in pure assembler.
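To illustrate what mixing means in practice, here is a small sketch of a Win32 routine whose begin..end body contains both a Pascal statement and an asm block. The AddInts function is a hypothetical example, not part of the demo program. The CPUX86 conditional keeps the asm block away from other compilers, which get a pure Pascal body instead.

```pascal
program MixedAsmSketch;

{$APPTYPE CONSOLE}

// Hypothetical example routine: adds two integers.
function AddInts(X, Y: Integer): Integer;
begin
{$IFDEF CPUX86}
  // Win32 only: a Pascal statement and an asm block in the same body.
  Result := 0;
  asm
    mov eax, X        // load the first parameter
    add eax, Y        // add the second parameter
    mov @Result, eax  // @Result is the assembler's name for the function result
  end;
{$ELSE}
  // Other targets (including Win64) cannot mix, so stay in pure Pascal.
  Result := X + Y;
{$ENDIF}
end;

begin
  Writeln(AddInts(2, 3));
end.
```

An asm block embedded this way may freely change EAX, ECX, and EDX but has to preserve the other general-purpose registers, which is one more reason to keep such blocks short.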
The asm statement is supported only by the Windows and OS X compilers. In older sources, you'll also find the assembler directive, which is kept only for backward compatibility and does nothing.
I'll end this short excursion into the assembler world with some advice. Whenever you are implementing a part of your program in assembler, please also create a Pascal version. The best practice is to use a conditional symbol, PUREPASCAL, as a switch. With this approach, we could rewrite the multiplication code as follows:
function Multiply(const A, B: TVec4): TVec4;
{$IFDEF PUREPASCAL}
begin
  Result.X := A.X * B.X;
  Result.Y := A.Y * B.Y;
  Result.Z := A.Z * B.Z;
  Result.W := A.W * B.W;
end;
{$ELSE}
asm
  movups xmm0, [A]
  movups xmm1, [B]
  mulps xmm0, xmm1
  movups [Result], xmm0
end;
{$ENDIF}
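The switch still has to be turned on somewhere. One possible approach, sketched here as an assumption rather than as the way the demo does it, is to define PUREPASCAL automatically whenever the target is not an Intel CPU, using the predefined CPUX86 and CPUX64 symbols. The program name and the test values are hypothetical.

```pascal
program PurePascalSwitchSketch;

{$APPTYPE CONSOLE}

// Assumption: any target that is neither x86 nor x64 gets the Pascal version.
{$IF not (Defined(CPUX86) or Defined(CPUX64))}
  {$DEFINE PUREPASCAL}
{$IFEND}

type
  TVec4 = packed record
    X, Y, Z, W: Single;
  end;

function Multiply(const A, B: TVec4): TVec4;
{$IFDEF PUREPASCAL}
begin
  Result.X := A.X * B.X;
  Result.Y := A.Y * B.Y;
  Result.Z := A.Z * B.Z;
  Result.W := A.W * B.W;
end;
{$ELSE}
asm
  movups xmm0, [A]
  movups xmm1, [B]
  mulps xmm0, xmm1
  movups [Result], xmm0
end;
{$ENDIF}

var
  A, B, R: TVec4;
begin
  A.X := 1; A.Y := 2; A.Z := 3; A.W := 4;
  B.X := 5; B.Y := 6; B.Z := 7; B.W := 8;
  R := Multiply(A, B);
  // Print the four products; both variants should agree on these inputs.
  Writeln(R.X:0:1, ' ', R.Y:0:1, ' ', R.Z:0:1, ' ', R.W:0:1);
end.
```

Defining PUREPASCAL in the project options works just as well and makes it easy to compare the two versions on the same machine.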