Info about P-code

Posted: 2005-3-13 13:21

KuNgBiM

P-code works by compiling an application into an intermediate code format that is much more compact than 80x86 machine code. At link time, a small engine is built into your application that processes the p-code into native machine code during run time. Although there is an associated reduction in performance due to the extra step of interpretation, some simple techniques can minimize this effect.

P-code can be used with the Microsoft Source Profiler and the CodeView® debugger.

High degree of "tunability." P-code can be applied selectively throughout a program through the use of pragmas. Size-critical functions and modules can be compiled with p-code, while speed-critical functions and modules can be compiled with the normal optimizations. Experimenting with these tradeoffs allows programmers to achieve the right balance of size and speed.

Local Use of P-Code
Local use of p-code involves the selective placement of pragmas in the application's source code, indicating to the compiler which sections are to be compiled as p-code and which are to be compiled as native machine code.
Pragmas can be placed either at the module or function level. For example:

// An example of p-code pragmas
#pragma optimize("q", on)     // Compile the following function as p-code
void Func1(void)
{
    // Code that can trade speed for size
}
#pragma optimize("q", off)    // Compile the following function as native code
                              // (p-code turned off)
void Func2(void)
{
    // Speed-critical code
}

The following general guidelines apply when using p-code pragmas:
Speed-critical functions should be compiled as native code. Also, routines that are called frequently (such as those appearing mostly within loops) should be compiled as native code even though the function code itself may not be CPU-intensive.

User-interface routines such as menu and dialog handlers can be compiled as p-code. The perceived difference in their execution speed is usually negligible.

Infrequently used routines should be compiled as p-code. These include routines such as error handling procedures and features of the application's functionality that are not used on a regular basis.

As with any other optimization, a profiler is the best tool for determining actual execution times. The most effective strategy is to do a complete function-level profile to provide the most accurate "map" of a program's execution speed. P-code can then be applied where appropriate.

P-Code Internals
How the Engine Works
If you use p-code in any part of your C/C++ 7.0 application, the linker automatically binds in a single copy of the p-code run-time engine. This engine adds approximately 9K to the size of your executable file.
The p-code engine is a relatively simple machine that processes a series of "high-level" operation codes ("opcodes"). It is stack-based, meaning that it uses the system stack for virtually all its operations. In contrast, an actual microprocessor uses registers for most of its operations and uses the stack primarily to perform function call mechanics. All operands used by p-code instructions are stored on the stack.
P-code instructions are much more compact than assembly instructions because it is not always necessary to specify the source and/or destination addresses for each instruction. For example, to add two register-resident values using assembly language, you would use the following syntax:
add ax, bx
The source registers AX and BX are added, and the result is placed in the destination, AX. The same instruction in p-code is simply:
AddW
Since each p-code instruction implicitly pops its operands off the stack and pushes its result back onto the stack, the single AddW opcode encapsulates both the locations of the operands (on the stack) as well as the stack mechanics. Fewer instructions are needed to represent the opcode because much of the process is performed by default. The AddW instruction above is actually equivalent to the following assembly language sequence:

pop cx       ; Pop first operand from the stack into cx
pop di       ; Pop second operand from stack into di
add di,cx    ; Add the two values, store result in di (by default)
push di      ; Push the result back onto the stack

In cases where a p-code instruction must modify the value of a variable, source and destination addresses are specified in the opcode because variables cannot be stored on the stack. However, most instructions use the stack for at least one of their arguments.
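
The implicit stack mechanics described above can be sketched in C. This is a toy model and not Microsoft's engine: the opcode numbers, the PushW and Halt opcodes, and the function name run_pcode are assumptions made up for illustration. Only the pop, pop, add, push behavior of AddW comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode values; the real engine's encoding is not given here. */
enum { OP_HALT = 0, OP_PUSHW = 1, OP_ADDW = 2 };

#define STACK_MAX 16

/* Runs a byte-coded program and returns the word left on top of the stack. */
int16_t run_pcode(const uint8_t *code, size_t len)
{
    int16_t stack[STACK_MAX];
    size_t sp = 0;   /* number of words currently on the stack */
    size_t pc = 0;   /* index of the next byte to decode */

    while (pc < len) {
        uint8_t op = code[pc++];
        if (op == OP_HALT) {
            break;
        } else if (op == OP_PUSHW) {
            /* PushW carries a 2-byte little-endian operand. */
            assert(pc + 1 < len && sp < STACK_MAX);
            stack[sp++] = (int16_t)(code[pc] | (code[pc + 1] << 8));
            pc += 2;
        } else if (op == OP_ADDW) {
            /* AddW has no operand bytes: pop two words, push their sum. */
            assert(sp >= 2);
            int16_t b = stack[--sp];
            int16_t a = stack[--sp];
            stack[sp++] = (int16_t)(a + b);
        } else {
            assert(!"unknown opcode");
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

In this sketch the addition itself costs a single byte in the instruction stream, because both operand locations and the destination are implied by the stack.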

In addition to implicit stack mechanics, further size reductions are gained through the use of assumed values. Since many operations involve the same specific values over and over, several opcodes have been created to incorporate these values without using operands. For example, consider a statement using the Jump On Not Equal instruction:

JneWb 05

This statement pops two words off the stack and compares them. If they are not equal, a jump of length 5 is performed. The "b" in the instruction indicates that a 1-byte operand is required. The following alternate form of this instruction assumes that a jump of length 5 is required, thus eliminating the additional space needed for the operand:

JneW5
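
A minimal C sketch of how a decoder might treat the two forms follows. The opcode numbers and the function name step_jne are hypothetical, and the sketch assumes the jump length is counted from the end of the instruction, which the text does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode values chosen for this sketch only. */
enum { OP_JNEWB = 0x10, OP_JNEW5 = 0x11 };

/* Executes one Jne-style instruction at code[pc] for the two words a and b
   (already popped from the stack) and returns the next program counter.
   Assumption: the jump length is counted from the end of the instruction. */
size_t step_jne(const uint8_t *code, size_t pc, int16_t a, int16_t b)
{
    uint8_t op = code[pc++];
    size_t length;

    if (op == OP_JNEWB) {
        length = code[pc++];     /* explicit 1-byte operand */
    } else {
        assert(op == OP_JNEW5);
        length = 5;              /* operand implied by the opcode itself */
    }
    return (a != b) ? pc + length : pc;
}
```

Both forms perform the same jump when the words differ; the assumed-value form simply occupies one byte less in the instruction stream.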

Opcode Format and Statistics
The p-code engine's use of implied addressing enables an opcode size that averages less than 2 bytes. Because of this, two sets of opcodes are defined: standard and extended.

The standard set consists of the 255 most commonly used opcodes. These opcodes are a single byte in length and can be used in combination with up to 4 bytes of data. The extended set consists of 256 opcodes that are used less frequently. The following table shows run-time statistics for the p-code opcode sizes in a sample program of 200,000 lines of C source (.c files, not including .h files) compiled entirely as p-code.

Note that this program consisted almost exclusively of 1- and 2-byte opcodes, with 3- and 4-byte opcodes representing a very small percentage.
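------------------------------------------------------------------------------------------------
P-code Opcode Size (bytes)   Number of Times Used   Frequency of Usage (%)   % of Code Size
------------------------------------------------------------------------------------------------
1                            414,369                55.5                     37.0
2                            321,908                39.2                     52.0
3                            41,838                 4.6                      9.1
4                            8,782                  0.71                     1.8
------------------------------------------------------------------------------------------------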
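
The claim that opcodes average less than 2 bytes can be checked against the "Frequency of Usage (%)" figures above. The following C sketch computes the weighted average; the function name average_opcode_size is made up for this example, and the 4-byte row's frequency is read as 0.71 percent.

```c
#include <stddef.h>

/* Frequency of usage (%) for 1-, 2-, 3- and 4-byte opcodes, as listed in the
   statistics above (the 4-byte figure is read as 0.71). */
static const double freq_percent[4] = { 55.5, 39.2, 4.6, 0.71 };

/* Weighted average opcode size in bytes: sum(size * freq) / sum(freq). */
double average_opcode_size(void)
{
    double weighted = 0.0;
    double total = 0.0;

    for (size_t i = 0; i < 4; i++) {
        weighted += (double)(i + 1) * freq_percent[i];
        total += freq_percent[i];
    }
    return weighted / total;
}
```

With these figures the average works out to roughly 1.5 bytes per opcode, consistent with the "less than 2 bytes" statement.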

Further Optimization Through Quoting
An important feature of p-code technology is quoting. Quoting enables the sharing of a single instance of a code sequence. Quoting is similar to using routines in a high-level language because it allows a single block of code to be used throughout the program without incurring added space. In order to implement this feature, the compiler examines the code that it generates, looking for places where a sequence of instructions is repeated. If it finds such repetitions, it replaces all but one of the occurrences with a jump instruction that directs the flow of execution to the beginning of the quoted block of code. Quoting provides approximately 5 to 10 percent additional compression in an executable. As with p-code in general, quoting can be controlled at either the global or program level. Otherwise there is no additional programmer involvement needed. Since quoting involves many jumps to labels, it can make compiled code difficult to read and debug. A good strategy is to turn quoting off during program development, and then turn it on once the program has been fully debugged.

Quoting differs from function calls in that there are no arguments and no return value. Only the path of execution is changed. To implement quoting, two instructions are used: QUOTE and EQUOTE. The QUOTE instruction takes a 1- or 2-byte offset as an argument. When a QUOTE is executed, it saves the address of the next instruction as a return address and performs a jump to the specified offset. Instead of pushing the return address onto the stack, it is stored in the PQ register. When an EQUOTE instruction is executed, it checks whether PQ contains an address. If not, EQUOTE does nothing; if it does, a jump is performed back to that address. This allows the quoted section of code to be executed in one of two ways: in sequence with the preceding and following code, or as a quote call.

Function calls within code blocks can be quoted, and even interprocedural quoting is supported. Nested quotes, however, are not supported.
In the following example, two lines of source code containing a common subexpression i+j+func() appear within the same module. During the first instance of the code, the quote is marked with a starting label of L1 and a standard end label of EQuote. When the code is used again in the second instance, p-code generates a call to the quote, rather than producing the entire sequence all over again:

Source code                  P-code
m = i+j+func();              L1:  LdfW i
                                  LdfW j
                                  AddW
                                  CallFCW func
                                  AddW
                                  EQuote
                                  StfW m
n = i + j + func();               Quote L1
                                  StfW n

This technique is similar to the frequently used common subexpression reduction method employed by many optimizing compilers. In this case, the common subexpression i+j+func() is converted to a reusable "routine."
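
The Quote/EQuote mechanics can be modeled with a small C interpreter for the example above. This is only a sketch under stated assumptions: the opcode numbers, the 1-byte operand layout, the absolute Quote offset, the variable slots, and the stand-in func() returning 10 are all hypothetical. The PQ register behavior (save the return address, fall through when empty) follows the description in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: opcode values and operand layout are assumptions. */
enum { HALT, LDFW, STFW, ADDW, CALLFCW, QUOTE, EQUOTE };
enum { VAR_I, VAR_J, VAR_M, VAR_N, VAR_COUNT };
#define NO_QUOTE ((size_t)-1)

int16_t func(void) { return 10; }   /* stand-in for func() in the example */

void run_quoted(const uint8_t *code, int16_t vars[VAR_COUNT])
{
    int16_t stack[8];
    size_t sp = 0, pc = 0;
    size_t pq = NO_QUOTE;            /* PQ register: return address or empty */

    for (;;) {
        switch (code[pc++]) {
        case LDFW:                   /* push a variable */
            assert(sp < 8);
            stack[sp++] = vars[code[pc++]];
            break;
        case STFW:                   /* pop into a variable */
            assert(sp >= 1);
            vars[code[pc++]] = stack[--sp];
            break;
        case ADDW:                   /* pop two words, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = (int16_t)(stack[sp - 1] + stack[sp]);
            break;
        case CALLFCW:                /* call func(), push its result */
            assert(sp < 8);
            stack[sp++] = func();
            break;
        case QUOTE:                  /* save return address in PQ, then jump */
            assert(pq == NO_QUOTE);  /* nested quotes are not supported */
            pq = pc + 1;
            pc = code[pc];
            break;
        case EQUOTE:                 /* no-op if PQ is empty, else return */
            if (pq != NO_QUOTE) { pc = pq; pq = NO_QUOTE; }
            break;
        case HALT:
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}

/* m = i + j + func();  n = i + j + func(); */
const uint8_t quoted_program[] = {
    /* L1, offset 0: */ LDFW, VAR_I, LDFW, VAR_J, ADDW, CALLFCW, ADDW, EQUOTE,
    STFW, VAR_M,
    QUOTE, 0,            /* re-run the block that starts at L1 */
    STFW, VAR_N,
    HALT
};
```

On the first pass EQuote finds PQ empty and falls through to StfW m. On the second pass, entered through Quote, it jumps back to StfW n. In this made-up encoding the second statement costs 4 bytes instead of repeating the 8-byte block.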

Native Entry Points
If pragmas are used to turn p-code on and off throughout a program, it is possible that a machine code function will call a p-code function. When this happens, the program must stop executing machine code and turn control over to the p-code engine, which can then execute the p-code.
For this to occur, a p-code function normally contains a native entry point at its beginning. The native entry point consists of several machine code instructions that transfer control to the p-code engine. For each function that is specified as p-code, the compiler automatically generates the necessary entry sequence. There is a small overhead of about 6 bytes associated with the entry sequence. You can instruct the compiler to suppress generation of these entry sequences if you are sure that no machine code function will call a p-code function. This suppression can be done globally through the Gn switch or locally by placing the #pragma native_caller(off) statement before the p-code function for which you want to remove the call sequence.

Debugging P-Code
Applications compiled with p-code can be debugged using the Microsoft CodeView debugger in virtually the same manner as an application compiled without p-code. Both source-level and assembly-level debugging of p-code applications are supported. In assembly mode, p-code undergoes a special disassembly process that shows p-code instructions rather than native assembly instructions.
When a p-code program halts at a breakpoint, the register window changes to show the stack and engine state. All normal CodeView debugger commands, such as break, step, watch, and others, work identically for both p-code and non-p-code applications.
The Options/Native menu item in the CodeView debugger allows you to disable p-code support, and to work only with machine code. If you choose to debug with p-code on, you can even single-step through the p-code engine itself.

Summary
P-code technology provides programmers who develop for both MS-DOS and Windows operating systems a new way to shrink the size of their executables by an average of 40 percent. Although there is a performance trade-off, this can be minimized through the effective use of p-code on "idle-time" routines such as those that handle the user interface. Microsoft uses this technique in many of its own applications.
A major strength of this code compression technology is its high payoff and low investment of time on the part of the programmer. P-code can be implemented globally throughout an application simply by recompiling. Placement of pragmas before strategic routines ensures that code compression is maximized while performance loss is minimized.

In addition to the IDE changes, Visual Basic now provides a native code compiler. Earlier versions of Visual Basic compiled applications into p-code, which required a runtime interpreter that reduced application performance. The native code compiler lets you compile your apps down to Intel-native instructions. You can even optimize for specific processors, including the Pentium and Pentium Pro. Whether you compile your app to native code or p-code, you still need to ship the Visual Basic runtime files, but native code applications will run much faster. (Even though compiling to native code generates machine code, the machine code still needs to make some calls to helper routines that live within the Visual Basic runtime.)

P-Code Versus Native Code
When you write a line of code in the IDE, Visual Basic breaks it down into expressions and encodes the expressions into a preliminary format called opcodes. In other words, each line is partially precompiled as it is written. Some lines contain shared information that cannot be precompiled independently (mainly Dim statements and procedure definitions). This is why you have to restart if you change certain lines in break mode. The opcodes are compiled into p-code instructions when you compile (in the background if you have the Compile On Demand and Background Compile options set).
At run time, the p-code interpreter works through the program, decoding and executing p-code instructions. These p-code instructions are smaller than equivalent native code instructions, thus dramatically reducing the size of the executable program. But the system must load the p-code interpreter into memory in addition to the code, and it must decode each instruction.
It’s a different story with native code. You start with the same opcodes, but instead of translating to p-code instructions, the compiler translates to native instructions. Because you’re not going to be expecting an instant response while stepping through native code instructions in the IDE, the compiler can look at code from a greater distance; it can analyze blocks of code and find ways to eliminate inefficiency and duplication. The compiler philosophy is that, since you compile only once, you can take as long as you want to analyze as much code as necessary to generate the best results possible.
These two approaches create a disjunction. How can you guarantee that such different ways of analyzing code will generate the same results? Well, you can’t. In fact, if you look at the Advanced Optimizations dialog box (available from the Compile tab of the Project Properties dialog box) you’ll see a warning:
“Enabling the following optimizations might prevent correct execution of your program.” This might sound like an admission of failure, but welcome to the real world of compilers. Users of other compiled languages understand that optimization is a bonus. If it works, great. If not, turn it off.
On the other hand, very few developers are going to be used to the idea of working in an interpreter during development but releasing compiled code. Most compilers have a debug mode for fast compiles and a release mode for fast code. Visual Basic doesn’t worry about fast compiles because it has a no-compile mode that is faster than the fastest compiler. You get the best of both worlds, but it’s going to take a little while for people to really trust the compiler to generate code that they can’t easily see and debug.
NOTE Even with a compiler, Basic code might be slower than the compiled code of some other languages. That’s because Basic always does run-time error checking. You can’t expect a language that validates every statement to offer the same performance as a language that leaves you at the mercy of your own error checking. Of course, if you were to write a C program that does all the run-time error checking Basic does, you not only would pay the same performance penalty but also would have to write the error handlers.
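
To make the cost concrete, here is a rough C sketch of the kind of checking the note has in mind. The error numbers mirror Visual Basic's run-time errors 6 (Overflow) and 9 (Subscript out of range); the function names and the error-code return convention are assumptions for illustration, not how the Visual Basic runtime is implemented.

```c
#include <stdint.h>

/* Error numbers follow Visual Basic's run-time errors 6 (Overflow) and
   9 (Subscript out of range); the function names are hypothetical. */
enum { ERR_NONE = 0, ERR_OVERFLOW = 6, ERR_SUBSCRIPT = 9 };

/* Adds two 16-bit integers with an overflow check: the sum is computed in
   a wider type and rejected if it does not fit in 16 bits. */
int checked_add_i16(int16_t a, int16_t b, int16_t *out)
{
    int32_t wide = (int32_t)a + (int32_t)b;

    if (wide < INT16_MIN || wide > INT16_MAX)
        return ERR_OVERFLOW;
    *out = (int16_t)wide;
    return ERR_NONE;
}

/* Reads items[index] from an array of count items, checking bounds first. */
int checked_get(const int16_t *items, long count, long index, int16_t *out)
{
    if (index < 0 || index >= count)
        return ERR_SUBSCRIPT;
    *out = items[index];
    return ERR_NONE;
}
```

Every addition and every array access now carries extra comparisons and a branch, and every caller has to test the returned error code, which is the overhead and the handler-writing burden the note describes.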

Examining Code

Before we get to my recommended way of testing performance (writing performance tests), let’s examine ways to look at your code and see exactly what goes on under the hood. It turns out that this is a little different depending on whether you use p-code or native code.

If you’re the adventurous type who isn’t afraid of disassembled machine code, you can examine Basic p-code. The key to breaking into Visual Basic code is the DebugBreak API routine. (It’s in the Windows API type library described in Chapter 2.) In case you’re curious, its assembly language implementation looks like this:

DebugBreak PROC
int 3
ret
DebugBreak ENDP

The INT 3 instruction signals any active debugger to break out of execution. That’s how debuggers work: by temporarily putting an INT 3 wherever they want a breakpoint.

Put a DebugBreak statement in your Basic source just before the line you want to examine. You should start with something simple but recognizable:

DebugBreak
i = &HABCD

Now run your program in the Visual Basic environment. When you hit the breakpoint, an application error box will appear, telling you that you’ve hit a breakpoint. It gives you a choice of terminating Visual Basic or debugging the application. If you’re running under Windows 95, click Debug; if you’re running under Windows NT, click Cancel. You’ll pop up in your system debugger. (Mine is Microsoft Visual C++, but yours might be different.) You’ll see the INT 3 instruction, followed by a RET instruction. Step through them. When you step past RET, you’ll be in the code that calls API functions like DebugBreak. If you keep stepping through a lot of confusing code to get to the next Basic statement, eventually you’ll find yourself in the strange world of p-code.

Let’s just say that this code doesn’t look like any C or assembler code I’ve disassembled before. If you want to know more, there are articles describing the p-code concept in MSDN and in the Visual C++ manuals. The main point is how many instructions it takes to do a simple task. In disassembled C code, the example statement would translate into something like this:
mov WORD PTR i, 0ABCDh

It is sobering to see how many instructions it takes to do the same thing in p-code.

The story is different for compiled code. All you have to do is compile with debugging information. Choose the Create Symbolic Debug Info check box and the No Optimization radio button on the Compile tab of the Project Properties dialog box. (If you don’t turn off optimizations, the compiler might rearrange code in ways that are difficult to trace through.) After compiling, you can load the executable file into a source debugger. I use the Microsoft Developer Studio (MSDEV.EXE), but any debugger that understands Microsoft’s debug format will work. For example, load from an MS-DOS session with this command line:
start msdev timeit.exe

You’ll pop up in the assembly language decode window at some undetermined point, but it is possible to debug with source code. Use the File Open command to load the startup module (or any module you’re interested in). Put a breakpoint somewhere in the code, and press F5 or the Go button. You’ll stop at your breakpoint, and you can start tracing. You might find a few surprises. The Microsoft Developer Studio isn’t designed for debugging Visual Basic code and doesn’t have a Basic expression evaluator. You might not be able to evaluate expressions the way you expect in the Watch window. If these limits seem annoying, try debugging p-code again; you’ll appreciate the native code.

These prefix abbreviations are used for control names
------------------------------------------------------------
Prefix          Control
------------------------------------------------------------
cbo             Combo box
chk             Check box
cmd             Command button
dir             Directory box
drv             Drive list box
fil             File list box
fra             Frame
frm             Form
grd             Grid
hsb             Horizontal scrollbar
img             Image
lbl             Label
lin             Line
lst             List box
mnu             Menu
ole             OLE client
opt             Option button
pic             Picture box
shp             Shape
tmr             Timer
txt             Text box
vsb             Vertical scrollbar
------------------------------------------------------------

Attachment: vbpcode.zip
