How does PHP opcode relate to the actually executed binary code?

test.php as plain text:

<?php
$x = "a";
echo $x;

test.php as opcodes:

debian:~ php -d vld.active=1 -d vld.execute=0 -f test.php
Finding entry points
Branch analysis from position: 0
Return found
filename:       /root/test.php
function name:  (null)
number of ops:  5
compiled vars:  !0 = $x
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  >   EXT_STMT
         1      ASSIGN                                                   !0, 'a'
   3     2      EXT_STMT
         3      ECHO                                                     !0
   4     4    > RETURN                                                   1
branch: #  0; line:     2-    4; sop:     0; eop:     4
path #1: 0,

test.php as binary representation:

debian:~ php -d apc.stat=0 -r "
  require '/root/test.php'; 
  echo PHP_EOL; 
  echo chunk_split(bin2hex(
    apc_bin_dump(array('/root/test.php'))
  ),64);
"

(skipping the echo output of test.php)

    b110000001000000325dedaa64d801bca2f73027abf0d5ab67f3023901000000
    2c0000000a000000871000000300000000000000000000004c0000005b000000
    8a0200008a020000650000002f726f6f742f746573742e7068700002070f9c00
    00000000000000000000000000000000000000000000000000000000000100fa
    000000fe00000005000000050000007c02000001000000100000000100000000
    00000000000000ffffffff0000000000000000000000000000000000000000ff
    ffffffeb00000000000000000000000000000000000000ffffffff0000000000
    00000001000000000000002f726f6f742f746573742e7068700001000000204a
    3308080000000000000000000000000000000000000008000000000000000000
    0000000000000000000008000000000000000000000000000000000000000000
    00000200000065000000204a3308040000000000000001000000000000000000
    00001000000000000000100000000100000006000000010000007a0200000100
    00000100000006000000000000000200000026000000204a3308080000000000
    0000000000000000000000000000080000000000000000000000000000000000
    0000080000000000000000000000000000000000000000000000030000006500
    0000900f34080800000000000000000000000000000000000000100000000000
    0000100000000100000006000000080000000000000000000000000000000000
    0000000000000300000028000000204a33080800000000000000000000000000
    00000000000001000000010000002c70d7b6010000000100d7b6080000000000
    000000000000000000000000000000000000040000003e000000610088020000
    01000000bd795900780000000000000000000000000000000000000000000000
[ ... a lot of lines just containing 0s ... ]
    0000000000000038000000c30000007f0000007a010000830000007c0200008f
    0000003c000000400000004400000008

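Now I would like to know more about how the opcodes are translated into this binary representation.

Edited and clarified question:

How are the opcodes translated into the binary version? Can you see the ASSIGN of 'a' to !0? Where is the ECHO statement, and where is the value it outputs?

I found a few patterns in the binary version that hint at a line-by-line representation of the opcodes.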

("2f726f6f742f746573742e706870" is the hex representation of "/root/test.php".)
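That claim is easy to check. A minimal Python sketch (the variable names are mine, and dump_line is just the third line of the dump above copied by hand) that hex-encodes the path and looks for it in the dump:

```python
# Hex-encode the script path and search for it in one line of the dump.
path = b"/root/test.php"
needle = path.hex()
print(needle)  # 2f726f6f742f746573742e706870

# Third line of the apc_bin_dump() output shown above, copied by hand.
dump_line = "8a0200008a020000650000002f726f6f742f746573742e7068700002070f9c00"

# Two hex characters per byte, so halve the character index.
byte_offset = dump_line.find(needle) // 2
print(byte_offset)
```

The same search works for any other string you expect to find in the dump.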

EDIT

When the line length is set to 4 bytes and the dumps of different programs are compared, the hex representation reveals patterns.

...
00000002  // 2 seems to be something like the "line number"
00000065  // seems to increase by 1 for every subsequent statement.
00000040  // 
06330808  // seems to mark the START of a statement
00000000
00000000
00000000
00000000
00000001  //
00000012  // In a program with three echo statements,
03000007  // this block was present three times. With mild
00000001  // changes that seem to represent the spot where
00000006  // the output-string is located.
00000008  //
00000000
00000000
00000000
00000000
00000000
00000002  // 2 seems to be something like the "line number"
00000028  //
00000020  //
4a330808  // seems to mark the END of a statement
00000000
00000000
00000000
00000000
00000008  // repeating between (echo-)statements
00000000
00000000
00000000
00000000
00000008  // repeating between (echo-)statements
...
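For anyone who wants to reproduce this kind of 4-byte view, here is a rough Python sketch. It assumes the dump comes from a little-endian machine (an assumption, although likely for an x86 Debian box), and as_words is a name I made up. The sample is a short excerpt copied from the dump above.

```python
import struct

def as_words(hex_dump):
    """Split a hex dump into 4-byte words: (raw hex, little-endian value)."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word
    return [
        (raw[i:i + 4].hex(), struct.unpack_from("<I", raw, i)[0])
        for i in range(0, usable, 4)
    ]

# Excerpt from the dump above: three consecutive 4-byte words.
sample = "0200000065000000204a3308"
for raw_hex, value in as_words(sample):
    print(raw_hex, "0x%08x" % value)
```

Printing the raw bytes next to the little-endian value makes it easier to tell small counters (line numbers, lengths) from pointer-like values.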

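But my understanding of how a VM works at this level is too weak to analyze this accurately and relate it to the C code.

EDIT

Does PHP have a virtual machine like Java does?

Is the Zend Engine embeddable outside of PHP?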

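Good question...

UPDATE: The opcodes are executed directly by the PHP virtual machine (the Zend Engine). They are not translated into machine code first. It looks like each opcode is executed by a handler function defined in ./Zend/zend_vm_execute.h

For more information on how Zend opcodes are executed, see the architecture of the Zend Engine.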
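To make "executed by handler functions" concrete, here is a toy model in Python of the op array that VLD printed for test.php. This is only a sketch of the idea and not how the Zend Engine is implemented: the dictionaries, handler names and the execute function are all made up for illustration.

```python
import io

# Each handler receives the VM state and one op; it returns False to stop.
def handle_ext_stmt(vm, op):
    return True  # statement marker, no visible effect in this toy model

def handle_assign(vm, op):
    vm["cv"][op["op1"]] = op["op2"]  # store a constant in compiled variable !n
    return True

def handle_echo(vm, op):
    vm["out"].write(str(vm["cv"][op["op1"]]))
    return True

def handle_return(vm, op):
    return False  # end of the op array

HANDLERS = {
    "EXT_STMT": handle_ext_stmt,
    "ASSIGN": handle_assign,
    "ECHO": handle_echo,
    "RETURN": handle_return,
}

# The five ops from the VLD listing; !0 is modelled as slot 0.
OP_ARRAY = [
    {"opcode": "EXT_STMT"},
    {"opcode": "ASSIGN", "op1": 0, "op2": "a"},
    {"opcode": "EXT_STMT"},
    {"opcode": "ECHO", "op1": 0},
    {"opcode": "RETURN", "op1": 1},
]

def execute(op_array):
    """Walk the op array and call the handler registered for each opcode."""
    vm = {"cv": {}, "out": io.StringIO()}
    for op in op_array:
        if not HANDLERS[op["opcode"]](vm, op):
            break
    return vm["out"].getvalue()

print(execute(OP_ARRAY))
```

The point is that the op array is data. The only machine code that runs is the handlers, which in PHP's case are C functions compiled into the php binary.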

These resources may help:

http://php.net/manual/en/internals2.opcodes.list.php

http://www.php.net/manual/en/internals2.opcodes.ops.php

I'll also check the PECL VLD source for more clues...

http://pecl.php.net/package/vld

http://derickrethans.nl/projects.html#vld

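The authors of the VLD PECL extension might also be able to help: Derick Rethans, Andrei Zmievski, or Marcus Börger.

Their email addresses are at the top of srm_oparray.c in the extension source.

UPDATE: Found more clues

In PHP 5.3.8, I found three clues as to where the opcodes are executed: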

./Zend/zend_execute.c:1270 
ZEND_API void execute_internal
./Zend/zend.c:1214:ZEND_API int zend_execute_scripts(int type TSRMLS_DC, zval **retval, int file_count, ...)
./Zend/zend.c:1236:                  zend_execute(EG(active_op_array) TSRMLS_CC);
./Zend/zend_vm_gen.php

I can't find the definition of zend_execute(), but I'm guessing it is generated by ./Zend/zend_vm_gen.php

I think I found it...

./Zend/zend_vm_execute.h:42
ZEND_API void execute(zend_op_array *op_array TSRMLS_DC)

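I could be wrong, but it looks like all of the opcode handlers are defined in ./Zend/zend_vm_execute.h.

See ./Zend/zend_vm_execute.h:2413 for an example of what appears to be the "integer addition" opcode.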

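apc_bin_dump() returns the raw in-memory representation of the cache entry.

It returns the contents of an apc_bd_t structure.

This structure is an array of apc_bd_entry_t, along with some checksums for error detection.

An apc_bd_entry_t contains an apc_cache_entry_value_t.

You can look at the internal functions apc_bin_dump and apc_bin_load to see how dumping and loading are done.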
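Since the dump is raw structs, the ops from the VLD listing should be visible in it. The Python sketch below rests on assumptions I have not verified against the APC source: a 32-bit little-endian build, a zend_op layout in which the 4-byte line number is immediately followed by the 1-byte opcode, and the opcode numbers from Zend/zend_vm_opcodes.h in PHP 5.3 (ASSIGN 38, ECHO 40, RETURN 62, EXT_STMT 101). The five 8-byte snippets are copied from the dump in the question, where they occur 76 bytes apart.

```python
import struct

# Assumed opcode numbers (Zend/zend_vm_opcodes.h, PHP 5.3).
OPCODE_NAMES = {38: "ASSIGN", 40: "ECHO", 62: "RETURN", 101: "EXT_STMT"}

# 8-byte excerpts from the dump: line number, opcode byte, 3 padding bytes.
SNIPPETS = [
    "0200000065000000",
    "0200000026000000",
    "0300000065000000",
    "0300000028000000",
    "040000003e000000",
]

def decode(snippet_hex):
    """Return (line number, opcode name) for one 8-byte excerpt."""
    lineno, opcode = struct.unpack("<IB3x", bytes.fromhex(snippet_hex))
    return lineno, OPCODE_NAMES.get(opcode, "opcode %d" % opcode)

for snippet in SNIPPETS:
    print(decode(snippet))
```

If those assumptions hold, the decoded pairs line up with the VLD output: EXT_STMT and ASSIGN on line 2, EXT_STMT and ECHO on line 3, RETURN on line 4. So the "binary representation" looks like the op array itself stored as C structs, not the opcodes translated into something else.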