Find a pattern in HTML and replace it with PHP code

I am looking for this pattern

<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">
    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

and want to replace it with the following pattern in many .html files

<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

Note that the only difference is that this line

<p class="text-muted">&copy; 2014. Core Team</p>

is replaced by
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>

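I would like to do this with sed, but after my initial attempts my difficulty is knowing which characters I may or may not need to escape. There are also the tabs and newlines in the PHP code, which I want to appear exactly as shown here.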

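There are many files to process, so I would like to automate this, although it might be quicker to just do it by hand (copy and paste). sed may also simply be the wrong tool for this. Can someone point me in the right direction? At this stage I am open to other languages (e.g. php, python, bash) for a solution.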

I then plan to rename each .html file to .php with the following command:

for i in *.html; do mv "$i" "${i%.*}.php"; done;
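
For comparison, the same rename step can be sketched in Python. This is only an illustration and not part of the original plan: the function name rename_html_to_php and the decision to skip a file whose .php target already exists are assumptions.

```python
from pathlib import Path


def rename_html_to_php(directory):
    """Rename every *.html file directly inside `directory` to *.php.

    Returns the new file names, in sorted order. The function name and the
    skip-if-target-exists rule are illustrative assumptions.
    """
    renamed = []
    for html_path in sorted(Path(directory).glob("*.html")):
        php_path = html_path.with_suffix(".php")  # index.html -> index.php
        if php_path.exists():
            continue  # never overwrite an existing .php file
        html_path.rename(php_path)
        renamed.append(php_path.name)
    return renamed
```

Like the shell loop, this only touches files directly inside the given directory and only changes the final extension.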

EDIT1

Following the awk answer below, I can get it to work with this version

$ awk -Wversion 2>/dev/null || awk --version
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2, GNU MP 6.0.0)
Copyright (C) 1989, 1991-2014 Free Software Foundation.

But with this version I get different output. It seems to print all three files: old, new and file. Is it easy to modify for this version?

root@4461f768e343:/github/find_pattern# awk -Wversion 2>/dev/null || awk --version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
root@4461f768e343:/github/find_pattern#
root@4461f768e343:/github/find_pattern#
root@4461f768e343:/github/find_pattern# awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">
    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div><!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">
    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.root@4461f768e343:/github/find_pattern#

You can use Python's str.replace

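html_files = ['a.html', ...]  # placeholder: list every file to convert
# The exact line to find; it is matched literally, so nothing needs escaping.
copyright = '<p class="text-muted">&copy; 2014. Core Team</p>'
# Triple quotes keep the PHP block's newlines and indentation as written.
new_copyright = """       <?php
        $year = date("Y");
        echo "<p class='text-muted'>© $year. Core Team</p>";
    ?>"""
for html_file_path in html_files:
    with open(html_file_path) as html_file:
        html = html_file.read()
    if copyright in html:
        # Write the result to a new .php file; the .html file is left untouched.
        php_file_path = html_file_path.replace('.html', '.php')
        with open(php_file_path, "w") as php_file:
            php = html.replace(copyright, new_copyright)
            php_file.write(php)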

Note that this does not overwrite your HTML files, which is useful if the script has a bug.
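
If the copyright line is not byte-for-byte identical in every file, a regular expression is one possible variation. The sketch below assumes (this is not stated in the question) that the year and the leading indentation may differ between files. The names PATTERN, NEW_BLOCK and replace_copyright are made up for illustration. re.escape takes care of the "which characters do I need to escape" problem, and passing a function as the replacement means the PHP text is inserted verbatim.

```python
import re

# Literal pieces of the old line; re.escape() neutralises any regex metacharacters.
OLD_PREFIX = '<p class="text-muted">&copy; '
OLD_SUFFIX = '. Core Team</p>'

# Assumption: the year is any four digits and the line may be indented
# with spaces or tabs. The indentation is consumed by the match.
PATTERN = re.compile(
    r'[ \t]*' + re.escape(OLD_PREFIX) + r'\d{4}' + re.escape(OLD_SUFFIX)
)

NEW_BLOCK = '''       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>'''


def replace_copyright(html):
    """Replace the first matching copyright line with the PHP block."""
    # A function replacement returns NEW_BLOCK as-is, so backslashes or
    # group references in it would never be interpreted by re.sub().
    return PATTERN.sub(lambda match: NEW_BLOCK, html, count=1)
```

Text that does not contain the copyright line is returned unchanged, so the function can be run over every file without checking first.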

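sed is for simple substitutions on individual lines, so your task is definitely not a job for sed. If the files are all consistently formatted, you can use awk: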

$ cat old
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">
    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

.

$ cat new
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

.

$ cat file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">
    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.

.

$ awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>
    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.

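The above uses GNU awk for the multi-character RS and for ARGIND. To do this for many files, you could use the following, which also relies on GNU awk (4.1 or later) for -i inplace: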

find . -type f -name '*.php' -exec awk -i inplace -v RS='^$' -v ORS= 'ARGIND==1{old=$0;print;next} ARGIND==2{new=$0;print;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new {} \;

or similar.
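
Where GNU awk is not available (for example with mawk), the same index/substr logic can be sketched in Python. This is a rough port under stated assumptions, not a drop-in replacement. The function names replace_first and rewrite_tree, the default "*.html" pattern and the UTF-8 encoding are all assumptions. As in the awk version, old and new are files holding the exact block to find and its replacement.

```python
from pathlib import Path


def replace_first(text, old, new):
    """Replace the first literal occurrence of `old` in `text`.

    Mirrors the awk index()/substr() logic: no regex, so nothing
    in `old` or `new` needs escaping.
    """
    start = text.find(old)
    if start == -1:
        return text  # block not present: leave the text unchanged
    return text[:start] + new + text[start + len(old):]


def rewrite_tree(root, old_file, new_file, pattern="*.html"):
    """Apply replace_first() in place to every matching file under `root`.

    Returns the list of paths that were actually modified.
    """
    old = Path(old_file).read_text(encoding="utf-8")
    new = Path(new_file).read_text(encoding="utf-8")
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        updated = replace_first(text, old, new)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Because this rewrites files in place, it is worth trying on a copy of the directory first. A call such as rewrite_tree(".", "old", "new") would correspond to the find command above, applied to .html files.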