How to interact with page elements while crawling a website with PHP?

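I need to go to http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx, enter data into the fields, and click the button to get the results. On the results page I get a table of data, but it is paginated across 5 pages.

I can do all of the above with cURL, but this is the point where I get stuck.

Once on the results page, I need to click the "date" header twice to sort the data by descending date, and then skim that day's results.

Any idea how to do this, whether detailed steps or just high-level concepts? Either would help.

Thanks!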

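The problem is that the click actually performs a postback via JavaScript. Since PHP and cURL do not execute JavaScript, you need to inspect the HTTP requests the browser sends (the GET and POST parameters and the cookies) and simulate them. Keep in mind that some values may be tied to the session. I don't have time to work this out for you right now, but I know that ASP.NET sites can be very tricky in some cases. There may be easier ways to do this, but this is what it always comes down to, because this is what actually happens.

If you are not tied to PHP, a whole world of options opens up. For example, the aggregator in the project I am working on is able to execute (controlled) JavaScript specifically for these kinds of tasks and pages (albeit on a larger scale).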
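To make the "inspect what the browser sends and simulate it" step more concrete, here is a minimal sketch in PHP. It assumes the page keeps its state in hidden inputs such as __VIEWSTATE and that the sort link posts back with __EVENTTARGET set to the control name. The function names are made up for illustration, and the control name in the usage note is copied from the request posted elsewhere in this thread, so verify it against what your browser actually sends.

```php
<?php
// Collect every <input type="hidden"> on the page.
// ASP.NET pages keep __VIEWSTATE and similar state in such fields.
function extractHiddenFields(string $html): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $fields = [];
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Build the URL-encoded body for a simulated postback:
// the page's hidden fields, the event target, and any visible form values.
function buildPostbackBody(string $html, string $eventTarget, array $formValues = []): string
{
    $fields = extractHiddenFields($html);
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = '';
    return http_build_query(array_merge($fields, $formValues));
}
```

For example, buildPostbackBody($resultsHtml, 'Search:dgSearch:_ctl2:_ctl1', ['Search:ddlMonth' => '5', 'Search:ddlYear' => '2011']) would produce a body to send back to the same URL. Taking the hidden fields from the page you just received, rather than hard-coding them, is what keeps the session-related values current.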

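I could not get the search to return a working set of results. It would help if you could post some dummy data that produces results.

As a general answer, you need something that can manipulate the DOM. You could do it server-side with PHP and something like WebDriver, or purely client-side with Selenium. Simulate the click, grab the resulting HTML, and parse it.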
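For the last step, parsing the resulting HTML, a rough sketch with PHP's built-in DOM extension could look like the following. The table id, the position of the date column, and the function names are all assumptions for illustration. Check the real markup of the results page before relying on any of them.

```php
<?php
// Return each data row of the table as an array of trimmed cell texts.
// Rows without <td> cells (for example a header row of <th>) are skipped.
function extractTableRows(string $html, string $tableId): array
{
    libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Keep only the rows whose date column equals $date exactly.
function rowsForDate(array $rows, int $dateColumn, string $date): array
{
    return array_values(array_filter($rows, function ($row) use ($dateColumn, $date) {
        return isset($row[$dateColumn]) && $row[$dateColumn] === $date;
    }));
}
```

If the site prints dates like 5/3/2011 (an assumption), then rowsForDate($rows, 1, date('n/j/Y')) would pick out today's entries. If filtering like this is enough, you may not need to reproduce the sort by date at all, although you would still have to walk through every page of results.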

This should work. Try this.

$url = 'http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx';
// First, request $url with cURL, with cookies enabled.

// After that, append the postback fields to the URL:
$url    =$url.'?'.'__EVENTTARGET=Search%3AdgSearch%3A_ctl2%3A_ctl1&__EVENTARGUMENT=&__VIEWSTATE=dDwtMjk2Mjk5NzczO3Q8O2w8aTwxPjs%2BO2w8dDw7bDxpPDE%2BOz47bDx0PDtsPGk8Mz47aTwxNz47aTwxOT47PjtsPHQ8dDw7cDxsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0PjtpPDU%2BOz47bDxwPDIwMDY7MjAwNj47cDwyMDA3OzIwMDc%2BO3A8MjAwODsyMDA4PjtwPDIwMDk7MjAwOT47cDwyMDEwOzIwMTA%2BO3A8MjAxMTsyMDExPjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VmlzaWJsZTs%2BO2w8bzx0Pjs%2BPjs%2BOzs%2BO3Q8QDA8cDxwPGw8Q3VycmVudFBhZ2VJbmRleDtQYWdlQ291bnQ7XyFJdGVtQ291bnQ7XyFEYXRhU291cmNlSXRlbUNvdW50O0RhdGFLZXlzOz47bDxpPDA%2BO2k8ND47aTwxMD47aTw0MD47bDw%2BOz4%2BOz47Ozs7Ozs7Ozs7PjtsPGk8MD47PjtsPHQ8O2w8aTwyPjtpPDM%2BO2k8ND47aTw1PjtpPDY%2BO2k8Nz47aTw4PjtpPDk%2BO2k8MTA%2BO2k8MTE%2BOz47bDx0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzNjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8xNjE3NzE0OSAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFNVTlRSVVNUIE1PUlRHQUdFIElOQzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TkFUSEFOSUVMIEdBQkJBUkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDEzNTQgVkFOREVSVkVFUiBBVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0MTU7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMTk2MzQ4ODUgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvMi8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxUSElSRCBGRURFUkFMIFNBVklOR1MgQU5EIExPQU4gQVNTTiBPRiBDTEVWRUxBTkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEdBWUxFIE5BU0g7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDg5NDEgQ09YIFJEIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTUwMztodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMjY1
MTYxMiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS85LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFUgUyBCQU5LIE4gQTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TE9VSVMgTUlSTUFOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw2OTkxIEdBUlkgTEVFIERSIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQ5MjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMzk3NTc5MiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS82LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEZJRlRIIFRISVJEIE1PUlRHQUdFIENPOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxSQVlNT05EIFNURUlOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMzU5IFRIUlVTSCBBVkUgRkFJUkZJRUxELCBPSCA0NTAxNDs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDM4O2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI0NzgyOTYzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzMvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8V0VMTFMgRkFSR08gQkFOSyBOIEE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEpBTkVUIEJPRUhNOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw4NjA4IEdPTERGSU5DSCBXQVkgV0VTVCBDSEVTVEVSLCBPSCA0NTA2OTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDQwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI1NTkwMjAzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8RklGVEggVEhJUkQgQkFOSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VEhFT0RPUkUgQ09PSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8UE8gQk9YIDE3MTEgV0VTVCBDSEVTVEVSLCBPSCA0NTA3MTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNS
AxNDkwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI2ODY3MDkxICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzYvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Q0lUSUZJTkFOQ0lBTCBJTkM7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPERPTk5BIE1BUkRJUzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NjU0OSBDQU5BU1RPVEEgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0Njg7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMjk4NzU2MDIgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvNS8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxDSVRJTU9SVEdBR0UgSU5DOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxNQVRUSEVXIEJMVU5ERUxMOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxNDEyIEhFTE1BIEFWRSBIQU1JTFRPTiwgT0ggNDUwMTM7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzMjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8zMjI0MzYxNyAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFdFTExTIEZBUkdPIEJBTksgTiBBOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxKT0hOIEJPV01BTjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Jm5ic3BcOzs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDYzO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzQyMjcwMTE5ICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VSBTIEJBTksgTkFUSU9OQUwgQVNTT0NJQVRJT047Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEJSWUFOIFNDSE1JRFQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDI4OTUgV0VFUElORyBXSUxMT1cgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47Pj47Pj47Pj47Pj47Pj47PtVTse1TdIXrxq%2FXrY%2Fp22QQ7pAh&Search%3AddlMon
th=5&Search%3AddlYear=2011&Search%3AtxtCompanyName=&Search%3AtxtLastName=&Search%3AtxtCaseNumber=';
// Then request the new $url with cURL again, with cookies still enabled.
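The two "do cURL with cookies" steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper names are made up, cookies are kept in memory on one reused handle, and the second request is sent as a POST body, which is how a browser normally submits an ASP.NET form, rather than as the query string shown above. Also note that a hard-coded __VIEWSTATE comes from one particular response and may no longer be accepted later, so reading it from the first response is probably safer.

```php
<?php
// Create one cURL handle that is reused for every request, so cookies
// set by the first response are sent back on the following requests.
function newSession()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_COOKIEFILE     => '',   // empty string: enable in-memory cookies
    ]);
    return $ch;
}

// Send a GET when $postBody is null, otherwise a POST with that body.
function send($ch, string $url, ?string $postBody = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($postBody === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    }

    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $response;
}

// Intended flow (not executed here):
//   $ch     = newSession();
//   $page   = send($ch, $url);        // first request, receives the cookies
//   $sorted = send($ch, $url, $body); // second request, the simulated postback
// where $body is the URL-encoded string of form fields.
```

Reusing a single handle means there is no cookie file to manage. If the cookies need to survive between script runs, CURLOPT_COOKIEJAR with a file path would be the option to look at.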