
BingPreview verified: it is Bing's crawler for refreshing page snapshots

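Today I found an entry in my Windows IIS log from what looked like a crawler called BingPreview. The catch is that Bing's crawlers normally identify themselves as Bingbot or Msnbot. A Google search turned up very little Chinese-language material on BingPreview, so I had to look for information on English-language sites. After checking, BingPreview is indeed a Bing crawler.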

The BingPreview User-Agent as it appeared in today's log:
IIS6 cs(User-Agent):Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/534++(KHTML,+like+Gecko)+BingPreview/1.0b
The log shows two client IPs (c-ip):
131.253.38.67, 199.30.16.124
The first IP fetched only the Bing webmaster verification file /BingSiteAuth.xml; the second IP crawled the site's pages.
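To pull these requests out of a raw IIS log without reading it line by line, a short script can group BingPreview hits by client IP. This is a minimal sketch assuming the W3C extended log format, where a `#Fields:` directive names the columns and IIS writes spaces inside the User-Agent as `+`. The function names and the sample lines are hypothetical: the IPs and User-Agents come from this post, but the request paths are assumed.

```python
from collections import defaultdict

def parse_w3c_log(lines):
    """Yield one dict per request from IIS W3C extended log lines.

    Field names come from the most recent '#Fields:' directive;
    other '#' directives and blank lines are skipped.
    """
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that don't match the field list
        yield dict(zip(fields, values))

def bingpreview_requests(lines):
    """Map each client IP whose User-Agent contains 'BingPreview/' to the URIs it requested."""
    by_ip = defaultdict(list)
    for rec in parse_w3c_log(lines):
        # IIS logs spaces in the User-Agent as '+'; undo that for matching
        # (a literal '+' in the original UA also becomes a space).
        ua = rec.get("cs(User-Agent)", "").replace("+", " ")
        if "BingPreview/" in ua:
            by_ip[rec.get("c-ip", "-")].append(rec.get("cs-uri-stem", "-"))
    return dict(by_ip)

# Hypothetical sample: IPs and User-Agents from this post, request paths assumed.
SAMPLE_LOG = [
    "#Fields: c-ip cs-uri-stem cs(User-Agent)",
    "131.253.38.67 /BingSiteAuth.xml Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/534++(KHTML,+like+Gecko)+BingPreview/1.0b",
    "199.30.16.124 / Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/534++(KHTML,+like+Gecko)+BingPreview/1.0b",
    "65.55.217.201 / msnbot/2.0b+(+http://search.msn.com/msnbot.htm)",
]

if __name__ == "__main__":
    print(bingpreview_requests(SAMPLE_LOG))
```

Reading column names from `#Fields:` rather than hard-coding positions keeps the sketch working when the site logs a different set of fields.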

Verifying Bingbot:
On Bing's "Verify Bingbot" page, http://www.bing.com/toolbox/verify-bingbot, I entered each of the two IP addresses. Both came back verified:
"Verdict for IP address 131.253.38.67: Yes - this IP address is a verified Bingbot IP address. Name: msnbot-131-253-38-67.search.msn.com";
"Verdict for IP address 199.30.16.124: Yes - this IP address is a verified Bingbot IP address. Name: msnbot-199-30-16-124.search.msn.com".
So BingPreview is indeed a Bing crawler.
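The same check can also be run locally instead of through the web page. A common technique is forward-confirmed reverse DNS: look up the hostname for the IP, require it to fall under search.msn.com (as both names returned above do), then resolve that hostname and confirm it maps back to the same IP. The Python sketch below assumes IPv4 and a working resolver; the function names are hypothetical, and the domain suffix is inferred from the names the tool returned rather than from an official list.

```python
import socket

# Inferred from the names the Verify Bingbot page returned,
# e.g. msnbot-131-253-38-67.search.msn.com (assumption, not an official list).
BING_SUFFIX = ".search.msn.com"

def hostname_looks_like_bing(hostname):
    """True if a PTR hostname falls under search.msn.com."""
    return hostname.lower().rstrip(".").endswith(BING_SUFFIX)

def verify_bing_ip(ip):
    """Forward-confirmed reverse DNS check for an IPv4 address.

    Returns the verified hostname, or None if any step fails.
    Needs network access and a working resolver.
    """
    try:
        # Step 1: reverse lookup (PTR record), IP -> hostname
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None
    # Step 2: the hostname must be under the expected Bing domain
    if not hostname_looks_like_bing(hostname):
        return None
    try:
        # Step 3: forward lookup, hostname -> IPv4 addresses
        _name, _aliases, forward_addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return None
    # Step 4: the original IP must be among the forward results
    return hostname if ip in forward_addrs else None
```

Calling `verify_bing_ip("131.253.38.67")` returns the hostname only if current DNS still maps that address the way the tool reported; the forward lookup in step 3 is what stops a spoofed PTR record from passing on its own.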

Bing's crawlers look a bit messy:
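Besides BingPreview, this site also gets visits from Bingbot and Msnbot, and all three appear to be Bing crawlers. An Msnbot log entry (client IP, then User-Agent) looks like this: 65.55.217.201 msnbot/2.0b+(+http://search.msn.com/msnbot.htm)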
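When reading logs, the product token in the User-Agent is enough to tell these three apart. A minimal Python sketch, assuming the `+`-for-space encoding IIS uses in its logs; the function name and token patterns are illustrative choices of mine, not an official Bing list.

```python
import re

# Checked in order; the first token found wins. Patterns are illustrative.
BING_UA_TOKENS = [
    ("BingPreview", re.compile(r"BingPreview/[\w.]+")),
    ("bingbot", re.compile(r"bingbot/[\w.]+", re.IGNORECASE)),
    ("msnbot", re.compile(r"msnbot/[\w.]+", re.IGNORECASE)),
]

def classify_bing_ua(user_agent):
    """Return (crawler_name, matched_token) for a Bing-family UA, else None."""
    # IIS logs write spaces in the User-Agent as '+'
    ua = user_agent.replace("+", " ")
    for name, pattern in BING_UA_TOKENS:
        match = pattern.search(ua)
        if match:
            return name, match.group(0)
    return None
```

For the Msnbot entry above, `classify_bing_ua("msnbot/2.0b+(+http://search.msn.com/msnbot.htm)")` returns `("msnbot", "msnbot/2.0b")`. A User-Agent can be faked, so a match here only says what the client claims to be; the IP verification described earlier is what confirms it.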

What kind of crawler is BingPreview?

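It turns out that BingPreview is a crawler the Bing search engine uses specifically to refresh page snapshots, triggered by users of the Bing app on Windows 8. Here is the original text from the Bing blog: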

Page snapshots in Bing Windows 8 app to bring new crawl traffic to sites
Today is a very exciting day as Windows 8 is now generally available to hundreds of millions of people, who will have access to a superb search experience through the preinstalled Bing app. This week we would like to highlight one specific feature that will impact the crawl traffic (visits to your site from our crawlers) we send to your website.
In addition to traditional web search, the Bing app for Windows 8 features a very visual image search feature, allowing users to swipe conveniently through a collection of thumbnails.
On top of this overview of the search results, users have the possibility to switch to a more detailed view by simply tapping on one of the images. The result is a full screen version of the image along with some metadata, including a link to the image source page and a small snapshot of the page.
This page snapshot is the specific feature we would like to highlight this week, as it is generated by our web crawler. Even though our crawler is intelligent enough to reuse components of your site it has already seen in the past, it will occasionally come and visit your pages again, as requested by a Bing app user, in order to get the freshest and most accurate snapshot possible. Therefore, as usage of the Bing app increases, you should expect more and more of this crawl traffic coming your way.
In order to be transparent on what crawl traffic is being generated, and obtain the best results, we are using a different user agent for this specific “snapshot generation” traffic:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Having this page snapshot as part of the “full details” experience is a great way for us to drive traffic to your website as Bing app users look through your images. As search continues to evolve in a visual, tactile and vocal direction, features such as the Bing App in Windows 8 stand to deliver traffic directly to sites by introducing searchers to sites they hadn’t previously discovered.