Forum dedicated to search engines and optimization techniques, by #taggle
For bot fanatics like Jan
Arachnophilia, the Joy of Playing with Spiders
Spiders make great geek pets, at least virtual ones do. Here at StepForth, we keep a couple spiders on our system to test sites, pages and documents in the hopes of learning more about the behaviours of common search engine spiders such as GoogleBot, Yahoo's Slurp and MSNBot. Recently, we learned that virtual pets share a similar problem with live pets; they grow old and eventually die. While our mock-spiders are still very much alive, the information we glean from their behaviours is increasingly irrelevant to predicting how a spider from a major search engine will behave. Our pet-spiders have grown too old to shower us with the informative affection they once did.
It used to be easy to predict the behaviour of common search engine spiders. Today, predicting search spider behaviour is not so easy and, with a growing number of spiders and search databases to consider, trying to get a leg-up on where the spiders are going is rather tricky. In previous years, Google, Inktomi and other electronic 'bots could be relied on to visit a site on a regular basis. The working environment was a bit simpler a few years ago, easily summed up with nine letters: G-O-O-G-L-E-B-O-T. GoogleBot was at one time the only important search spider around. While others existed, even as recently as two years ago Google fed search results to most of its competitors.
Visiting on a somewhat regular monthly schedule, Googlebot would compile information on all the documents in its database, a process that took about one week, and Google would then rearrange its listings during the eagerly anticipated GoogleDance. Search engine optimization firms were often able to anticipate the unscheduled start of the GoogleDance by examining spidering activity in their weblogs and noting the PageRank and back-link updates that generally preceded a shift in Google's rankings. When the shift actually happened, the changes stemming from it were fairly significant, as many of the search results would be altered based on new data found during the monthly spider-cycle.
What a difference a couple of years can make. Today there are four major general search engines and several vertical search tools, each with a unique algorithm and spidering schedule. So just how important is it to know the spidering schedule of the various search engines?
In previous years, most SEOs would say it was extremely important to know when a spider was going to visit a client's site. SEOs worked with fairly fixed deadlines, hoping to have clients' optimized content uploaded about a week before the expected GoogleDance began. Even then, one was never entirely sure that the predicted date for the Dance was correct, but with a somewhat regular spider/update cycle, SEOs had fixed windows of opportunity, with subsequent weeks to tweak and rework content if rankings didn't materialize during the last update.
Today's spiders have become almost intuitive, and it is less important to know when a spider will visit than where it will visit. Most spiders visit an active website very frequently. According to three months' worth of stats compiled by ClickTracks, spiders from Ask Jeeves visit at least once a day, while MSN and Yahoo spider the index page of the StepForth site several times a day. Google visits our index page only every four days on average. Compared to previous years, even the least frequent visitor, GoogleBot, is gobbling up content. With daily or even weekly visits, SEOs have a much faster turnaround time from completing optimization on a site to seeing results in the search engine results pages.
A major shift in the way search engines think about content is seen in where spiders visit, how frequently they visit, and what drives them there. Previously, a search engine spider would treat a domain or URL as the top-level source of information: it would go to the index page and spider its way through the site from that point. That is no longer the case, as search engine spiders today are better able to contextualize the content found in individual documents within a domain and to schedule spider frequency accordingly. For example, on a site dedicated to the sale of Widgets, the document covering the highly popular Blue Widgets will see more spider traffic than a document covering the less popular Red Widgets. Similarly, a document that changes regularly will see more visits, as the search engines tend to know when changes are made to documents in their database. In other words, search engine spiders tend to treat your website as a collection of unique documents under a single URL or domain, rather than as a single topically themed whole. Based on the number of searches for relevant keywords performed by search engine users, the number of incoming links, the frequency of change, and the frequency of live human visits to a document, the four major search spiders now set their own schedules.
While the timing of spider visits has changed radically, many standard behaviours remain the same. Spiders still travel where links, both internal and external, take them. The difference today is that those links often lead to internal pages; in previous years, most links led to the index or home page of a site. With the advent of PPC programs such as AdWords and Yahoo Search Marketing, webmasters and search engine marketers are creating product-specific landing pages, each of which might be relevant to organic searches. This has allowed savvy SEOs to optimize landing pages for organic rankings as well as PPC conversions. Search engine results now tend to be relevant to the specifics of a given topic rather than offering a general overview of that topic.
Of all the spiders, the most active by far is MSNBot. Visiting each document in its index at least once per day and often more frequently, MSNBot has been known to crash servers housing sites with dynamically generated content, as the 'bot sometimes doesn't know when to quit. After MSNBot, Ask Jeeves and Yahoo run the busiest of the major bots. Oddly enough, the quietest is GoogleBot, which visits each document on our site at least once per month but with little or no discernible pattern.
In order to prompt spiders through the site, we suggest creating a basic, text-based sitemap as an extra page on your website. The sitemap should list and link to every document in your site. To jazz it up, add a short description of each document's content below its link. Then add a link to the sitemap in the footer of every page on your site. That will help with Ask, MSN and Yahoo. For Google, a slightly more complex option is available through the creation of an XML-based sitemap.
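To make that concrete, here is a minimal sketch of both flavours. Every URL, file name and description below is invented for illustration, and the XML follows the public sitemap protocol; the exact schema Google expects has changed over time, so check Google's own sitemap documentation before uploading anything.

<!-- sitemap.html: a plain HTML sitemap; all URLs and descriptions are hypothetical -->
<h1>Site Map</h1>
<ul>
  <li><a href="/index.html">Home</a><br>
      Overview of the company and its product lines.</li>
  <li><a href="/blue-widgets.html">Blue Widgets</a><br>
      Specifications and pricing for our most popular widget.</li>
  <li><a href="/red-widgets.html">Red Widgets</a><br>
      Specifications and pricing for the red widget line.</li>
</ul>

<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap.xml: the same hypothetical pages in XML sitemap form -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/index.html</loc>
    <lastmod>2005-08-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/blue-widgets.html</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

The lastmod, changefreq and priority fields are optional hints; the spiders are free to ignore them.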
About two weeks after implementing the HTML sitemap on your site and uploading your XML sitemap to Google, start to watch your server logs for increased spider visits. Be sure to watch for where the spiders are going and which documents receive the most frequent visits. You may be pleasantly surprised at how friendly modern spiders can be.
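If you would rather count those visits than eyeball raw logs, a short script can tally spider hits per document. The sketch below assumes an Apache combined-format access log named access.log and a hand-picked list of user-agent substrings; both the file name and the bot list are assumptions to adapt to your own server.

# count_spider_hits.py: tally search engine spider visits per requested path.
# The log path and the BOTS substrings are assumptions, not fixed names.
import re
from collections import Counter

BOTS = ("Googlebot", "Slurp", "msnbot", "Teoma")  # Teoma crawls for Ask Jeeves

visits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Combined log format: the request is the first quoted field,
        # the user agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        if len(quoted) < 3:
            continue
        request, user_agent = quoted[0], quoted[-1]
        bot = next((b for b in BOTS if b in user_agent), None)
        if bot is None:
            continue
        # A request looks like: GET /blue-widgets.html HTTP/1.1
        parts = request.split()
        path = parts[1] if len(parts) > 1 else "-"
        visits[(bot, path)] += 1

for (bot, path), count in visits.most_common(20):
    print(f"{count:6d}  {bot:10s}  {path}")

Run something like this every week or so and compare the counts; the documents the spiders favour should mirror the link structure and freshness described above.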
About The Author
Jim Hedger is a writer, speaker and search engine marketing expert based in Victoria BC. Jim writes and edits full-time for StepForth. He has worked as an SEO for over 5 years and welcomes the opportunity to share his experience through interviews, articles and speaking engagements. He can be reached at: [email protected].
Source: SiteProNews newsletter
Honestly, I didn't find it all that impressive.
Actually, it's more or less a summary of the current state of things. Nothing new, to be honest.
Jan should do us the same piece in an underground version
Come on Jan, our very own Fantomaster, give us a little rundown on bots, cloaking and all that.
It's too hot for me to attempt an exhaustive rundown. The piece doesn't blow me away either. What it says seems fairly accurate, but hardly new.
In my opinion, talking about a spider in the broad sense means nothing: there isn't just one googlebot (there is, for example, the "real" one and the "mozilla" one (no, I'm not talking about mozbot, eh)). Same for slurp and msnbot: they don't all have the same role. So settling for a "macroscopic" observation is a biased approach, one that leads to false conclusions.
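To give an idea, the two Googlebot variants show up in the logs with distinct user-agent strings, roughly like the following (the exact strings change over time, so take these as illustrative only):

Googlebot/2.1 (+http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)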
Example: when the author says msnbot is the most active, I disagree. True, the number of msnbots hitting a page is high, but the msnbot in charge of indexing and refreshing the cached copy of the page comes by less often than googlebot.
As for the sitemap, I can't say anything about it, not having tested it. Up to now, a few well-placed backlinks have done the job of getting every page of the sites I look after indexed.