Anatomy of a Simple Search Engine

Google receives more than 100 million search requests each day. A search engine's job is to fetch, sort, and rank the appropriate results for each query from a dataset of several billion web pages. This is a tremendously complex task, but essentially every search engine is built around four basic steps:

1. Crawling for new content.
2. Storing the content once it is found.
3. Applying algorithms to the stored content to extract useful data (a small scoring sketch follows this list).
4. Making the content searchable.
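
As a rough illustration of step 3, the short PHP sketch below ranks a handful of in-memory pages by how often a search term appears. The $pages array and the scorePages() function are placeholders introduced purely for illustration, not part of the scripts linked further down.

<?php
// A rough sketch of step 3: rank a few stored pages by how often a search
// term appears in their text. A real engine would also weigh factors such
// as titles, links, and word position.
function scorePages($pages, $term)
{
    $scores = array();
    foreach ($pages as $url => $text) {
        // Simple relevance signal: case-insensitive term frequency.
        $scores[$url] = substr_count(strtolower($text), strtolower($term));
    }
    arsort($scores); // best match first
    return $scores;
}

// $pages stands in for content a crawler has already stored.
$pages = array(
    'http://example.com/php'     => 'PHP search engine tutorial written in PHP',
    'http://example.com/cooking' => 'A page about cooking with no relevant terms',
);
print_r(scorePages($pages, 'php'));
?>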

It is possible to build a simple search engine of your own. I have written simple PHP scripts to crawl, store, and search the content. Use the code in the following links to get started:

1. Code for building a crawler that stores data in a MySQL index. (example) A rough crawler sketch also appears below.
2. Code for searching the data once it has been stored in the index. (example) A matching search sketch appears further down.
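
To give a concrete idea of what the first script does, here is a minimal crawler sketch, assuming a MySQL table named search_index with url and word columns. The schema, connection details, and function names are illustrative assumptions, not the code behind the links above.

<?php
// Minimal crawler sketch: fetch one page with cURL, strip the markup, and
// store each word in a MySQL index table. Assumed schema (for illustration):
//   CREATE TABLE search_index (
//       url  VARCHAR(255),
//       word VARCHAR(64),
//       INDEX (word)
//   );

$db = new mysqli('localhost', 'user', 'password', 'search');

function fetchPage($url)
{
    // Download the raw HTML.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

function indexPage($db, $url)
{
    $html = fetchPage($url);
    if ($html === false) {
        return;
    }
    // Reduce the page to plain lowercase words. Every occurrence is kept so
    // the search script can rank pages by how often a word appears.
    $text  = strtolower(strip_tags($html));
    $words = str_word_count($text, 1);

    $stmt = $db->prepare('INSERT INTO search_index (url, word) VALUES (?, ?)');
    foreach ($words as $word) {
        $stmt->bind_param('ss', $url, $word);
        $stmt->execute();
    }
    $stmt->close();
}

indexPage($db, 'http://example.com/');
?>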

PHP code for Search Engines
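
Below is a matching search sketch against the same assumed search_index table, ranking URLs by how many times the query word was indexed for each page. Again, the table and connection details are placeholders rather than the original example code.

<?php
// Minimal search sketch: look up every URL containing the query word and
// rank pages by how many index rows match (a crude term-frequency score).

$db = new mysqli('localhost', 'user', 'password', 'search');

function search($db, $query)
{
    $word = strtolower(trim($query));

    $stmt = $db->prepare(
        'SELECT url, COUNT(*) AS hits
           FROM search_index
          WHERE word = ?
          GROUP BY url
          ORDER BY hits DESC'
    );
    $stmt->bind_param('s', $word);
    $stmt->execute();
    $stmt->bind_result($url, $hits);

    $results = array();
    while ($stmt->fetch()) {
        $results[$url] = $hits; // URL => number of matches
    }
    $stmt->close();
    return $results;
}

print_r(search($db, 'php'));
?>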

Powerful search engines need to process tremendous amounts of data, which is why production systems are often written in languages such as C and Python. If you’re interested in learning more about building crawlers and writing sorting algorithms, I suggest you read Programming Collective Intelligence: Building Smart Web 2.0 Applications and Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL.