The robots.txt and .htaccess files are important to help you gain more traffic from search engines. The robots.txt file opens up or restricts access to files on your server for Search Engine Robots. The .htaccess file takes care of creating great looking, search engine friendly, and easy to remember URLs for your web site.

However, used the wrong way, these files can also create havoc and dismay, leaving Search Engine Robots locked outside your web site, or displaying those nice looking 404 pages under every link you touch on your web site. So, how do you know if the files are okay? Testing is the keyword here!
The Googlebot and other Search Engine Robots will crawl your web site based on the rules you provide in your robots.txt file. This file needs to be in the root of your domain or Joomla! installation directory.
There are just a few rules that robots will take into account if they visit your web site. Some of the rules are in the robots.txt file, and you can add another set of rules, either on a page-by-page basis or on a link in your web site.

In the robots.txt file you will see commands such as:

Allow: /folder1/myfile.html
Disallow: /folder1/
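If you want to test rules like these before the robots do, Python's standard library ships a robots.txt parser. The following sketch (the folder and file names are just the hypothetical examples from above) checks which URLs the rules let through:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content mirroring the two rules above;
# a User-agent line is added because the parser requires one.
rules = """User-agent: *
Allow: /folder1/myfile.html
Disallow: /folder1/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# myfile.html is explicitly allowed; everything else in /folder1/ is blocked
print(parser.can_fetch("*", "/folder1/myfile.html"))  # True
print(parser.can_fetch("*", "/folder1/other.html"))   # False
```

Note that the Allow line must come before the Disallow line here: rules are matched in order, and the first match wins.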
You can also have a link to the sitemap of your web site:
Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
This will give the robots the link to your XML sitemap, or to your .html sitemap if you don't have an XML file.

The following two rules look almost identical, but a small difference has a large effect. The first rule tells the robots to stay away from your entire site:

User-agent: *
Disallow: /

The "/" in the second line tells the robots not to visit any of your site's pages. In the following example, the robots are allowed to visit all pages:

User-agent: *
Disallow:

These examples show that you really need to make sure you use the right syntax in your robots.txt file.
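You can verify this one-character difference yourself with the same standard library parser, a quick sanity check before you upload a new robots.txt (the /index.php path is just an example URL):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" locks the robots out of everything
block_all = RobotFileParser()
block_all.parse("User-agent: *\nDisallow: /".splitlines())

# "Disallow:" with an empty value allows everything
allow_all = RobotFileParser()
allow_all.parse("User-agent: *\nDisallow:".splitlines())

print(block_all.can_fetch("Googlebot", "/index.php"))  # False
print(allow_all.can_fetch("Googlebot", "/index.php"))  # True
```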
Joomla! comes with a standard robots.txt file:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
As you can see, most special directories are blocked from the Search Engine Robots. There is no need to let them visit and index these special pages that hold the core of the system.
In the standard Joomla! robots.txt file, the images directory is blocked by the following line:

Disallow: /images/
However, this is one line that you need to remove. The images directory holds all the images that you so carefully named to be included in the image search pages of the major search engines. Make sure that the robots get access to this directory by removing that line from your robots.txt file. This will open up a new flood of visitors. If you installed the SEF patch from the JoomlAtWork.com site, this is already done for you.
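After editing the file, you can confirm that images are now reachable while the sensitive directories stay blocked. This sketch uses a hypothetical excerpt of the edited Joomla! robots.txt and an example image path:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the edited Joomla! robots.txt,
# with the "Disallow: /images/" line removed
edited = """User-agent: *
Disallow: /administrator/
Disallow: /cache/
"""

parser = RobotFileParser()
parser.parse(edited.splitlines())

# Image robots can now reach your images; the back end stays off limits
print(parser.can_fetch("Googlebot-Image", "/images/stories/garden.jpg"))  # True
print(parser.can_fetch("Googlebot-Image", "/administrator/index.php"))    # False
```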
The following is the complete robots.txt file of the site www.cblandscapegardening.com. Notice the long Sitemap: line; it must be on one line in your robots.txt file.

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
Sitemap: http://www.cblandscapegardening.com/component/option,com_xmap/lang,en/no_html,1/sitemap,1/view,xml/
Full access is now granted to the images and stories directories, and a sitemap link is provided for all Search Engine Robots. The way in which pages and links are handled by the robots is a part of your content creation, and that explanation is covered in Chapter 4, How to write keyword-rich articles.