mod_rewrite and the magic of URL rewriting (part two)

In the first part of this article we saw two simple examples of URL rewriting. Now we will put the product's name into the URL. Why?
The rewritten URL is the one that search engines will index, so having the product's name in it is very useful.
One point needs close attention: a URL cannot contain spaces, special characters or accented letters as they are, because they would have to be percent-encoded. So we pass the string with the product's name through a filter. The filter replaces spaces with dashes, turns accented letters into the same letters without accents, and removes any remaining special characters.
You can see an example of this technique in the URL of this very article. WordPress uses permalinks, which turn the article's title into a unique identifier. If you use WordPress, try creating an article titled "prova test": the permalink prova-test will be created. If you then create another article with the same title, you get the permalink prova-test-2. The permalink is the only piece of data used to retrieve the article, so, as I said, it must be unique.
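The WordPress behaviour described above (prova-test, then prova-test-2) comes down to a uniqueness check. The sketch below is not WordPress's code. The function name uniqueSlug() and the in-memory array of existing slugs are assumptions made for illustration; a real site would check the database instead.

```php
<?php
// Hypothetical helper, not WordPress code: returns $slug unchanged if it is
// free, otherwise appends -2, -3, ... until it finds an unused value.
function uniqueSlug(string $slug, array $existingSlugs): string
{
    if (!in_array($slug, $existingSlugs, true)) {
        return $slug;
    }
    $suffix = 2;
    while (in_array($slug . '-' . $suffix, $existingSlugs, true)) {
        $suffix++;
    }
    return $slug . '-' . $suffix;
}

echo uniqueSlug('prova-test', array()), "\n";             // prova-test
echo uniqueSlug('prova-test', array('prova-test')), "\n"; // prova-test-2
```

The approach in this article avoids this check. It puts the primary key in the URL, so two products with the same name still get different URLs.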
The system WordPress uses is quite complex: it involves transformations and checks, and it uses asynchronous calls. In this article we'll look at a simpler model that still works. (I'm old school, and as far as I'm concerned a unique identifier has one name: primary key.) So we'll pass the product's name in the URL together with its id.
What kind of URL do we want for better indexing?
The URL we want to obtain looks like this:
http://www.yoursite.com/product56/big-electric-trimmers
Without URL rewriting, the link to the product's detail page would be built like this:
// reading values from the database
echo '<a href="products.php?id=' . $id . '">' . $nameProduct . '</a>';
To produce a URL like the one shown above, we do this instead:
// reading values from the database
// filtering the product name
// the leading slash keeps the link root-relative, so it also works from a rewritten page
echo '<a href="/product' . $id . '/' . $nameFilteredProduct . '">' . $nameProduct . '</a>';
As mentioned at the beginning, the product's name needs cleaning before it goes into the URL. First we remove accented letters and problematic special characters, then we replace the spaces with dashes.
Cleaning up the product name for the URL
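We create the function CleanString(). First we convert the whole string to lowercase with mb_strtolower(). This way uppercase accented letters are handled too, and the URL comes out in lowercase like the example above. Then we replace the accented letters with the same letters without accents using the function str_ireplace():
function CleanString($string)
{
// lowercase first, so that uppercase accented letters (À, É, ...) are converted too
// (mb_strtolower() requires the mbstring extension)
$strResult = mb_strtolower($string, 'UTF-8');
$strResult = str_ireplace("à", "a", $strResult);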
$strResult = str_ireplace("á", "a", $strResult);
$strResult = str_ireplace("è", "e", $strResult);
$strResult = str_ireplace("é", "e", $strResult);
$strResult = str_ireplace("ì", "i", $strResult);
$strResult = str_ireplace("í", "i", $strResult);
$strResult = str_ireplace("ò", "o", $strResult);
$strResult = str_ireplace("ó", "o", $strResult);
$strResult = str_ireplace("ù", "u", $strResult);
$strResult = str_ireplace("ú", "u", $strResult);
$strResult = str_ireplace("ç", "c", $strResult);
$strResult = str_ireplace("ö", "o", $strResult);
$strResult = str_ireplace("û", "u", $strResult);
$strResult = str_ireplace("ê", "e", $strResult);
$strResult = str_ireplace("ü", "u", $strResult);
$strResult = str_ireplace("ë", "e", $strResult);
$strResult = str_ireplace("ä", "a", $strResult);
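As an aside, the chain of str_ireplace() calls can also be written as a single strtr() call with a lookup table. This is a sketch of an alternative, not the article's method. The function name replaceAccents() is hypothetical, and the sketch assumes UTF-8 input and a UTF-8 source file. The table also lists uppercase letters: in a UTF-8 string, str_ireplace() does not treat "À" and "à" as the same letter, because its case-insensitive matching only covers ASCII letters.

```php
<?php
// Hypothetical alternative: one strtr() call with a lookup table instead of
// many str_ireplace() calls. Uppercase letters are listed explicitly.
function replaceAccents(string $text): string
{
    $map = array(
        'à' => 'a', 'á' => 'a', 'ä' => 'a', 'À' => 'A', 'Á' => 'A', 'Ä' => 'A',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'ì' => 'i', 'í' => 'i', 'Ì' => 'I', 'Í' => 'I',
        'ò' => 'o', 'ó' => 'o', 'ö' => 'o', 'Ò' => 'O', 'Ó' => 'O', 'Ö' => 'O',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U',
        'ç' => 'c', 'Ç' => 'C',
    );
    // strtr() with an array replaces every key in a single pass.
    return strtr($text, $map);
}

echo replaceAccents('Città Perché'), "\n"; // Citta Perche
```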
We also replace the apostrophe with a space:
$strResult = str_ireplace("'", " ", $strResult);
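Replacing the apostrophe with a space, rather than deleting it, keeps the words on either side separate once the spaces become dashes. The two hypothetical functions below compare the two choices on an Italian word. They are a sketch for illustration only and apply just this step plus the final dash replacement.

```php
<?php
// Apostrophe becomes a space (the article's choice), then spaces become dashes.
function apostropheToSpace(string $text): string
{
    return str_replace(' ', '-', str_ireplace("'", ' ', $text));
}

// For comparison: the apostrophe is simply deleted.
function apostropheRemoved(string $text): string
{
    return str_replace(' ', '-', str_replace("'", '', $text));
}

echo apostropheToSpace("l'uomo"), "\n"; // l-uomo
echo apostropheRemoved("l'uomo"), "\n"; // luomo
```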
Now we remove everything that isn't an unaccented letter, a digit or a space:
$strResult = preg_replace('/[^A-Za-z0-9 ]/', "", $strResult);
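The class [^A-Za-z0-9 ] keeps only unaccented Latin letters, so a name written entirely in another alphabet is reduced to almost nothing. If you want to keep letters from other alphabets, one possible variant uses Unicode properties with the u modifier. This is a sketch, not the article's approach, and both function names are hypothetical.

```php
<?php
// Hypothetical variant: \p{L} matches any Unicode letter, \p{N} any digit.
// The /u modifier makes PCRE treat pattern and subject as UTF-8;
// preg_replace() returns null on invalid UTF-8, hence the ?? ''.
function keepLettersDigitsSpaces(string $text): string
{
    return preg_replace('/[^\p{L}\p{N} ]/u', '', $text) ?? '';
}

// The article's ASCII-only class, for comparison.
function keepAsciiLettersDigitsSpaces(string $text): string
{
    return preg_replace('/[^A-Za-z0-9 ]/', '', $text);
}

echo keepAsciiLettersDigitsSpaces('Trimmer 2000!'), "\n"; // Trimmer 2000
echo keepLettersDigitsSpaces('Москва 2024!'), "\n";       // Москва 2024
```

If you take this route, remember that the rewrite rule's [a-zA-Z0-9-]+ group would no longer match such a slug. That pattern would have to be widened too.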
Next, we remove leading and trailing spaces with the function trim():
$strResult = trim($strResult);
If there are two or more consecutive spaces inside the string, we reduce them to a single space:
$strResult = preg_replace('/[ ]{2,}/', " ", $strResult);
This regular expression matches every run of two or more spaces and replaces it with a single space.
Finally, we replace the spaces with dashes:
$strResult = str_replace(" ", "-", $strResult);
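The sketch below applies the last three steps (trim(), the space-collapsing regex and the dash replacement) in the same order as the article, to show what they do together. The function name spacesToDashes() is hypothetical.

```php
<?php
// Applies the last three CleanString() steps, in the article's order.
function spacesToDashes(string $text): string
{
    $text = trim($text);                           // strip leading/trailing spaces
    $text = preg_replace('/[ ]{2,}/', ' ', $text); // collapse runs of spaces
    return str_replace(' ', '-', $text);           // remaining spaces become dashes
}

echo spacesToDashes('  big   electric trimmers '), "\n"; // big-electric-trimmers
```

The order matters. If trim() ran after the dash replacement, a leading or trailing space would already have become a dash at the edge of the slug.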
And here's the complete function:
function CleanString($string)
{
    // lowercase first, so that uppercase accented letters are converted too
    // (mb_strtolower() requires the mbstring extension)
    $strResult = mb_strtolower($string, 'UTF-8');
    // replace accented letters with the same letters without accents
    $accented = array("à", "á", "è", "é", "ì", "í", "ò", "ó", "ù", "ú", "ç", "ö", "û", "ê", "ü", "ë", "ä");
    $plain    = array("a", "a", "e", "e", "i", "i", "o", "o", "u", "u", "c", "o", "u", "e", "u", "e", "a");
    $strResult = str_ireplace($accented, $plain, $strResult);
    // turn apostrophes into spaces so the words stay separate
    $strResult = str_ireplace("'", " ", $strResult);
    // drop anything that is not a letter, a digit or a space
    $strResult = preg_replace('/[^A-Za-z0-9 ]/', "", $strResult);
    // remove leading and trailing spaces
    $strResult = trim($strResult);
    // collapse runs of two or more spaces into one
    $strResult = preg_replace('/[ ]{2,}/', " ", $strResult);
    // replace the remaining spaces with dashes
    $strResult = str_replace(" ", "-", $strResult);
    return $strResult;
}
Building our link is now very simple:
// reading values from the database
echo '<a href="/product' . $id . '/' . CleanString($nameProduct) . '">' . $nameProduct . '</a>';
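The echo above prints the product's name straight into the HTML. As a sketch, the same link can be built with htmlspecialchars(), so that characters such as & or < in a product name don't break the markup. The helper productLink() and its parameters are hypothetical. It takes the already cleaned slug as an argument, and it assumes the site lives at the document root, hence the leading slash.

```php
<?php
// Hypothetical helper: builds the rewritten product link with escaped output.
// $slug is expected to be the output of a cleaning function such as CleanString().
function productLink(int $id, string $slug, string $name): string
{
    // rawurlencode() is defensive here: a cleaned slug contains only a-z, 0-9 and dashes.
    $href = '/product' . $id . '/' . rawurlencode($slug);
    return '<a href="' . htmlspecialchars($href) . '">'
        . htmlspecialchars($name) . '</a>';
}

echo productLink(56, 'big-electric-trimmers', 'Big Electric Trimmers & Co'), "\n";
// <a href="/product56/big-electric-trimmers">Big Electric Trimmers &amp; Co</a>
```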
Writing the rewrite rule
We can now write the rewrite rule like this:
RewriteEngine On
RewriteRule ^product([0-9]+)/([a-zA-Z0-9-]+)$ products.php?id=$1 [L]
The rule works like this. When the requested path is the string product, followed by one or more digits, then "/", then a string of one or more letters, digits or dashes, Apache serves the page products.php?id=N. Here N is the number captured by the first group ($1).
The second group is not used, because the id alone is enough to load the product's details. The name is there only for search-engine optimization, so that it appears in the URL that gets indexed.
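To check what the rule captures, you can try the same pattern in PHP with preg_match(), using the product prefix from the example URL. This is only a sketch for experimenting: Apache evaluates the RewriteRule itself, and the function name matchProductPath() is hypothetical. The sketch assumes the rule sits in an .htaccess file in the document root. In that context the path is matched without its leading slash, which is why the sketch tests product56/big-electric-trimmers.

```php
<?php
// Hypothetical helper: applies the RewriteRule pattern in PHP to show what
// each group captures. Returns null when the path does not match.
function matchProductPath(string $path): ?array
{
    if (preg_match('/^product([0-9]+)\/([a-zA-Z0-9-]+)$/', $path, $m) !== 1) {
        return null;
    }
    // $m[1] is what the rule passes on as $1 (the id); $m[2] is the name, which the rule ignores.
    return array('id' => (int) $m[1], 'name' => $m[2]);
}

var_dump(matchProductPath('product56/big-electric-trimmers'));
var_dump(matchProductPath('product/big-electric-trimmers')); // NULL: no digits
```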
Conclusions
We've seen three basic examples of URL rewriting. In theory there is no limit to what this Apache module can do. The only limit is our knowledge of regular expressions, since, as we have seen, most of the work lies in writing them. What about you? Do you use this technique to improve how your sites are indexed?
*****************************************
The article's main image was provided by @Fotolia
2 comments

I don't think modern browsers have problems with accented characters. By removing them you risk changing the meaning of a word.
There are also languages (like Greek, Chinese, etc.) that have non-Latin alphabets. I'm not sure that running preg_replace with [^A-Za-z0-9 ] on those languages will leave any letters.
What I've been using so far is a function that looks like this:
$string = mb_strtolower($string, 'UTF-8');
$lookfor = array('+', '&', '€', '@', '#', '!');
$replacewith = array('plus', 'and', 'euro', '', '', '');
$string = str_replace($lookfor, $replacewith, $string);