hosted services

Over time this blog has moved from hand-edited HTML files, to a Perl-generated blog, to WordPress, and now to MediaWiki (this is mostly detailed in the about page).

One area where I think MediaWiki is lacking a little is its trackback support. There's no management page for trackbacks just yet, but it wouldn't be hard to hack one together. For my purposes I've added the following to the trackbacks table:

alter table wiki_trackbacks add column tb_ip varchar(255) not null default '';
create index wiki_trackbacks_tb_ip_idx on wiki_trackbacks( tb_ip );

This adds a column and an index to the table, so that if and when the trackbacks get spammed it is a little easier to delete the spam by source IP. That's not ideal, as it would be nice to audit trackbacks before they get published; again, this should be configurable.
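To show what the new column and index buy you, here's a minimal sketch in Python. It uses an in-memory SQLite database instead of the real MediaWiki database (an assumption, purely so it runs anywhere), the table is cut down to a few columns, and `make_db` and `purge_by_ip` are hypothetical helper names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Cut-down stand-in for the trackbacks table (the real one has more columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """create table wiki_trackbacks (
               tb_id   integer primary key,
               tb_page integer not null,
               tb_url  text not null,
               tb_ip   varchar(255) not null default ''
           )"""
    )
    # Same index as the ALTER above, so deletes by IP avoid a full table scan.
    conn.execute("create index wiki_trackbacks_tb_ip_idx on wiki_trackbacks(tb_ip)")
    return conn


def purge_by_ip(conn: sqlite3.Connection, ip: str) -> int:
    """Delete every trackback recorded from `ip` and return how many were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("delete from wiki_trackbacks where tb_ip = ?", (ip,))
    return cur.rowcount


if __name__ == "__main__":
    db = make_db()
    db.executemany(
        "insert into wiki_trackbacks (tb_page, tb_url, tb_ip) values (?, ?, ?)",
        [(1, "http://spam.example/a", "192.0.2.10"),
         (1, "http://spam.example/b", "192.0.2.10"),
         (2, "http://friend.example/", "198.51.100.7")],
    )
    print(purge_by_ip(db, "192.0.2.10"))  # 2
```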

The source file trackback.php needs a minor change to log the source IP too:

Index: trackback.php
===================================================================
--- trackback.php       (revision 190)
+++ trackback.php       (revision 195)
@@ -43,6 +43,7 @@
 $tburl = strval( $_POST['url'] );
 $tbname = strval( @$_POST['blog_name'] );
 $tbarticle = strval( $_REQUEST['article'] );
+$tbip = strval( $_SERVER['REMOTE_ADDR'] );

 $title = Title::newFromText($tbarticle);
 if( !$title || !$title->exists() )
@@ -53,7 +54,8 @@
        'tb_title'      => $tbtitle,
        'tb_url'        => $tburl,
        'tb_ex'         => $tbex,
-       'tb_name'       => $tbname
+       'tb_name'       => $tbname,
+       'tb_ip'         => $tbip
 ));

rbl

I've added the following change to perform RBL lookups before a trackback is accepted into the article. Ideally I'd like trackbacks to be held in a temporary location for review first.

Index: trackback.php
===================================================================  
--- trackback.php       (revision 195)
+++ trackback.php       (working copy)
@@ -49,6 +49,20 @@
 if( !$title || !$title->exists() )
        XMLerror( "Specified article does not exist." );

+
+if( $wgUseTrackbacksRBL==true && substr_count( $tbip, ":" ) == 0 && substr_count( $tbip, "." ) > 0 ) {
+
+       $rbl_list = array( "zen.spamhaus.org", "dnsbl.njabl.org", "dnsbl.sorbs.net", "bl.spamcop.net" );
+
+       foreach( $rbl_list as $rbl_site ) {
+               $ip_arr = array_reverse( explode( '.', $tbip ) );
+               $lookup = implode( '.', $ip_arr ) . '.' . $rbl_site;
+               if( $lookup != gethostbyname( $lookup ) ) {
+                       XMLerror( $tbip . " is listed in " . $rbl_site );
+               }
+       }
+}
+
 $dbw->insert('trackbacks', array(
        'tb_page'       => $title->getArticleID(),
        'tb_title'      => $tbtitle,

Apply this change and add $wgUseTrackbacksRBL = true; to LocalSettings.php to enable it.
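The lookup in the patch follows the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and treat any successful resolution as "listed". Here's a sketch of the same logic in Python, with the same zones as the patch. The names `rbl_query_name` and `listed_in` are hypothetical, the resolver is injectable so the logic can be checked without network access, and the IPv4 check uses the `ipaddress` module, which is stricter than the patch's `substr_count` test.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional

# Same lists as the PHP patch.
RBL_ZONES = ("zen.spamhaus.org", "dnsbl.njabl.org", "dnsbl.sorbs.net", "bl.spamcop.net")


def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: octets reversed, then the zone appended."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def listed_in(ip: str,
              zones: Iterable[str] = RBL_ZONES,
              resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the first zone that lists `ip`, or None.

    Only IPv4 addresses are checked, as in the patch. A listed address
    resolves (typically to 127.0.0.x); an unlisted one fails to resolve.
    Like the patch, a DNS failure counts as "not listed" (fails open).
    """
    try:
        if ipaddress.ip_address(ip).version != 4:
            return None
    except ValueError:  # not an IP address at all
        return None
    for zone in zones:
        try:
            resolve(rbl_query_name(ip, zone))
        except OSError:  # socket.gaierror (NXDOMAIN etc.) is a subclass of OSError
            continue
        return zone
    return None
```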

The patch is available for download from code/mediawiki.

optimisations

Dynamic content management systems often make life hard for the web server. In my opinion it's better to write pages out to disk when they change than to bolt on caching layers. After all, a page that is updated in the database is probably going to be read more than a dozen times before its next update.
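A minimal Python sketch of the "write pages to disk when they change" idea: render once on update and drop the HTML into the document root, so every subsequent read is a plain static file. The docroot layout and the `publish_page` name are hypothetical, and rendering itself is out of scope.

```python
import os
import tempfile


def publish_page(docroot: str, name: str, html: str) -> str:
    """Write a rendered page into docroot so it can be served as a plain file.

    The HTML goes to a temporary file in the same directory, which is then
    renamed over the old copy, so readers never see a half-written page.
    `name` is assumed to be a simple file name (no directory separators).
    """
    path = os.path.join(docroot, name + ".html")
    fd, tmp = tempfile.mkstemp(dir=docroot, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(html)
        os.chmod(tmp, 0o644)   # mkstemp creates 0600; the web server must be able to read it
        os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp)
        raise
    return path
```

The temporary file lives in the same directory as the target because a rename is only atomic within one filesystem.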

images and css

Images and CSS files just don't need to be retrieved that frequently. Why not give them Expires and ETag headers?

Here's what I've put in my site root .htaccess file so that most static content gets a sensible Expires header. Don't worry too much about long expiry times: if you change an object, you can always change its file name so that the new copy is retrieved. Something like a version number or the date in the file name should suffice.

ExpiresActive on
ExpiresDefault "access plus 24 hours"
ExpiresByType image/jpg "access plus 1 months"
ExpiresByType image/gif "access plus 1 months"
ExpiresByType image/jpeg "access plus 1 months"
ExpiresByType image/png "access plus 1 months"

ExpiresByType text/css "access plus 1 months"
ExpiresByType text/javascript "access plus 1 months"
ExpiresByType application/javascript "access plus 1 months"
ExpiresByType application/x-shockwave-flash "access plus 1 months"

ExpiresByType image/x-icon "access plus 1 months"
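Renaming a file when it changes is what makes these long expiry times safe. A small Python sketch of the naming step, with hypothetical function names: `versioned_name` inserts a version tag (a number or a date, as suggested above) before the extension, and `content_version` is a swapped-in alternative that derives the tag from a hash of the file's contents, so it only changes when the file does.

```python
import hashlib
import os


def versioned_name(path: str, version: str) -> str:
    """Insert a version tag before the extension: css/site.css -> css/site.<version>.css."""
    stem, ext = os.path.splitext(path)
    return f"{stem}.{version}{ext}"


def content_version(data: bytes, length: int = 8) -> str:
    """Short content hash, usable as the version tag instead of a number or date."""
    return hashlib.sha1(data).hexdigest()[:length]
```

For example, `versioned_name("css/site.css", "20110601")` gives `css/site.20110601.css`; the skin or templates then need to reference the new name.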

localsettings.php

wgdisablecounters

When set to true, this toggle disables the page view counter. While the counter is enabled (false, the default), every page view causes a database update. That costs resources, so the counter is best disabled. If you already run something like webalizer on your log files, the counter isn't all that useful and is only a "nice to have"; if CPU is tight, disable it by setting the value to true.

$wgDisableCounters = true;
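If you rely on log analysis instead of the built-in counter, per-page view counts can come straight from the access log. A minimal Python sketch, assuming Apache's combined log format and MediaWiki short URLs under /wiki/ (both assumptions about your setup); requests with a query string (edits, history and so on) and non-200 responses are skipped. `page_views` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request and status fields of a combined-format line, e.g.
# 203.0.113.5 - - [01/Jun/2011:10:00:00 +0100] "GET /wiki/Main_Page HTTP/1.1" 200 5120 "-" "..."
REQUEST = re.compile(r'"GET (?P<path>/wiki/[^ ?"]+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def page_views(lines: Iterable[str]) -> Counter:
    """Count successful, query-free GETs per /wiki/ page."""
    views = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("status") == "200":
            views[m.group("path")] += 1
    return views
```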

cache

It's a good idea to cache what you can via mod_disk_cache:

CacheIgnoreHeaders Set-Cookie
CacheRoot /var/local/apache_disk
CacheDefaultExpire 3600
CacheMinFileSize 64
CacheMaxFileSize 64000
CacheDisable /perl
CacheEnable disk /wiki
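A simplified Python model of what the directives above decide for a given response, using the same values. It leaves out much of mod_cache's real logic (CacheMaxExpire, Last-Modified heuristics, request headers), treats CacheEnable/CacheDisable as path-prefix rules, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Values from the mod_disk_cache block above.
CACHE_ENABLE = ("/wiki",)          # CacheEnable disk /wiki
CACHE_DISABLE = ("/perl",)         # CacheDisable /perl
MIN_SIZE, MAX_SIZE = 64, 64000     # CacheMinFileSize / CacheMaxFileSize, in bytes
DEFAULT_EXPIRE = 3600              # CacheDefaultExpire, in seconds
IGNORE_HEADERS = {"set-cookie"}    # CacheIgnoreHeaders Set-Cookie


def _under(path: str, prefix: str) -> bool:
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def cache_lifetime(path: str, size: int, max_age: Optional[int] = None) -> int:
    """Seconds a response would be kept under this model (0 means not cached)."""
    if any(_under(path, p) for p in CACHE_DISABLE):
        return 0
    if not any(_under(path, p) for p in CACHE_ENABLE):
        return 0
    if not MIN_SIZE <= size <= MAX_SIZE:
        return 0
    # No expiry information from the backend: fall back to the default.
    return DEFAULT_EXPIRE if max_age is None else max_age


def stored_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Headers that would be written to the cache, minus CacheIgnoreHeaders."""
    return {k: v for k, v in headers.items() if k.lower() not in IGNORE_HEADERS}
```

Ignoring Set-Cookie matters here: without it, one visitor's session cookie could be stored and replayed to everyone else who gets the cached page.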

accelerator

In my experience a good combination is MediaWiki's article caching, Apache's mod_disk_cache and eAccelerator.

Building the accelerator:

tar jxvf eaccelerator-0.9.6-rc2.tar.bz2
cd eaccelerator-0.9.6-rc2
/usr/bin/phpize
./configure --enable-eaccelerator=shared \
    --with-php-config=/usr/bin/php-config
make && make install

Set up the directories:

mkdir -p /var/local/{apache_disk,eaccelerator,wiki_cache}
chown root:www-data /var/local/{apache_disk,eaccelerator,wiki_cache}
# 2770: group-writable for www-data, setgid so new files inherit the group
chmod 2770 /var/local/{apache_disk,eaccelerator,wiki_cache}

MediaWiki settings in LocalSettings.php:

$wgMainCacheType = CACHE_ACCEL;
$wgCacheDirectory = "/var/local/wiki_cache";
$wgUseFileCache = true;
$wgCachePages = true;
$wgEnableParserCache = true;
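With $wgUseFileCache the rendered pages live on disk, and the interesting question is when a stored copy may still be served. A simplified Python model of that check: a cached file is good only if it was written after the page last changed and after a site-wide epoch (MediaWiki's $wgCacheEpoch plays that role). This is an illustration of the idea, not MediaWiki's actual code, and `cache_is_fresh` is a hypothetical name.

```python
import os
import tempfile


def cache_is_fresh(cache_file: str, page_touched: float, epoch: float = 0.0) -> bool:
    """True if the cached copy exists and is newer than both the page's last
    change (`page_touched`) and the site-wide cache epoch (bump it to
    invalidate every cached page at once)."""
    try:
        mtime = os.path.getmtime(cache_file)
    except FileNotFoundError:
        return False
    return mtime >= max(page_touched, epoch)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "Main_Page.html")
        open(path, "w").close()
        os.utime(path, (1000, 1000))                    # pretend it was cached at t=1000
        print(cache_is_fresh(path, page_touched=900))   # True: cached after the last edit
        print(cache_is_fresh(path, page_touched=1100))  # False: edited since it was cached
```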

memcached

An alternative to eAccelerator is memcached, which keeps key/value pairs in memory. The immediate advantage is fewer database lookups, since many popular items are already in memory and don't have to go through the DB access layers (socket reads and writes, searches, etc.).
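The pattern behind this is usually called cache-aside: look in memory first and only go to the database on a miss, storing the result for next time. A generic Python sketch, not MediaWiki's own object-cache code; `DictCache` is a hypothetical in-process stand-in for a memcached client (a real one, such as pymemcache, talks to the daemon over TCP), used only so the example runs without a server.

```python
from typing import Callable, Dict, Optional


class DictCache:
    """Hypothetical stand-in for a memcached client: get/set on a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def cached_lookup(cache: DictCache, key: str, load: Callable[[str], str]) -> str:
    """Cache-aside: try memory first, fall back to the database and remember the result."""
    value = cache.get(key)
    if value is None:
        value = load(key)       # the expensive database round trip
        cache.set(key, value)
    return value
```

Only the first request for a key pays for the database query; later requests are served from memory until the entry is evicted.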

Installing memcached is normally simple:

# apt-get install memcached

You may need to tweak the maximum amount of memory that memcached occupies; to do so, modify the -m value (in megabytes) in /etc/memcached.conf.

Once you're happy, and memcached is listening on port 11211, you can make the changes to your LocalSettings.php file. Make sure it listens only on the local interface; if it does listen on other interfaces, ensure that hosts outside your private network cannot reach it.
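Both points (the -m memory limit and the listening address) can be checked from /etc/memcached.conf itself. A small Python sketch, assuming the Debian-style file layout of one command-line option per line with # comments; the function names are hypothetical, and hostnames such as "localhost" in a -l line are treated as "can't verify" rather than resolved.

```python
import ipaddress
from typing import Dict, List


def parse_memcached_conf(text: str) -> Dict[str, List[str]]:
    """Collect options such as -m, -p and -l from a Debian-style memcached.conf."""
    opts: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("-"):
            continue  # blank, comment, or a non-option line like "logfile ..."
        flag, _, value = line.partition(" ")
        opts.setdefault(flag, []).append(value.strip())
    return opts


def listens_only_on_loopback(opts: Dict[str, List[str]]) -> bool:
    """True if every -l address is a loopback address.

    With no -l line memcached listens on all interfaces, so that is False.
    """
    addrs = [a.strip() for v in opts.get("-l", []) for a in v.split(",") if a.strip()]
    if not addrs:
        return False
    try:
        return all(ipaddress.ip_address(a).is_loopback for a in addrs)
    except ValueError:  # e.g. a hostname: can't verify, so say no
        return False
```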

$wgMainCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = array( "127.0.0.1:11211" );
$wgParserCacheType = CACHE_MEMCACHED;

That should be enough to begin taking some of the workload away from your database backend.

adsense

One thing I wanted was adverts within the page content. This is possible using an extension that I cobbled together one evening.

It is available in the code/mediawiki directory as s5h_adsense_filter.php.

If you have a Google AdSense account and want your AdSense content in your pages, then this is probably what you're after. Add the following to LocalSettings.php, substituting your own ad_client and ad_slot values:

require_once( "$IP/extensions/s5h_adsense_filter.php" );

$wgs5hadsensefilter["ad_client"] = "ca-pub-8810238214716601";
$wgs5hadsensefilter["ad_slot"] = "8325595742";

google plus1

I've added Google +1 buttons to this site with another extension, which works in much the same way as the AdSense filter. It is available as s5h_google_plus1.php; copy it into your MediaWiki extensions directory and add the following to LocalSettings.php:

require_once("$IP/extensions/s5h_google_plus1.php");

If you wish to change some of the defaults then they're tunable as follows:

$wgs5hgoogleplus1filter["page_start"] = FALSE;

This controls whether the button appears at the start of the page.

$wgs5hgoogleplus1filter["page_end"] = TRUE;

This controls whether the button appears at the end of the page.

$wgs5hgoogleplus1filter["display_size"] = 1;

This controls the size of the button:

0: default/standard
1: small
2: medium
3: tall
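A rough Python model of what the three options above control. The extension's internals aren't shown here, so `button_markup`, `place_button` and the placeholder markup they produce are hypothetical; only the option meanings, the values shown above and the size numbering come from this page.

```python
# display_size number -> size name, following the list above.
SIZES = {0: "standard", 1: "small", 2: "medium", 3: "tall"}


def button_markup(display_size: int = 1) -> str:
    """Placeholder button markup (hypothetical); unknown sizes fall back to standard."""
    size = SIZES.get(display_size, "standard")
    return f'<div class="plusone" data-size="{size}"></div>'


def place_button(html: str, button: str, page_start: bool = False, page_end: bool = True) -> str:
    """Put the button before and/or after the page content, per page_start/page_end."""
    if page_start:
        html = button + html
    if page_end:
        html = html + button
    return html
```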

graphviz

After playing around with Graphviz a bit (see ipv6 map), it made sense to include the Graphviz extension.

The configuration is very simple (two steps) and is documented on the extension's page. The extension is very handy if you want to inline a diagram. This ties in with the why_unix page, especially where documentation is concerned.
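The extension renders Graphviz DOT source inline, so if a diagram's data already lives somewhere else it can be easier to generate the DOT than to type it by hand. A tiny Python sketch of that, with `to_dot` as a hypothetical helper; it emits a directed graph and quotes every node ID, escaping only double quotes (the one escape DOT's quoted strings recognise).

```python
from typing import Iterable, Tuple


def to_dot(name: str, edges: Iterable[Tuple[str, str]]) -> str:
    """Render (from, to) pairs as DOT source for a directed graph."""
    def q(s: str) -> str:
        # Quote IDs so names with spaces, dots or colons stay valid DOT.
        return '"' + s.replace('"', '\\"') + '"'

    lines = [f"digraph {q(name)} {{"]
    lines += [f"  {q(a)} -> {q(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)
```

The output can be pasted into a page between the extension's tags, or piped to `dot -Tpng` to check it locally.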