s5h.net

“fresh linux news and advice.”


2007-11-22 how much is that

A friend and I discussed how the price of items on Amazon fluctuates. We each wrote a different way to scrape an Amazon page for the price of an item. My solution is a single line of Perl. The URL is hard-coded purely for demonstration; change it to $ARGV[0] if you wish.

Well, a single line of Perl, but not easy to read on this blog! (One line can be very long.) Here it is, broken up, but only so that it fits on the screen a little more easily.

perl -e 'use LWP::UserAgent; my $u = LWP::UserAgent->new();'\
'my $r = $u->get( "http://www.amazon.co.uk/Console-Stand-Storage-'\
'Nintendo-Licensed/dp/B000W4MV6K/ref=pd_bbs_sr_4'\
'?ie=UTF8&s=videogames&qid=1195737214&sr=8-4" ); '\
'if( $r->is_success ) { my @l = split( /\n/, $r->content ); '\
'foreach( @l ) { '\
'if( $_ =~ /<b>Price:<\/b> <b class="price">(.*)<\/b><br \/>(.*)'\
'stock<br \/>/ ) { '\
'print( substr($1,1) . "  $2\n" ) } } };'
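
For anyone who would rather test the matching without hitting the network, here is a rough Python sketch of the same idea. It reuses the pattern from the one-liner; the function name extract_price and the sample line are made up for illustration, and the markup is whatever Amazon served at the time, so treat it as an assumption rather than something that still works.

```python
import re

# Same pattern as the Perl one-liner. The markup it expects is an
# assumption taken from that one-liner, not from any current page.
PRICE_RE = re.compile(
    r'<b>Price:</b> <b class="price">(.*)</b><br />(.*)stock<br />'
)


def extract_price(html):
    """Return (price, availability) for the first matching line, or None."""
    for line in html.split("\n"):
        match = PRICE_RE.search(line)
        if match:
            # Drop the leading currency symbol, as substr($1, 1) does.
            return match.group(1)[1:], match.group(2).strip()
    return None


# Hypothetical sample line, written to fit the pattern (\u00a3 is the pound sign).
sample = '<b>Price:</b> <b class="price">\u00a312.99</b><br />In stock<br />'
print(extract_price(sample))
```

Keeping the parsing in its own function means the regex can be checked against saved pages before any request is made.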

2007-11-09 php plugin mechanism

Having worked with SquirrelMail plugins, I thought it might be a nice idea to replicate the way that plugins are implemented there in PHP. It doesn't take much effort, compared with compiled languages, where a plugin system is more complex and usually involves dynamic linking.

Let's start by taking a look at what our goals are. We wish to allow the inclusion of plugins based on their file name, residing in ./plugins. We plan to implement three different hooks: header, body and footer (obviously for page creation, but we could have more, such as form submission buttons). We also wish to allow different actions to occur based on the status value that each hook function returns.

Let's look at what we can do for registering a plugin function in the code. We're really just going to iterate over a directory and include_once the files that reside therein.

We can call register_plugins() from a PHP file that is included on each page load.

function register_plugins() {
	global $plugin_array;

	$d = opendir( "./plugins" );

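	if( !$d ) {
		echo( "Warning: directory plugins does not exist\n" );
		// nothing to read, so stop here rather than call readdir on false
		return;
	}

	// compare against false explicitly so a file named "0" does not end the loop
	while( false !== ( $f = readdir($d) ) ) {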
		if( $f == ".." || $f == "." ) {
			continue;
		}

		if( !preg_match( '/^(.*)\.php$/', $f, $name ) ) {
			continue;
		}

		$functionname = $name[1];

		echo( "Including $f\n" );

		include_once( "./plugins/$f" );
		
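		$func = $functionname . "_register_plugins";
		// only call the registration function if the plugin defines one
		if( function_exists( $func ) ) {
			$func();
		}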
	}

	closedir( $d );
}

As we see here, the plugin name is derived from the filename, which we obtain from each readdir call. We then build the name of the function that registers the plugin's hooks by appending the suffix '_register_plugins'. (Perhaps this should be called register_hooks.)
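
The same discover-by-filename idea can be sketched in Python, for comparison. This is only an illustration under a few assumptions: plugins are .py files rather than .php, the registry is passed in as an argument instead of living in a global, and the function returns the list of plugin names it loaded, which the PHP version does not do.

```python
import importlib.util
import os


def register_plugins(plugin_dir, plugin_array):
    """Load every *.py file in plugin_dir and call <name>_register_plugins."""
    loaded = []
    if not os.path.isdir(plugin_dir):
        print("Warning: directory %s does not exist" % plugin_dir)
        return loaded

    for filename in sorted(os.listdir(plugin_dir)):
        name, ext = os.path.splitext(filename)
        if ext != ".py":
            continue  # ignore anything that is not a plugin file

        # Import the file as a module named after the plugin.
        spec = importlib.util.spec_from_file_location(
            name, os.path.join(plugin_dir, filename))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # The registration function is named after the file.
        func = getattr(module, name + "_register_plugins", None)
        if callable(func):
            func(plugin_array)
            loaded.append(name)
    return loaded
```

A plugin file called simple.py would then define simple_register_plugins(plugin_array) and add its hooks to the dictionary it is given.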

Within the _register_plugins function we can assign some hook names within a global associative array (in effect a hash of hashes, keyed first by plugin name and then by hook name).

function simple_register_plugins() {
	global $plugin_array;

	echo( __FUNCTION__ . " has been called\n" );
	$plugin_array['simple']['header_hooks'] =
		'simple_header_functions';
}

The above code snippet just shows a single hook being assigned. We don't have to use a prefix of 'simple' in the hook name, but it gives us a namespace to work within, which should ensure we never overwrite values that another plugin might use.

Now let's take a look at the hook that we just added.

function simple_header_functions( &$return_data ) {
	global $exit_values;

	echo( __FUNCTION__ . " has been called: $return_data\n" );

	// $return_data is declared by reference, so the caller sees this change
	$return_data .= "Hello world etc!";

	return( $exit_values['good_exit'] );
}

Here we just append some data and return a good exit code, something that the caller can rely on, so we have a chance to stop the page load gracefully. The parameter is declared by reference in the function signature; putting the & at the call site instead (call-time pass-by-reference) is deprecated and is a fatal error from PHP 5.4 onwards. In the index page we can call a function that cycles through the hooks and acts on their exit values. We would just call something along the lines of

switch( do_page_header_hooks( $page_data ) ) {
	case $exit_values['good_exit']: {
		echo( $page_data );
		break;
	}
}

In do_page_header_hooks we have the real guts of the operation, which would be repeated for each of the hook stages that we wish to implement.

function do_page_header_hooks( &$page_data ) {
	global $plugin_array;
	global $exit_values;

	foreach( $plugin_array as $plugin_name => $hooks ) {
		// skip plugins that did not register a header hook
		if( empty( $hooks['header_hooks'] ) ) {
			continue;
		}

		// work on a copy so a failing hook cannot damage the page
		$val = $page_data;

		$func = $hooks['header_hooks'];
		$ret = $func( $val );

		echo( "Returned: $val\n" );

		switch( $ret ) {
			// accept the plugin's output and carry on
			case $exit_values['good_exit']: {
				$page_data = $val;
				break;
			}

			// accept the output and stop further execution
			case $exit_values['done']: {
				$page_data = $val;
				return( $exit_values['done'] );
			}
		}
	}

	return( $exit_values['good_exit'] );
}
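
To see the exit-value logic on its own, here is a small Python sketch of the same dispatch loop. The numeric values of GOOD_EXIT and DONE, and the three example hooks, are assumptions made up for the illustration. Python strings cannot be passed by reference, so each hook returns a (status, data) pair instead of modifying its argument.

```python
GOOD_EXIT = 0  # assumed value: accept the output, carry on to the next plugin
DONE = 1       # assumed value: accept the output and stop
# Any other status means the plugin's output is thrown away.


def simple_header(page_data):
    return GOOD_EXIT, page_data + "Hello world etc!"


def final_header(page_data):
    return DONE, page_data + " [final]"


def never_reached(page_data):
    return GOOD_EXIT, page_data + " [unreachable]"


def do_page_header_hooks(plugin_array, page_data):
    """Run each plugin's header hook in turn, honouring its exit status."""
    for plugin_name, hooks in plugin_array.items():
        func = hooks.get("header_hooks")
        if func is None:
            continue  # this plugin registered no header hook
        status, value = func(page_data)
        if status == GOOD_EXIT:
            page_data = value
        elif status == DONE:
            return DONE, value
    return GOOD_EXIT, page_data


plugins = {
    "simple": {"header_hooks": simple_header},
    "final": {"header_hooks": final_header},
    "late": {"header_hooks": never_reached},
}
status, page = do_page_header_hooks(plugins, "")
print(status, repr(page))
```

The third hook never runs, because the second one returns DONE and the loop stops there.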

That concludes the proof of concept, which you're welcome to download here: php_plugin_hooks.tar.gz.

2007-11-06 banshee-- rhythmbox--

Continuing from my rant last night I'd like to point out that the modern music players just don't perform as well as players that were working perfectly ok many years ago.

I'd like to cast the reader's mind back to x11amp; it really was as perfect as an mp3 player needed to be. What more could anyone want, other than for it to be maintained so that it supports the audio system the rest of the world uses? Back then it was OSS or nothing, and I remember some lag waiting for XMMS to support ALSA.

The trouble today is that there are just too many audio players around, and most of them are trying to fill the space that XMMS has left. Seriously, if you want to play some music, usually dragging the directory onto the XMMS window is sufficient.

Well, trying to move on, so that I'm not stuck in the 90s, I looked into some other players, banshee and rhythmbox. Neither is that great.

What I loved about X11AMP^Wxmms was that I could close the application and, when it was started again, it would jump the cursor to the last item in the playlist that I had listened to.

No such joy with banshee. But there's a plugin interface, so I knocked something together and it seems to work.

It's great in the sense that it's going to save a list of what I've played, but it's not going to solve the problem and select the track in the main music folder. Perhaps in the next version, but the possibilities are there to implement this in a reasonably friendly language.

What's most infuriating is that the current players all want to do everything: be an all-in-one, if you like, for your iPod, storage devices and what-not. What if you just want to play music? Well, there's mpg123, ogg123, flac123 etc, but they don't really support the simple playlist that x11amp has.

Well, stuck with the slow stuff then I guess...

2007-11-05 interfaces

Should interfaces be written in ASM, C, C++, Java, C#, Perl or Python?

The thing that's key here for me is a browser. The internet browser is perhaps the most used piece of software on the desktop these days (aside from the software that draws it, and that's going to be C/C++).

Now, if the browser is written in a script language it's going to lag somewhat. C# and Java aren't that far from script languages themselves; they both require a runtime.

But should it be written in something as low level as C? Would that make the development take so long that it's never going to get released as something other than a beta?

Lots of questions here, but it's also a lot of code. Are we ready for something that's purely script based? This raises another question: the browsers themselves parse and run script (JavaScript, XUL etc), so surely that should be as fast as possible.

There ain't no easy answer here, but I'm siding with the lower-level languages, as I remember the good old days when the browsers were fast (yes, before IE4), and I don't think the end user should give that up. For those curious, please go and take a look at Opera and lynx; they're still snappy.

2007-11-02 whoops

Spending ages perfecting an algorithm can be rewarding.

Unless you happen to read through the header file to find something that does what you wanted, but better, and easier!

Whilst tuning an inflate routine for .gz compression I stumbled upon gzopen, a neat little function for handling .gz files! It comes with a nice little function named gzread, which reads decompressed data from the file handle that gzopen returns. Lovely!
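
gzopen and gzread are C functions from zlib. As a quick way to see the same open-then-read-in-chunks shape without compiling anything, here is a sketch using Python's gzip module, which handles the same file format. The function name read_gz, the chunk size and the throwaway file name are all made up for the example; this is not the code from my module.

```python
import gzip
import os
import tempfile


def read_gz(path, chunk_size=4096):
    """Return the decompressed contents of a .gz file, read in chunks."""
    chunks = []
    with gzip.open(path, "rb") as handle:      # roughly gzopen(path, "rb")
        while True:
            chunk = handle.read(chunk_size)    # roughly gzread(file, buf, len)
            if not chunk:                      # an empty read means end of file
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Round trip on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "page.html.gz")
with gzip.open(path, "wb") as handle:
    handle.write(b"<html>hello</html>" * 100)

data = read_gz(path)
print(len(data))
```

The caller never touches the inflate state directly, which is exactly the convenience that made the hand-tuned routine unnecessary.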

Well, the short of it all really is that now I've finished the module and it's working out nicely on my test server here. I've saved about 1/6th of the disk space that was previously in use, and an unknown quantity of CPU that was used on the fly to generate the content.

This doesn't replace modules like mod_gzip or mod_deflate, which compress dynamic and static content at request time, but neither caches the result out of the box, as far as I know.

Here is the code that I've roughed together. It seldom allocates memory, and should be simple enough for you to install: mod_gzip_disk.tar.gz.

There is a possible problem with this method of compression, in that the client (googlebot, slurp etc) might prefer a 304 response (page not changed), which is something this module does not currently handle, although it might in the future. All this is concerned with is squeezing that little bit more out of bandwidth and disk.

In my case, many files are close to block size, so I'm not saving much, perhaps at best 1/6th.
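
The block-size point is easy to check with a little arithmetic. The sketch below assumes a 4096-byte block size, which is only a guess at a typical filesystem and not a figure from my server; the function names and the file sizes are made up too.

```python
def blocks_used(size_bytes, block_size=4096):
    """Number of whole filesystem blocks needed to store size_bytes."""
    return -(-size_bytes // block_size)  # ceiling division


def disk_saving(original, compressed, block_size=4096):
    """Bytes of disk actually freed by storing the compressed file."""
    freed = blocks_used(original, block_size) - blocks_used(compressed, block_size)
    return freed * block_size


# A file already under one block: compressing it frees nothing on disk.
print(disk_saving(4000, 1200))

# A larger file: 5 blocks shrink to 2, so 3 blocks are freed.
print(disk_saving(20000, 6000))
```

So a file that already fits in one block still takes one block after compression, however well it compresses, which is why the overall saving stays small when most files are about that size.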

Oh, one more thing: aside from the entry that apxs2 will add to httpd.conf, you don't have to touch your Apache config files (although I might add something later to negate its functionality should that become an issue). Send feedback or postcards.