Published: Sun, 10 August 2008, 12:25, also tagged: development, performance, c, software, php, linux, intel, optimization, gnu, gcc, compilers, intel c compiler, programming optimization
PHP, like any other scripting or programming language, can be optimized to improve performance. In this series of blog posts I hope to highlight the areas where PHP can be optimized. I won’t go into the details of PHP output caching (which can of course lead to massive performance improvements), mainly because once the output has been cached PHP plays only a minor role. This series will target dynamic PHP scripts where output caching isn’t an option, e.g. because the data changes constantly.
The performance of PHP is ultimately determined by the PHP interpreter itself. PHP is open source software written in the C programming language. Taking steps to make sure a fast binary is compiled is the first step to improving overall PHP performance.
Published: Wed, 16 July 2008, 04:40, also tagged: development, php, functions, mysql, func num args, func get args
MySQL provides functionality for checking if a column or value is in a set, for example:
-- x IN(set) Example:
SELECT * FROM documents WHERE id IN(1, 20, 7, 18)
-- x NOT IN(set) Example:
SELECT * FROM documents WHERE id NOT IN(21, 5, 4, 13)
Using PHP’s variable-length argument functions (func_num_args() and func_get_args()) we can easily emulate this in PHP:
<?php
// Emulates MySQL's "x IN (set)": in($needle, $value1, $value2, ...)
function in() {
    if (func_num_args() < 2) return false; // Nothing to compare against
    // Drop the first argument (the needle) and flip the remaining values into array keys.
    // Note: array_flip() only handles int and string values.
    $set = array_flip(array_slice(func_get_args(), 1));
    return isset($set[func_get_arg(0)]); // Membership test via array key lookup
}
?>
Usage Examples:
Published: Tue, 1 July 2008, 16:43, also tagged: technology, rss, news, development, internet, statistics, syndication, speeple, speeple news
I’ve put together a page for Speeple News statistics. It provides overall statistics for the Speeple News service, including health statistics such as the crawl rate, and top sources grouped by domain and by individual feed.
The stats cover news item totals, feed count, feed types and their versions, and content languages. The page is updated every 30 minutes.
Published: Fri, 27 June 2008, 10:28, also tagged: technology, rss, news, development, internet, xml, php, http, java, syndication, speeple, speeple news, newsbot, etag, last modified, if modified since, if none match
To further improve the performance of the Speeple News “NewsBot” I have implemented support for the ETag and Last-Modified HTTP headers. NewsBot now sends the values it saw on its previous visit back in “If-None-Match” and “If-Modified-Since” request headers. If the feed hasn’t changed since then, the server answers with a 304 Not Modified status and headers only, rather than the full body content.
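The conditional request described above can be sketched roughly as follows. This is a minimal illustration, not NewsBot’s actual code: the class and method names (ConditionalFetch, conditionalHeaders, isUnchanged, feedUnchanged) are hypothetical, and it assumes the ETag and Last-Modified values from the previous fetch were stored somewhere by the caller.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative, not taken from NewsBot.
class ConditionalFetch {

    /** Builds the conditional request headers from the validators saved on the last fetch (either may be null). */
    static Map<String, String> conditionalHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) headers.put("If-None-Match", etag);
        if (lastModified != null) headers.put("If-Modified-Since", lastModified);
        return headers;
    }

    /** True when the server reports that the feed has not changed (304 Not Modified). */
    static boolean isUnchanged(int status) {
        return status == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /** Sends a conditional GET and reports whether the body can be skipped. */
    static boolean feedUnchanged(String feedUrl, String etag, String lastModified) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        conditionalHeaders(etag, lastModified).forEach(conn::setRequestProperty);
        try {
            return isUnchanged(conn.getResponseCode()); // a 304 response carries no body
        } finally {
            conn.disconnect();
        }
    }
}
```

When feedUnchanged returns false, the crawler would read the body as usual and store the new ETag and Last-Modified response headers for the next visit.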
This not only improves the efficiency of fetching content for Speeple News, it also benefits webmasters and site owners because less bandwidth is used.
Initial statistics show that supporting HTTP ETag / Last-Modified headers, along with handling gzip-encoded content, has reduced bandwidth costs by over 60%.
Published: Thu, 26 June 2008, 07:53, also tagged: technology, rss, news, development, internet, xml, java, syndication, gzip, speeple, speeple news, newsbot
The Speeple News “NewsBot” has been updated to support content compressed with the gzip compression algorithm. I should have supported gzip HTTP content encoding all along, but my recent bandwidth logs on the server have brought it to my immediate attention. Averaging 80 GB per day for 80 thousand XML news feeds just isn’t an economical use of bandwidth.
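As a rough sketch of what handling gzip-encoded responses involves: the client advertises support with an “Accept-Encoding: gzip” request header, then decompresses the body only when the response’s Content-Encoding header says gzip was used. The class and method names below (GzipBody, readBody, gzip) are hypothetical, and the sketch assumes the feed is UTF-8, which a real crawler would need to detect from the response or the XML declaration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; names are illustrative, not taken from NewsBot.
class GzipBody {

    /** Value to send in the Accept-Encoding request header. */
    static final String ACCEPT_ENCODING = "gzip";

    /** Reads a response body, decompressing it first if Content-Encoding is gzip. Assumes UTF-8 text. */
    static String readBody(InputStream body, String contentEncoding) {
        boolean gzipped = contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip");
        try (InputStream in = gzipped ? new GZIPInputStream(body) : body) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper only: gzip-compresses a string so readBody can be tried without a server. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Checking the response header rather than assuming compression matters because a server is free to ignore Accept-Encoding and send the body uncompressed.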
The next step in improving the economy of the Speeple News “NewsBot” is to give each feed a score based on how frequently it updates, so that feeds which rarely update are downloaded less often.
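One possible way to turn update frequency into a schedule is to poll each feed at roughly the average gap between its recent changes, clamped to sensible bounds. This is only a sketch of the idea, not the planned NewsBot implementation: the class name FeedSchedule, the method pollInterval and the 15-minute and 1-day bounds are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical scheduler; names and bounds are illustrative assumptions.
class FeedSchedule {

    static final Duration MIN_INTERVAL = Duration.ofMinutes(15); // assumed floor
    static final Duration MAX_INTERVAL = Duration.ofDays(1);     // assumed ceiling

    /**
     * Suggests how long to wait before polling a feed again, based on the average
     * gap between the times (epoch seconds, ascending) at which it was seen to change.
     */
    static Duration pollInterval(List<Long> changeTimes) {
        if (changeTimes.size() < 2) {
            return MIN_INTERVAL; // not enough history yet: poll often until a pattern emerges
        }
        long span = changeTimes.get(changeTimes.size() - 1) - changeTimes.get(0);
        Duration average = Duration.ofSeconds(span / (changeTimes.size() - 1));
        if (average.compareTo(MIN_INTERVAL) < 0) return MIN_INTERVAL;
        if (average.compareTo(MAX_INTERVAL) > 0) return MAX_INTERVAL;
        return average;
    }
}
```

A feed that changed at 0, 3600 and 7200 seconds would be polled hourly, while one that changes once a week would be capped at the daily ceiling.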
In conclusion, I am hoping that a mixture of gzip support, a score for feed update frequency, and some “If-None-Match” (ETag) and “If-Modified-Since” support thrown in will produce a very efficient news crawler.