Published: Thu, 26 June 2008, 07:53, also tagged: technology, rss, news, development, internet, xml, java, syndication, gzip, speeple, speeple news, newsbot
The Speeple News “NewsBot” has been updated to support gzip-compressed content. I should have supported gzip HTTP content encoding all along, but my recent bandwidth logs on the server have brought it to my immediate attention. Averaging 80 GB per day for 80 thousand XML news feeds just isn’t an economical use of bandwidth.
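As a rough illustration of what gzip support involves on the client side, here is a minimal sketch using only the standard Java library. The class and method names (GzipFeedFetcher, decode, fetch) are hypothetical and are not taken from the NewsBot code. It also assumes feeds are UTF-8, which a real crawler would need to detect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not the actual NewsBot code.
class GzipFeedFetcher {

    /** Unwraps the body only when the server reports that it gzip-compressed it. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip")) {
            return new GZIPInputStream(body);
        }
        return body; // the server ignored Accept-Encoding and sent plain bytes
    }

    /** Downloads a feed while advertising gzip support. Assumes the feed is UTF-8. */
    static String fetch(String feedUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        // Tell the server that a compressed response is acceptable.
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Checking the Content-Encoding response header matters because a server is free to ignore the request header and reply uncompressed.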
The next step in improving the economy of the Speeple News “NewsBot” is to give each feed a score based on its update frequency, so that feeds which rarely update are downloaded less often.
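One possible way to turn update frequency into a schedule is to keep a per-feed polling interval that shrinks when a crawl finds new content and grows when it does not. The sketch below is only an assumption about how such a score could work. The class name, the starting interval and the 15-minute and 24-hour bounds are all invented for illustration.

```java
import java.time.Duration;

// Hypothetical per-feed schedule; the bounds and starting value are assumptions.
class FeedSchedule {
    static final Duration MIN_INTERVAL = Duration.ofMinutes(15);
    static final Duration MAX_INTERVAL = Duration.ofHours(24);

    private Duration interval = Duration.ofHours(1);

    /** Records the outcome of one crawl and returns the delay before the next one. */
    Duration recordCrawl(boolean feedChanged) {
        // Halve the wait for active feeds, double it for quiet ones.
        Duration next = feedChanged ? interval.dividedBy(2) : interval.multipliedBy(2);
        if (next.compareTo(MIN_INTERVAL) < 0) {
            next = MIN_INTERVAL;
        }
        if (next.compareTo(MAX_INTERVAL) > 0) {
            next = MAX_INTERVAL;
        }
        interval = next;
        return interval;
    }

    Duration interval() {
        return interval;
    }
}
```

With these assumed bounds, a feed that never changes drifts towards one download per day, while a busy feed is polled at most every 15 minutes.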
In conclusion, I am hoping that a mixture of gzip, a score for feed update frequency, and some “If-None-Match” (ETag) and “If-Modified-Since” support thrown in will produce a very efficient news crawler.
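The conditional request headers mentioned above could be wired in roughly as follows. This is a sketch with hypothetical names (ConditionalGet, conditionalHeaders, isNotModified). It assumes the crawler has stored the ETag and Last-Modified values from the previous response and sends them back unchanged.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the actual NewsBot code.
class ConditionalGet {

    /** Builds the validator headers from values saved after the previous fetch. */
    static Map<String, String> conditionalHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }

    /** Returns true when the server answers 304, meaning there is nothing to download. */
    static boolean isNotModified(String feedUrl, String etag, String lastModified)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        conditionalHeaders(etag, lastModified).forEach(conn::setRequestProperty);
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED;
        } finally {
            conn.disconnect();
        }
    }
}
```

After a normal 200 response, the values to store for next time can be read with conn.getHeaderField("ETag") and conn.getHeaderField("Last-Modified").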
Published: Wed, 18 June 2008, 10:52, also tagged: technology, rss, news, development, internet, xml, java, syndication, speeple, speeple news, newsbot
Version 2.0 of the Java-based news crawler (Speeple NewsBot) for Speeple News is officially in action.
The primary reason for the code rewrite was to increase the number of feeds crawled per hour. This requirement has been fulfilled, with the news crawler now processing 80K news feeds within the hour.
Published: Wed, 11 June 2008, 15:03, also tagged: technology, rss, news, development, xml, software, open source, java, concurrent programming, atom, syndication, speeple, speeple news, newsbot
Before my visit to Russia I was working primarily on the Speeple control panel and blogging services, but because of my limited internet access while away I worked instead on a new version of the news bot for Speeple News.
The new and vastly improved version is multithreaded and makes use of the ROME RSS/Atom syndication and publishing tools library rather than my own RSS/Atom parser. I can't be entirely sure of the performance difference; the main reason for using the ROME library was to get development rolling along quickly.
The new bot performs amazingly well in comparison to the old variant. It manages to crawl 50K news feeds within the hour on a server with 8 CPU cores (using 4 threads per core).
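The threading arrangement described above (8 cores at 4 threads per core, so 32 worker threads) could be expressed with a fixed thread pool. The following is a minimal sketch with hypothetical names (CrawlPool, crawlAll). The per-feed work is passed in as a callback because the real crawl logic is not shown in the post.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical pool wrapper for illustration; not the actual NewsBot code.
class CrawlPool {
    static final int THREADS_PER_CORE = 4; // ratio quoted in the post

    static int poolSize(int cores) {
        return cores * THREADS_PER_CORE; // 8 cores give 32 threads
    }

    /** Runs crawlOne for every URL and returns how many finished without throwing. */
    static int crawlAll(List<String> feedUrls, Consumer<String> crawlOne)
            throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(cores));
        AtomicInteger completed = new AtomicInteger();
        for (String url : feedUrls) {
            pool.execute(() -> {
                try {
                    crawlOne.accept(url);
                    completed.incrementAndGet();
                } catch (RuntimeException e) {
                    // One broken feed should not stop the rest of the run.
                    System.err.println("Failed: " + url + " (" + e + ")");
                }
            });
        }
        pool.shutdown();                          // accept no new work
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for queued feeds to finish
        return completed.get();
    }
}
```

Running more threads than cores is reasonable here because feed crawling spends most of its time waiting on the network rather than using the CPU.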
Published: Mon, 10 March 2008, 08:40, also tagged: technology, development, blogging, xml, web design, speeple
The development of the Speeple blog services is well under way, with the blog markup language practically complete along with a standard design that I’m happy with.
The blog template system will allow developers to design complex blog layouts via the powerful combination of an XML blog markup language (which I have designed to give as much data control as possible) and XSL templates (XSLT). Other browser-based technologies, namely CSS and JavaScript, are also available to developers.
Due to the nature of these technologies, submitted templates will go through a review system. Issues likely to be encountered include poorly performing XSL templates and malicious JavaScript code, such as cross-site scripting (XSS).
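To make the XML-plus-XSLT idea concrete, here is a small sketch using the standard Java XSLT API. The blog markup shown (blog, post, title, body) is entirely hypothetical, since the real Speeple element names are not given here, and the class name BlogTemplateDemo is likewise invented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo; element names are assumptions, not the real Speeple markup.
class BlogTemplateDemo {
    static final String SAMPLE_BLOG_XML =
            "<blog><post><title>Hello</title><body>First post</body></post></blog>";

    // A tiny template that turns each post into a heading and a paragraph.
    static final String SAMPLE_TEMPLATE_XSL =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/blog\">"
          + "<div><xsl:apply-templates select=\"post\"/></div>"
          + "</xsl:template>"
          + "<xsl:template match=\"post\">"
          + "<h2><xsl:value-of select=\"title\"/></h2>"
          + "<p><xsl:value-of select=\"body\"/></p>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies a developer-supplied XSL template to a blog XML document. */
    static String render(String blogXml, String templateXsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Ask the processor to apply its secure-processing restrictions,
        // which is sensible when the template comes from a third party.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(templateXsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(blogXml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

Calling BlogTemplateDemo.render(SAMPLE_BLOG_XML, SAMPLE_TEMPLATE_XSL) produces markup containing the post title in an h2 element. Secure processing limits what the stylesheet itself can do but does not inspect any JavaScript a template emits, so it would complement a review step rather than replace it.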