Speeple Core

Speeple NewsBot 2.0

Version 2.0 of the Java-based news crawler (Speeple NewsBot) for Speeple News is officially in action.

The primary reason for the code rewrite was to increase the number of feeds crawled per hour. This requirement has been fulfilled, with the news crawler now processing over 80K news feeds in under an hour:

0.56 hours taken to crawl 84367 feeds. 100162 items today, 9369.2 items per hour (224860.91 projected, progress: 44.54%)
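
As a rough check on the figures in that log line, the arithmetic can be sketched as below. The class and method names are hypothetical, and the assumption that the projection is the hourly item rate extrapolated over a 24-hour day is inferred from the numbers themselves, not taken from the NewsBot source.

```java
public class CrawlStats {
    // Feeds crawled per hour, given a feed count and the elapsed time in hours.
    static double feedsPerHour(long feeds, double hours) {
        return feeds / hours;
    }

    // Assumed projection: the hourly item rate extrapolated over a 24-hour day.
    static double projectedItemsPerDay(double itemsPerHour) {
        return itemsPerHour * 24.0;
    }

    // Progress toward the projected daily total, as a percentage.
    static double progressPercent(long itemsToday, double projected) {
        return 100.0 * itemsToday / projected;
    }

    public static void main(String[] args) {
        // Values copied from the log line above.
        System.out.printf("%.0f feeds/hour%n", feedsPerHour(84367, 0.56));
        System.out.printf("%.1f items/day projected%n", projectedItemsPerDay(9369.2));
        System.out.printf("%.2f%% progress%n", progressPercent(100162, 224860.91));
    }
}
```

With the logged values this works out to roughly 150,000 feeds per hour, and the projection and progress figures come out consistent with those in the log line.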

Changes since 1.0:

  • Dramatically increased crawl rate.
  • Threaded: takes full advantage of multi-core CPUs.
  • Fixes some embarrassing data integrity issues found in earlier versions.
  • Smaller memory footprint: uses a similar amount of RAM to the previous version while running 32 threads.
  • Records more information about XML feeds, such as parse and HTTP errors.
  • Distributed: can be run on multiple servers, massively increasing potential crawl throughput.
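
To give a feel for the threaded design, here is a minimal sketch of a crawl loop running on a fixed pool of worker threads and counting successes and failures. It is not the NewsBot code: the class, the `Result` counters, and the `fetcher` parameter (a stand-in for the real HTTP fetch and XML parse step) are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CrawlSketch {
    // Outcome counters, safe to update from many worker threads at once.
    static final class Result {
        final AtomicInteger ok = new AtomicInteger();
        final AtomicInteger failed = new AtomicInteger();
    }

    // Crawls every feed URL on a fixed-size thread pool.
    // `fetcher` is a hypothetical stand-in for fetching and parsing one feed;
    // it returns the number of items found, or throws on an HTTP or parse error.
    static Result crawl(List<String> feedUrls, int threads,
                        Function<String, Integer> fetcher) throws InterruptedException {
        Result result = new Result();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : feedUrls) {
            pool.execute(() -> {
                try {
                    fetcher.apply(url);
                    result.ok.incrementAndGet();
                } catch (RuntimeException e) {
                    // A real crawler would record the error type against the feed.
                    result.failed.incrementAndGet();
                }
            });
        }
        pool.shutdown();                           // accept no new tasks
        pool.awaitTermination(1, TimeUnit.HOURS);  // wait for queued feeds to finish
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> feeds = Arrays.asList(
                "http://example.com/a.xml", "http://example.com/b.xml", "bad");
        // Fake fetcher: rejects anything that does not look like an HTTP URL.
        Result r = crawl(feeds, 32, url -> {
            if (!url.startsWith("http")) {
                throw new IllegalArgumentException("bad URL: " + url);
            }
            return 1;
        });
        System.out.println(r.ok.get() + " ok, " + r.failed.get() + " failed");
    }
}
```

Running the example prints `2 ok, 1 failed`. A fixed pool keeps the thread count (32 here, matching the figure above) bounded regardless of how many feeds are queued, which is one plausible way to hold memory use steady while raising throughput.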