Speeple Core

Tagged Speeple News

  1. Speeple News Graphs

    Certain searches now display a graph of results grouped by day or by month. Each graph covers a period of 100 days or 100 months.

    The Speeple News graphs show when keywords were popular in the index, normally as a spike. Spikes are most visible around sporting and seasonal events and major world issues.

  2. Resource: Speeple News Statistics Page

    I’ve put together a statistics page for Speeple News. The page provides overall statistics for the Speeple News service, including health statistics such as crawl rate, and the top sources grouped by domain and by individual feed.

    The stats cover news item totals, feed count, feed types and versions, and content languages. The page is updated every 30 minutes.

  3. Speeple NewsBot: ETag (If-None-Match) and Last-Modified (If-Modified-Since) Implemented

    To further improve the performance of the Speeple News “NewsBot”, I have implemented support for the ETag and Last-Modified HTTP headers. This means that if a feed hasn’t changed since NewsBot last accessed it, the server returns only the HTTP headers (a 304 Not Modified response) rather than the full body content.

    This not only improves the efficiency of fetching content for Speeple News, it also benefits webmasters and site owners because less bandwidth is used.

    Initial statistics show that supporting HTTP ETag / Last-Modified headers, along with handling gzip-encoded content, has reduced bandwidth costs by over 60%.
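
    As a rough illustration of the conditional request described above, here is a minimal Python sketch. It is not NewsBot's actual code: the function names `conditional_headers` and `fetch_if_changed` are made up for this example, and it assumes the crawler stored the ETag and Last-Modified values from the previous fetch of each feed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=30):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 200 OK: the feed changed, so keep the new validators for next time.
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304 Not Modified: headers only, no body was transferred.
            return None, etag, last_modified
        raise
```

    A crawler built this way would skip parsing whenever the returned body is `None` and carry the stored validators over to the next run.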

  4. Speeple NewsBot Update

    The Speeple News “NewsBot” has been updated to support content compressed with the gzip algorithm. I should have supported gzip HTTP content encoding all along, but my recent bandwidth logs on the server have brought it to my immediate attention. Averaging 80 GB per day for 80 thousand XML news feeds just isn’t an economical use of bandwidth.
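
    For readers who want to see what gzip content encoding involves on the client side, here is a small Python sketch. It is only an illustration under my own naming (`decode_body` and `fetch_feed` are hypothetical helpers, not part of NewsBot): the client advertises gzip in the `Accept-Encoding` header and decompresses the body only when the server says it used gzip.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding):
    """Decompress the body if the server reported gzip content encoding."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw


def fetch_feed(url, timeout=30):
    """Fetch a feed while advertising gzip support to the server."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # urllib does not decompress automatically, so check the header here.
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

    XML feeds are repetitive text, so they tend to compress well, which is why this matters for bandwidth.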

    The next step in improving the economy of the Speeple News “NewsBot” is to give each feed a score based on its update frequency, so that feeds which rarely update are downloaded less often.
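
    One possible way to sketch such a score in Python is shown below. Everything here is an assumption made for illustration, not a description of what NewsBot does: the score is taken to be the fraction of recent polls that found new items, and the `base` and `longest` intervals (30 minutes and one day) are placeholder values.

```python
def update_score(poll_results):
    """Fraction of recent polls that found new items (0.0 to 1.0)."""
    if not poll_results:
        # No history yet: treat the feed as active until proven otherwise.
        return 1.0
    return sum(1 for changed in poll_results if changed) / len(poll_results)


def poll_interval_minutes(score, base=30, longest=1440):
    """Stretch the base interval as the score drops, capped at `longest`."""
    if score <= 0:
        return longest
    return min(longest, round(base / score))
```

    With these placeholder values, a feed that had new items in one of its last four polls would be fetched every 120 minutes instead of every 30, and a feed that never changes would be fetched once a day.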

    In conclusion, I am hoping that a mixture of gzip support, a score for feed update frequency and some ETag (“If-None-Match”) and Last-Modified (“If-Modified-Since”) support will produce a very efficient news crawler.

  5. Speeple News Statistics

    The news service provided by Speeple has been indexing content for over a year and a half now, and in this post I will outline some basic statistics.

    • 80000+ XML news feeds (72% RSS 2.0, 14% RSS 0.91 and 1.4% Atom 1.0)
    • News in 50+ languages, top 5 languages:

      1. English - 60%
      2. German - 8%
      3. Spanish - 5.5%
      4. Chinese - 5%
      5. Russian - 4.6%
    • 37+ million news items indexed (150+ thousand added daily)
    • 1.2+ million unique tags

    The news crawler retrieves 8000+ news items per hour and takes 0.5 hours to process the full feed list.