All Projects → toddwschneider → hntrends

toddwschneider / hntrends

Licence: MIT license
Hacker News front page trends search

Programming Languages

ruby
36898 projects - #4 most used programming language
HTML
75241 projects
javascript
184084 projects - #8 most used programming language
CSS
56736 projects
Procfile
174 projects

Hacker News Front Page Trends

A Ruby on Rails app that stores Hacker News items that have appeared on the front page, and exposes a few JSON API endpoints that let users search for terms, domains, and users to see how popular they have been on the HN front page over time.

Click here for a live dashboard that uses this API

Screenshot

screenshot

Caveat

HN only provides the exact list of front page items for dates since 11/11/2014, so anything before then is an estimate. For earlier dates, I used a heuristic of sorting by score and taking the top 115 items on weekdays, 80 on weekends, subject to a minimum of 3 points. This definitely isn’t perfect, for example:

  • it excludes job posts before 11/11/2014 since they always have 1 point
  • items with high scores don’t always get to the front page
  • it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover

But it should be a decent approximation, and the code could also be modified to use other heuristics. It would also probably be an improvement to fetch all job posts from pre 11/11/14 via the HN API.

Structure

There are 3 files of interest:

  1. app/lib/hn_client.rb - code to collect front page data via the HN website and API
  2. app/models/hn_item.rb - code that uses the HnClient to store the appropriate records in PostgreSQL database
  3. app/lib/hn_trends_calculator.rb - code to calculate trends over time and top items for given search terms. The trends endpoint returns 4 metrics for each term/date:
    1. Fraction of all front page items
    2. Number of all front page items
    3. Fraction of total front page score, i.e. the total score of items matching the search term divided by the total score of all front page items
    4. Front page score

The trends calculator supports searching titles, domains (with or without subdomains), and usernames. When searching by title, there are 3 different search styles:

  1. Web search uses PostgreSQL full text search, in particular the websearch_to_tsquery() function and GIN indexes. By default the tsv column uses the simple text search configuration
  2. Case-insensitive exact title match uses the ~* PostgreSQL regular expression operator, combined with a trigram index
  3. Case-sensitive exact title match is the same as #2, but uses the ~ regex operator instead of ~*

Requirements

Requires PostgreSQL 11+, since websearch_to_tsquery() was added in version 11

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].