Solr and the Panama Papers

You'll all have heard of the Panama Papers: the leak of more than 11 million documents exposing both illegal tax evasion and legal tax avoidance around the world. The 2.6 TB data set was so enormous that the International Consortium of Investigative Journalists engaged data analysts to help index and search through it all in order to find newsworthy stories.

Those analysts used Apache Solr as the primary search engine, allowing hundreds of journalists around the world to collaborate. They also used Apache Tika to extract data from different file formats and Project Blacklight to organise and catalogue the results. I mention these open source projects, Solr and Tika in particular, because I'm very proud to be a member of the Apache Software Foundation: an open source organisation whose members donate their time toward the betterment of civilisation through tools that are found in almost every computing device. And also because onCourse uses Solr to power search on its websites.

When you search on tags, time of day, keywords, price, distance from a location, and much more, it is Solr doing the heavy lifting in the background, usually within 10 to 20 ms.
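
For the curious, here is a rough sketch of what a search like that might look like as a Solr request. It is not onCourse's actual code: the field names `tag`, `price` and `location` and the helper `build_search_params` are made up for illustration. The `q` and `fq` parameters and the `{!geofilt}` spatial filter are standard Solr features.

```python
from urllib.parse import urlencode


def build_search_params(keywords, tag=None, max_price=None, near=None, radius_km=None):
    """Build Solr /select parameters for a keyword search with optional filters.

    Field names (tag, price, location) are hypothetical; a real schema will differ.
    Tag values are not escaped here, so this sketch assumes they contain no quotes.
    """
    params = [("q", keywords), ("wt", "json")]
    if tag is not None:
        # Each fq narrows the result set without affecting relevance scoring.
        params.append(("fq", f'tag:"{tag}"'))
    if max_price is not None:
        params.append(("fq", f"price:[* TO {max_price}]"))
    if near is not None and radius_km is not None:
        lat, lon = near
        # geofilt keeps documents within radius_km kilometres of the point.
        params.append(("fq", f"{{!geofilt sfield=location pt={lat},{lon} d={radius_km}}}"))
    return params


if __name__ == "__main__":
    params = build_search_params(
        "pottery", tag="Evening", max_price=200, near=(-33.87, 151.21), radius_km=10
    )
    # The encoded string would be appended to a URL such as .../solr/<core>/select?
    print(urlencode(params))
```

Putting each condition in its own `fq` lets Solr cache the filters separately, which is one reason repeated searches can come back so quickly.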

When you use our faceted search, which shows a count of matching results for each tag before you even search on it, that's Solr working incredibly fast to run a count for every term.
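
As a minimal sketch of how those per-tag counts can be requested and read, assuming a hypothetical `tag` field and a hand-written sample response rather than real data: Solr's `facet` and `facet.field` parameters turn faceting on, and by default the JSON response lists each field's counts as a flat sequence of alternating terms and counts.

```python
def facet_params(field):
    """Solr parameters asking only for facet counts on one field (rows=0 skips documents)."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),  # leave out terms with no matching documents
        ("wt", "json"),
    ]


def parse_facet_counts(response, field):
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hand-written example of the response shape; the tags and numbers are invented.
    sample = {"facet_counts": {"facet_fields": {"tag": ["Cooking", 12, "Languages", 7]}}}
    for term, count in parse_facet_counts(sample, "tag").items():
        print(f"{term} ({count})")
```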

The technology we can leverage in 2016 is just amazing.