The last time I wrote about Lightroom, I was using sqlite to pull out frequencies of focal length. This time it’s keywords. Lightroom lets you build any number of custom keyword sets to apply to photos. It automatically builds a set of “recently used” keywords, but I thought it would also be handy to have a set of my most commonly used keywords. While Lightroom has a command to export a list of keywords, that list doesn’t include frequencies.

Keywords are stored in Lightroom in a table called AgLibraryTag (AgLibraryKeyword in Lightroom 2, see update below). Conveniently, Lightroom writes a count of each keyword to the same table, so it’s easy to get out all the information we need.

(Note: The frequency in this table is a cached value and may not reflect the up-to-the-minute state of your database. Rather than constantly updating its database, Lightroom seems to refresh this count when you view the individual keywords. I’m not sure how to force a library-wide update of all keyword counts. The cached count is probably close enough, and reading it is simpler and quicker than counting keywords image by image.)
Rather than run this data through R to build a histogram as I did with the focal length data, this time I just use awk to make a list sorted by frequency, with the most frequently used tags at the bottom. With this list, you can easily build a corresponding keyword set in Lightroom.
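For readers who would rather not use awk, the same idea can be sketched in Python. This is only an illustration, not the script from this post: the column names `name` and `imageCountCache` are assumptions about the catalog schema, so check them against your own copy of the database (for example with `.schema AgLibraryKeyword` in the sqlite shell) before relying on them.

```python
import sqlite3


def keyword_frequencies(conn, table="AgLibraryKeyword",
                        name_col="name", count_col="imageCountCache"):
    """Return (count, keyword) pairs, most frequently used last."""
    # Table and column names cannot be bound as SQL parameters, so they are
    # quoted into the statement. The defaults are ASSUMED names; verify them.
    sql = (
        f'SELECT COALESCE(CAST("{count_col}" AS INTEGER), 0) AS n, "{name_col}" '
        f'FROM "{table}" '
        f'WHERE "{name_col}" IS NOT NULL '
        f'ORDER BY n ASC, "{name_col}" ASC'
    )
    return list(conn.execute(sql))


if __name__ == "__main__":
    # Stand-in data so the sketch runs without a real catalog.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE AgLibraryKeyword (name TEXT, imageCountCache)")
    demo.executemany(
        "INSERT INTO AgLibraryKeyword VALUES (?, ?)",
        [("sunset", 42), ("family", 310), (None, 0), ("macro", 7)],
    )
    for count, name in keyword_frequencies(demo):
        print(f"{count:6d}  {name}")
```

Sorting in ascending order keeps the most-used keywords at the bottom of the output, which is where they end up in a terminal after a long list scrolls past. For a Lightroom 1 catalog, pass `table="AgLibraryTag"`, again assuming the count column has the same name there.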
Remember to change paths to suit, and that (on OS X) you’ll probably need to upgrade your version of sqlite for all this to work. Also, always, always work from a copy of your database, as this script does.
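Working from a copy is easy to automate. Here is a minimal sketch in Python, assuming the catalog is a single SQLite file and that Lightroom is closed while it is being copied. The function name `open_catalog_copy` is made up for this example, and the path in the usage comment is a placeholder, not a real location.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_catalog_copy(catalog_path):
    """Copy the catalog into a fresh temporary directory and open the copy."""
    src = Path(catalog_path)
    workdir = Path(tempfile.mkdtemp(prefix="lrcat-copy-"))
    dst = workdir / src.name
    shutil.copy2(src, dst)  # the original file is only ever read
    return sqlite3.connect(str(dst)), dst


# Usage (placeholder path, change to suit):
# conn, copy_path = open_catalog_copy("/path/to/your-catalog.lrcat")
```

Anything that goes wrong from here on, including an accidental write, only touches the temporary copy.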
Daydream: A map of how keywords relate to one another would be awesome.
Update: The above daydream is now possible with Lightroom 2’s related keywords functionality. Write-up here.
Update again (Oct 4 2008): The old code didn’t work with Lightroom 2 because some things moved around in the database. The code below seems to fix that and obtains keyword frequencies for LR2: