Keywords in Lightroom

The last time I wrote about Lightroom, I was using sqlite to pull out frequencies of focal length. This time it’s keywords: Lightroom lets you build any number of custom keyword sets to apply to photos. It automatically builds a set of “recently used” keywords, but I thought it would also be handy to have a set of my most commonly used keywords. While Lightroom has a command to export a list of keywords, that list doesn’t include frequencies. Keywords are stored in Lightroom in a table called AgLibraryTag. Conveniently, Lightroom writes a count of each keyword to the same table (in the ImageCountCache column), so it’s easy to get out all the information we need. (Note: The frequency in this table is a cached value and may not reflect the up-to-the-minute reality within your database. Rather than constantly updating its database, Lightroom seems to update this count when you view the individual keywords. I’m not sure how to force a library-wide update of all keyword counts. This is probably close enough, and is simpler and quicker than counting keywords image-by-image.)

Rather than run this data through R to build a histogram as I did with the focal length data, I just use awk and sort this time to make a list with the most frequently used tags at the bottom. With this list, you can easily build a corresponding tag set in Lightroom.

Remember to change paths to suit, and that (on OSX) you’ll probably need to upgrade your version of sqlite for all this to work. Also, always always work from a copy of your database as this script does.

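# display a sorted list of lightroom keywords
# work from a copy of the catalog, never the original
cp ~/Pictures/Lightroom/Lightroom\ Catalog.lrcat ~/lightroom.lrdb
# export "count,keyword" rows; the SQL string literal uses single quotes
/usr/local/bin/sqlite3 -csv ~/lightroom.lrdb "select ImageCountCache, name from AgLibraryTag where kindName='AgKeywordTagKind';" > ~/lr-keywords.csv
# print "count keyword", sorted so the most-used keywords end up at the bottom
awk -F , '{print $1" "$2}' ~/lr-keywords.csv | sort -n
# clean up the temp files
rm ~/lightroom.lrdb
rm ~/lr-keywords.csv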
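If the full list is longer than you want for a keyword set, the awk-and-sort step can be trimmed to the top few entries. Here is a minimal sketch of that idea. The function name top_keywords and the sample rows are made up for illustration; a real export from your own catalog would take the place of the sample file.

```shell
# Made-up sample of "count,keyword" rows, standing in for the CSV
# that the sqlite3 export writes. These are not real catalog data.
sample_csv="$(mktemp)"
cat > "$sample_csv" <<'EOF'
12,portland
42,trains
3,snow
7,family
EOF

# Hypothetical helper: print the N most-used keywords from a
# count,keyword CSV file, with the most-used keyword on the last line.
top_keywords() {
  csv_file="$1"
  how_many="${2:-10}"   # default to the top 10 if no count is given
  awk -F , '{print $1" "$2}' "$csv_file" | sort -n | tail -n "$how_many"
}

keyword_result="$(top_keywords "$sample_csv" 2)"
echo "$keyword_result"
rm -f "$sample_csv"
```

Like the original one-liner, this splits on every comma, so a keyword that itself contains a comma would be cut short.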

Daydream: A map of how keywords relate to one another would be awesome.

Trains composite

I love the discoverability of Lightroom. First you find how neatly it works to just manage lots of photos (and that process gets deeper as you use it more), and then you get better at processing raw images, and then you start to explore the other modules and see how they enable you to do creative things all in a single package.

Photo tool nrrdery

Update 6/27/2007: Lightroom v1.1 is out (it’s real, and it’s spectacular, see the O’Reilly Lightroom Blog for more), and it changes the database from a “library” to a “catalog.” In terms of this little tool, this change seems to only entail changing the filename referred to in the wrapper shell script — as I’ve done, below. Otherwise, generating a focal length histogram seems to work just as it did previously.

Camera Nrrdery

For fun: You can see that I use my fixed 50mm and 21mm lenses far more than anything else I’ve got. That’s because they’re so very pretty.

I use Adobe’s Lightroom to manage my RAW photos. It’s a wonderful, splendid tool. Among its features, it provides a handy metadata browser of your photo library, and includes the ability to browse by lens. Recently James Duncan Davidson mentioned being interested in plotting his use of various focal lengths, and commenters responded with a number of good solutions. Since Lightroom uses a SQLite database for its library, tools like SQLite Browser can be used to scan through the database file itself and export tables, at which point it’s straightforward to grep and find focal lengths. This is pretty slick all by itself, but I thought I’d put together a quick tool to automate the extraction and generation of this data. To do that, I use sqlite3 from the command line to dump the metadata table to a file, and then a short bit of R code finds the focal lengths and builds the histogram. The sqlite3 commands and the R code are invoked via a shell script that makes a copy of the main database to work with and cleans up the temp file when it’s all done.

Howto

If you made it this far, you might actually be interested in how it’s all done. After some tinkering, I found from Jeffrey Friedl’s Blog that Lightroom’s current database needs a newer version of sqlite3 than that which ships with OSX. With that update installed, sqlite3 will handle your Lightroom database without any problems.

Here’s the shell wrapper. Change paths to suit:

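#!/bin/bash
# work on a copy of the catalog, never the original
cp ~/Pictures/Lightroom/Lightroom\ Catalog.lrcat ~/lightroom.lrdb
# dump the per-image XMP metadata to a CSV file
/usr/local/bin/sqlite3 -csv ~/lightroom.lrdb 'select xmp from Adobe_AdditionalMetadata;' > /Users/alan/lr-metadata.csv
# build the histogram (the R script writes lr-focallengths.pdf)
R CMD BATCH /Users/alan/bin/lr-getfocallengths.R
# clean up the temp files
rm ~/lightroom.lrdb
rm ~/lr-metadata.csv
# turn the PDF into a JPEG
convert ~/lr-focallengths.pdf ~/lr-focallengths.jpg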

And here’s the R code, which lives in lr-getfocallengths.R and is called by the shell script. Again, fix paths for your own circumstances:

lr <- file("/Users/alan/lr-metadata.csv", "r")
lrlines <- readLines(lr)
temp <- gsub("(/1)", "", lrlines[grep("exif:FocalLength>", lrlines)])
lengths <- as.numeric(gsub("([^[:digit:]])", "", temp))
lengths<-lengths[lengths<=1000]
pdf("/Users/alan/lr-focallengths.pdf")
hist(lengths, main="Histogram of Focal Length Use",
   xlab="Focal length (mm)", ylab="Number", breaks=seq(0,200, by=4))
dev.off()

A few things to note:

  • Depending on how your version of R is compiled, you can use jpeg(…) instead of pdf(…) to make the output file. My R isn’t currently compiled with jpg support, so I build a pdf file and then use convert on it.
  • There’s some noise in the metadata that leads to the erroneous identification of focal lengths like 83456000. That’s not right at all. I skim off everything above 1000 in line 5 of the R code. (Which is still sort of silly. My longest lens is presently 200mm.)
  • Relatedly, the x axis of the histogram only goes up to 200. To change that, modify the seq(0,200, by=4) accordingly — you can change the upper bound as well as the width of the bins.
  • A really slick way to do all this would be to properly parse the exported table so that data can be combined, for example to limit the histogram to “favorites” by focal length. These aren’t in their own fields in the database, however, but rather all within a single column that holds all of an image’s metadata, which makes it harder to select on multiple conditions. That’s a trick for another day.
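If you don’t have R handy, the extraction and filtering steps from the R code can be sketched in plain shell to get a table of counts instead of a histogram. This is only a sketch: the function name focal_length_counts is hypothetical, and the sample lines are made up in the shape the R regular expressions appear to expect, so a real export may look different.

```shell
# Made-up sample lines standing in for the exported metadata file.
# The tag layout is an assumption based on the patterns in the R code.
sample_xmp="$(mktemp)"
cat > "$sample_xmp" <<'EOF'
<exif:FocalLength>50/1</exif:FocalLength>
<exif:FNumber>28/10</exif:FNumber>
<exif:FocalLength>21/1</exif:FocalLength>
<exif:FocalLength>50/1</exif:FocalLength>
<exif:FocalLength>83456000/1</exif:FocalLength>
EOF

# Hypothetical helper: print "focal_length count" pairs, sorted by
# focal length, mirroring the steps in the R code:
#   1. keep only the lines that mention exif:FocalLength>
#   2. drop the "/1" and then every remaining non-digit character
#   3. skip noise values above 1000 and count what is left
focal_length_counts() {
  grep 'exif:FocalLength>' "$1" |
    sed -e 's|/1||g' -e 's/[^0-9]//g' |
    awk 'NF && $1 <= 1000 { count[$1]++ }
         END { for (mm in count) print mm, count[mm] }' |
    sort -n
}

focal_result="$(focal_length_counts "$sample_xmp")"
echo "$focal_result"
rm -f "$sample_xmp"
```

The 1000 cutoff is the same one used in the R code, so adjust it in both places if you change it.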

About, the short version

I’m a sociologist. This site is powered by Textpattern, TextDrive and the sociological imagination. For more about me and this site, see the long version.
