The year in Lightroom, by the numbers

I started last year to play with pulling data right out of my Lightroom catalog. How fun to combine interests in photography with my need to make data out of things. Last year about this time I posted some 2007 photo stats, and with the release of Lightroom 2 I came up with some keyword network maps of my flickr images.

Over at The Online Photographer, Marc Rochkind wrote about meta metadata and released a tool for OS X that produces much more summary information than I had previously considered: by-lens statistics on cropping and aspect ratio, in addition to focal length usage. This generated some thoughtful conversation about composing in the viewfinder versus cropping. Marc’s work spurred me to think more about my own stats, so I went back to my Lightroom 2 catalog with the SQLite browser and R to see if I could reproduce some of the more interesting data that Marc’s tool generates. After some tinkering, I think I have a functional, reusable set of R tools for generalized reporting of Lightroom image data.

Like Marc’s ImageReporter, I can filter by image type, picks, ratings, Quick Collection, camera model (though this matters less for me since I have one P&S and one DSLR) and time period, and I added filtering by color label as well — hey, just for fun, even though I don’t use the color labels (I generally get rating fatigue using anything more than picks.)

So, what do I have? First, a reproduction of the stats I checked out last year: Monthly photos and focal length:

The year in Lightroom

I continue to primarily use my prime lenses, and my picture-taking appears to have notched down dramatically as compared to 2007. This is partly because of work, of course, but also because I’ve become much more selective about what I actually keep in the catalog.

We can break out focal length a bit more. For the two zooms that I use on my K100D, what are the mean focal lengths?

	> lensFL
	 [1] 5.8-23.2 mm                         15
	 [3] 85.0 mm f/1.8                       85
	 [5] smc PENTAX-DA 18-55mm F3.5-5.6 AL   31
	 [7] smc PENTAX-DA 21mm F3.2 AL Limited  21
	 [9] smc PENTAX-DA 50-200mm F4-5.6 ED    121
	[11] smc PENTAX-DA 70mm F2.4 Limited     70
	[13] smc PENTAX-FA 35mm F2 AL            35
	[15] smc PENTAX-FA 50mm F1.4             50

So that’s kind of interesting, suggesting that I use the 200mm zoom at about the middle of its range. But the mean isn’t necessarily informative. Here’s a plot of focal length for one of those zooms:

Focal lengths plot, DA 50-200mm lens, 2008

So, I use the 50-200mm lens primarily for shots at either extreme of its range, and I already have a 50mm fixed lens that takes better photos than the zoom at that focal length. Moreover, breaking out just picks with this lens shows a three-to-one preference for 200mm over 50mm. I think that means I need a long prime. Ka-ching!
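For anyone who wants to try the same comparison, here is a minimal sketch of the mean-versus-distribution idea in base R. The data frame, its column names (lens, focal_length, pick) and every value in it are invented for illustration; nothing here comes from the actual catalog.

```r
# Toy data, invented for illustration only; not values from the catalog.
photos <- data.frame(
  lens = c(rep("zoom 50-200", 8), rep("prime 50", 4)),
  focal_length = c(50, 50, 55, 120, 190, 200, 200, 200, 50, 50, 50, 50),
  pick = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE,
           TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Mean focal length per lens, rounded to whole millimeters.
mean_fl <- round(tapply(photos$focal_length, photos$lens, mean))

# Distribution for a single lens, in three coarse bins.
zoom <- photos[photos$lens == "zoom 50-200", ]
bins <- cut(zoom$focal_length, breaks = c(0, 70, 180, 200),
            labels = c("short", "middle", "long"))
fl_table <- table(bins)

# The same breakdown restricted to picks.
pick_table <- table(bins[zoom$pick])

print(mean_fl)
print(fl_table)
print(pick_table)
```

The point of the sketch is only that a mean near the middle of a zoom's range can sit on top of a distribution with almost nothing in the middle, which is why the table (or a histogram) is worth a look before drawing conclusions.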

I can also consider crop: How am I doing at composing in-camera? Here’s how often I crop, by lens, as a percentage:

	smc PENTAX-DA 18-55mm F3.5-5.6 AL   9.13 %
	smc PENTAX-DA 21mm F3.2 AL Limited 17.67 %
	smc PENTAX-DA 50-200mm F4-5.6 ED    6.93 %
	smc PENTAX-DA 70mm F2.4 Limited    23.78 %
	smc PENTAX-FA 35mm F2 AL           10.71 %
	smc PENTAX-FA 50mm F1.4            24.67 %

And, when I do crop, how much of the original composition do I keep?

	smc PENTAX-DA 18-55mm F3.5-5.6 AL  78.3 %                            
	smc PENTAX-DA 21mm F3.2 AL Limited 81.8 %                            
	smc PENTAX-DA 50-200mm F4-5.6 ED   81.6 %                            
	smc PENTAX-DA 70mm F2.4 Limited    80.9 %                            
	smc PENTAX-FA 35mm F2 AL           83.4 %                            
	smc PENTAX-FA 50mm F1.4            82.5 %
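Both tables above boil down to two small calculations per lens. The sketch below shows one way they might be computed, assuming each image's crop box is available as fractions of the frame. The column names (left, right, top, bottom) and the data are hypothetical, and the sketch ignores rotation, so a pure tilt adjustment would need extra handling.

```r
# Toy data, invented for illustration; column names are hypothetical.
crops <- data.frame(
  lens   = c("A", "A", "A", "A", "B", "B"),
  left   = c(0, 0.1, 0, 0, 0,   0.2),
  right  = c(1, 0.9, 1, 1, 1,   1),
  top    = c(0, 0,   0, 0, 0,   0),
  bottom = c(1, 1,   1, 1, 0.5, 1),
  stringsAsFactors = FALSE
)

# Fraction of the original frame kept by the crop box.
crops$kept <- (crops$right - crops$left) * (crops$bottom - crops$top)
crops$cropped <- crops$kept < 1

# Percentage of images cropped, by lens.
crop_rate <- 100 * tapply(crops$cropped, crops$lens, mean)

# Mean percentage of the frame kept, among cropped images only.
only_cropped <- crops[crops$cropped, ]
kept_pct <- 100 * tapply(only_cropped$kept, only_cropped$lens, mean)

print(round(crop_rate, 2))
print(round(kept_pct, 1))
```

With real data it would probably be wise to treat anything within a small tolerance of 1 as uncropped, since stored crop values may not be exact.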

So, I’m cropping quite a bit. As Marc found in his exploration, these numbers go up when I filter by picks. I was surprised that I crop as much as I do with the DA21mm in particular, since I think of my use of it as being mostly for wide landscapes; but even those are often a bit crooked, enough to warrant at least some adjustment of tilt, and Lightroom (fairly) calls that adjustment a crop.

Does cropping mean I do a poor job at composing in-camera? Possibly. I have to admit that knowing I can crop gives me a conscientious freedom when I’m shooting, but these numbers give me something to think about. Maybe careful composition will be something to work on as I go forward.

We can cut all this in a few other ways. I’d like to take a look at my common keywords during a given time period, for example, but that will wait for the follow-up post, I think. This is more than enough nerdery for one January 1st afternoon.

Photo tool nrrdery

Update 1/1/2009: I’ve built some much more sophisticated sets of stats from my Lightroom 2 catalog.

Update 6/27/2007: Lightroom v1.1 is out (it’s real, and it’s spectacular, see the O’Reilly Lightroom Blog for more), and it changes the database from a “library” to a “catalog.” In terms of this little tool, this change seems to only entail changing the filename referred to in the wrapper shell script — as I’ve done, below. Otherwise, generating a focal length histogram seems to work just as it did previously.

Camera Nrrdery

For fun: You can see that I use my fixed 50mm and 21mm lenses far more than anything else I’ve got. That’s because they’re so very pretty.

I use Adobe’s Lightroom to manage my RAW photos. It’s a wonderful, splendid tool. Among its features, it provides a handy metadata browser of your photo library, and includes the ability to browse by lens. Recently James Duncan Davidson mentioned being interested in plotting his use of various focal lengths, and commenters responded with a number of good solutions. Since Lightroom uses a SQLite database for its library, tools like SQLite Browser can be used to scan through the database file itself and export tables, at which point it’s straightforward to grep and find focal lengths. This is pretty slick all by itself, but I thought I’d put together a quick tool to automate the extraction and generation of this data. To do that, I use sqlite3 from the command line to dump the metadata table to a file, and then a short bit of R code finds the focal lengths and builds the histogram. The sqlite3 commands and the R code are invoked via a shell script that makes a copy of the main database to work with and cleans up the temp file when it’s all done.


If you made it this far, you might actually be interested in how it’s all done. After some tinkering, I found from Jeffrey Friedl’s Blog that Lightroom’s current database needs a newer version of sqlite3 than the one that ships with OS X. With that update installed, sqlite3 will handle your Lightroom database without any problems.

Here’s the shell wrapper. Change paths to suit:

#!/bin/bash
cp ~/Pictures/Lightroom/Lightroom\ Catalog.lrcat ~/lightroom.lrdb
/usr/local/bin/sqlite3 -csv ~/lightroom.lrdb 'select xmp from Adobe_AdditionalMetadata;' > /Users/alan/lr-metadata.csv
R CMD BATCH /Users/alan/bin/lr-getfocallengths.R
rm ~/lightroom.lrdb
rm ~/lr-metadata.csv
convert ~/lr-focallengths.pdf ~/lr-focallengths.jpg

And here’s the R code, which lives in lr-getfocallengths.R and is called by the shell script. Again, fix paths for your own circumstances:

lr <- file("/Users/alan/lr-metadata.csv", "r")
lrlines <- readLines(lr)
temp <- gsub("(/1)", "", lrlines[grep("exif:FocalLength>", lrlines)])
lengths <- as.numeric(gsub("([^[:digit:]])", "", temp))
lengths<-lengths[lengths<=1000]
pdf("/Users/alan/lr-focallengths.pdf")
hist(lengths, main="Histogram of Focal Length Use",
   xlab="Focal length (mm)", ylab="Number", breaks=seq(0,200, by=4))

A few things to note:

  • Depending on how your version of R is compiled, you can use jpeg(…) instead of pdf(…) to make the output file. My R isn’t currently compiled with jpg support, so I build a pdf file and then use convert on it.
  • There’s some noise in the metadata that leads to the erroneous identification of focal lengths like 83456000. That’s not right at all. I skim off everything above 1000 in line 5 of the R code. (Which is still sort of silly. My longest lens is presently 200mm.)
  • Relatedly, the x axis of the histogram only goes up to 200. To change that, modify the seq(0,200, by=4) accordingly — you can change the upper bound as well as the width of the bins.
  • A really slick way to do all this would be to properly parse the exported table so that data can be combined, for example to limit the histogram to “favorites” by focal length. These aren’t in their own fields in the database, however, but rather all within a single column that holds all of an image’s metadata, which makes it harder to select on multiple conditions. That’s a trick for another day.
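The noise filtering and binning described in these notes can be sketched a little more defensively. This is only a sketch under stated assumptions: that each focal length appears in the element form <exif:FocalLength>N/D</exif:FocalLength> (a rational, or a plain number), and that values above a cutoff are noise. The function name parse_focal_lengths and its max_mm parameter are hypothetical, not part of the script above.

```r
# Parse exif:FocalLength values from lines of XMP text.
# Assumes the element form <exif:FocalLength>N/D</exif:FocalLength>,
# where the value is a rational "N/D" or a plain number (an assumption).
parse_focal_lengths <- function(lines, max_mm = 1000) {
  pattern <- "<exif:FocalLength>([0-9.]+)(/([0-9.]+))?</exif:FocalLength>"
  # Keep only the matched element from lines that contain one.
  hits <- regmatches(lines, regexpr(pattern, lines))
  num <- as.numeric(sub(pattern, "\\1", hits))
  den_txt <- sub(pattern, "\\3", hits)
  den_txt[den_txt == ""] <- "1"   # plain numbers have no denominator
  mm <- num / as.numeric(den_txt)
  # Drop undefined values and anything outside a plausible range.
  mm[!is.na(mm) & mm > 0 & mm <= max_mm]
}

# Toy input, invented for illustration.
xmp_lines <- c(
  "<exif:FocalLength>50/1</exif:FocalLength>",
  "  <exif:FocalLength>500/10</exif:FocalLength>",
  "<exif:FocalLength>21</exif:FocalLength>",
  "<exif:FocalLength>83456000/1</exif:FocalLength>",
  "<tiff:Model>K100D</tiff:Model>"
)
mm <- parse_focal_lengths(xmp_lines)

# Bins of width 4 that always span the data, so hist() will not
# fail when a focal length exceeds a hard-coded upper bound.
breaks <- seq(0, 4 * ceiling(max(mm) / 4), by = 4)
print(mm)
```

Passing breaks to hist() in place of the fixed seq(0,200, by=4) would let the x axis follow whatever lenses are in the catalog, while max_mm still skims off the obviously bogus values.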

Sweave and typeface control

Sweave has a tendency to preclude latex packages from setting typefaces, apparently. Here’s a much simpler fix (R 2.3.0+) than my hacky solution of using \input or \include to typeset the sweave-generated file in a wrapper file. (I suppose I should upgrade my R installation so that I can actually use this, but I shudder at the thought of updating all those packages again. I’ll put it off for just a few more weeks, I think.)

Sweave and complex projects

A thread on R-Help recently discussed using Sweave/LaTeX for complicated projects. Two really useful tips were highlighted in that conversation, and I use the first of them regularly: At the beginning of an Rnw/Snw file, use the prefix.string option to set the location of an includes directory: \SweaveOpts{prefix.string=/Path/to/directory}

This is really useful for organizing all the files that are built by your project. (Mine are all directed to an includes directory that lives beneath my main manuscript directory.)

The second tip is to use a makefile to build a project that consists of multiple Sweave files. I like makefiles as much as the next guy, but here is my TextMate-specific solution for the same dilemma: Within the TM project, use the TM_SWEAVE_MASTER variable to name a master file, and in this file, simply plug in a single Sweave section that invokes source (for R-only files) or Sweave for your project’s various files. When you want to build the whole project (for example when you begin work for the day and need to load up all your data) all you do is open up the project and invoke the Sweave -> Sweave Project in R command.

For example, my dissertation project sets TM_SWEAVE_MASTER to “diss-master.Snw,” and that file looks like this:


The several R files do a little bit of data tweaking and build some tables/figures. Once all that data is loaded up, the two Sweave files can be built. I do this once per work session, using the Sweave Project in R command, which makes sure everything in the project is up to date. Subsequently, I can simply Sweave any individual Snw file (using the corresponding TextMate command), without having to recompile the entire project. (This all of course integrates well with using TM_LATEX_MASTER, also set at the project level, to order LaTeX to typeset the overall document.) I have found it to be a really nice and functional workflow.
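As a rough sketch of the idea (not the actual contents of diss-master.Snw), the R code inside such a master chunk might do something like the following: source the plain R files first, then Sweave each Snw file. The helper name build_project and the file names in the usage comment are hypothetical.

```r
# Hypothetical helper: run the R scripts first, then Sweave each Snw file.
build_project <- function(r_files, sweave_files, envir = globalenv()) {
  for (f in r_files) {
    # Evaluate each script in the chosen environment so its data
    # is available to the Sweave files that follow.
    source(f, local = envir)
  }
  for (f in sweave_files) {
    # Sweave() writes the generated .tex file to the working directory.
    utils::Sweave(f)
  }
  invisible(c(r_files, sweave_files))
}

# Example call, with made-up file names:
# build_project(c("load-data.R", "build-tables.R"),
#               c("chapter-one.Snw", "chapter-two.Snw"))
```

Keeping the order explicit (data first, documents second) is the whole design: it mirrors the once-per-session build described above, after which individual Snw files can be re-Sweaved on their own.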

R console in TextMate

A neat new bit of functionality for TextMate came through the svn pipe today, thanks to Haris:

Log for r5298

Added Console functionality to R. To use it, you need to have R up and running. Open a new document and set its language to “R Console”. Start by typing “> ”. From now on, pretend you are in R’s console.


Sweave Bundle update

Note (mostly) to self: I made a few small updates to the Sweave Bundle for TextMate. Interested parties, you know what to do.

useR! 2006

The second international conference of R users recently took place in Vienna, and the conference site has now posted slides and abstracts of both the keynotes and the regular presentations. There’s a ton of stuff there: Discussions of R for all sorts of statistical and graphics purposes, using R in teaching, and talks about R in a wide variety of disciplines and practices. It’s a gold mine.

Jumping Ship: Moving from emacs to TextMate

Update: Not quite ready to give up all the nice authoring features of emacs, I built, with some tinkering, a reftex-style citation command for the TextMate/LaTeX bundle. It has since been incorporated into the main LaTeX bundle.

Update: The Sweave bundle is updated as of Oct 5 2006. Thanks to Haris for the contributions and improvements.

Yesterday I linked to a screencast that shows off some of the neat things that one can do with the math bundle in TextMate. TextMate continues to get better, and it has become my primary editor on OS X. Kieran posted a comment about TextMate’s relative lack of functionality with regard to LaTeX and R:

I was looking into TextMate but its latex and R support is still fairly basic — there’s no real equivalent to auctex/reftex’s functionality, and the R bundle is rudimentary. This is a pity, as it seems like a really powerful environment, and for some time I’ve been looking for a way to escape from Emacs and use an OS X native, modern editor/IDE. Maybe soon.

Kieran is right in part: Auctex and Reftex are excellent additions to emacs and TextMate can’t yet match them. But it does offer some nice advantages over emacs, so I thought I’d write up a few thoughts on my transition to TextMate and the ways I’ve found to compensate for no longer having access to my beloved C-c C-c RET.

Why switch

OS X is a pleasant working environment, and even builds of emacs that are meant to fit nicely in that environment still don’t feel native. Aquamacs is one such attempt. For brand new users of emacs, Aquamacs may be a good tool, but for those of us with pre-existing byzantine .emacs files, Aquamacs adds a whole additional level of confusion by changing keybindings and introducing a whole new set of configuration options. Despite all the attempts to make it more modern, setting one’s typeface in the editor remains a frustrating exercise. My machine is relatively modern and speedy, and emacs still takes a good long time to load fully, even after pruning unnecessary cruft from my config file. And once loaded, there are just enough interface differences to be jarring: scroll bars, for example, are something that most emacs builds have never really sorted out. It’s 2006; can I please get a scroll bar that works the same way as every other scroll bar on my machine?

It’s not about bling, however. As projects such as my dissertation grew in size (multiple data, LaTeX, R, and Sweave files spread around the place), the organization of all that material started to occupy an increasing chunk of my cognitive space. “Where’s file goober, and how is it related to file data?” Although emacs handles lots of files just brilliantly, and switching between them is a snap if you’ve loaded the iswitchb package, it doesn’t help much with the organization end of things. That was my real original incentive to switch: I wanted my software to take a little of the load off of my brain, and perhaps to do it a little more quickly.

Finally, switching is a nice opportunity to review the way I get work done and think conscientiously about how to improve. On the flip side, it’s also a nice way to structurally procrastinate.

Why not switch?

Emacs is powerful. It can read email, browse the web, make a fully-functional wiki right on your desktop, and, on those occasions when appropriate, it can edit text files with championship ability. It is a mature working environment with brilliant integration with LaTeX, BibTeX, and R. Just using it can make one feel like a ninja, albeit a meek, deskbound one with rapidly deteriorating vision and nascent repetitive stress disorder in the wrists. As Kieran commented, nothing quite approaches the combination of AucTeX and RefTeX. I’m still reaching for those key-bindings that, alas, don’t work in TextMate no matter how many times I C-c [ them.

Built-in magic

TextMate immediately addressed my core reason to switch with its handling of projects. It has a project drawer into which you can simply drag files and folders, create separators, and arbitrarily organize them all. It seems like a small thing, but the ability to see all the files that comprise a project, and then navigate them easily, is something that’s a) really important in order to have a clear sense of what I’m working on, and b) remarkably difficult in emacs.




Navigating those files is easy, as well: You can find and click in the project list, switch tabs with the keyboard, or hit cmd-T to bring up a file browser that finds files as you type: “cmd-T cl” narrows the list to those files that match the pattern “cl.” It pretty nicely approximates the autocompletion of switching buffers in emacs.


Many pieces loosely joined

TextMate is, like emacs, extensible almost to the point of absurdity. The architect of this extensibility built TextMate to hook into virtually any programming language, shell command, and external application. TextMate comes with built-in support for LaTeX and BibTeX compilation, as well as completing citations and labels within LaTeX documents. The latter function isn’t nearly as slick as using RefTeX, but it works fairly well. There are a few useful screencast demonstrations of those features. Haris, the author of those screencasts, has contributed tremendously to cite key and label completion — they work pretty well, thanks much to him.

I’ve made what I think are a few improvements to existing bundles in order to facilitate my own work: I’ve modified the LaTeX compile command to switch to xelatex if necessary, for example.

More in depth, but still fairly simple, is my rudimentary Sweave bundle for TextMate.1 TextMate allows one to set environment variables at the global or project level, so, for instance, I can assign a “master document” variable to my dissertation. This allows me to generate LaTeX output from a single in-process Sweave file, or, with an alternate command, to re-run the entire Sweave project through R and then begin LaTeX compilation. With the bundle, TextMate (mostly; it’s still a work in progress) correctly parses Sweave files, allowing for context-sensitive actions depending on the position of the caret in a file: Within a Sweave document, I can generate LaTeX, compile that associated LaTeX file, send selected code to R, or build the entire master document. It works pretty slick now that it’s set up. Whereas in Emacs the ties between various files were frequently opaque, in TextMate I’ve found that keeping track of those relationships and compiling documents is more transparent and much easier.

Is it worth it?

For me, it has been worth it. The tinker to work ratio starts out pretty high, but that’s not unusual. Breaking emacs habits is tougher, even after four months, and I’d love a citation mode that works more akin to that found in RefTeX — the ability to invoke the command and then choose citation types, for example — and I still miss some of the enhancements from AucTeX; it trained me too well to C-c C-s to insert a section, for example. The ease with which one can build bundles and interface with external applications suggests that it won’t be long before someone may start building equivalent tools, and that will be a happy day.

In the meantime, TextMate is fast, allows me to visualize my projects, and works well enough with the other applications I use, as well as within my workflow, to justify the switch. It’s a good app, and it has improved my work.

Revise and extend (ie, updated)

I forgot to mention another issue about switching. Up until now, my tools have been almost entirely cross-platform for the past five or six years. TextMate is OS X-specific, so I can’t smoothly use the same set of tools on the Windows laptop like I could with emacs. This gives me some pause: It’s nice to have a mostly universal workflow, in which I could sit down at a PC, Mac, or linux machine, sync some files, and work. But over the past year with the iMac, I’ve picked up a few other non cross-platform tools: BibDesk is a great BibTeX manager, and I’ve been doing a ton of stuff using OmniOutliner Pro in the past handful of months, so switching away from emacs on one platform isn’t as much of a transition on that front as it might have been a year ago. Besides, the Mac is a nice platform to work on. And, hey, one of these days I’ll be able to trade up the Toshiba for a shiny new *Book of some kind.

1 The Sweave bundle is now distributed as a regular TextMate Bundle via the subversion bundle repository.

R-Help, 1 April edition

The R-Help mailing list is a wealth of information. While I have no doubt (well, mostly) that the people who frequent the mailing list are all nice people in real life, woe is the newbie who asks a question that is easily answerable by consulting one of a half-dozen arcane texts, conducting an exhaustive search of list archives, or using R’s internal help system. “Read the posting guide!” will be accompanied by a curt response that often suggests how truly easy it was to find this answer for anyone not still working on their own cell division. (Okay, it’s not really that bad, but it can certainly be an intimidating place due to the sheer number of super-smart residents who have little tolerance for perceived time-wasters.)

This all adds up to my not being quite sure how to take today’s April Fool’s posts by list heavyweight Frank Harrell.

I have never taken a statistics class nor read a statistics text, but I am in dire need of help with a trivial data analysis problem for which I need to write a report in two hours. I have spent 10,000 hours of study in my field of expertise (high frequency noise-making plant biology) but I’ve always thought that statistics is something that can be mastered on short notice.

Briefly, I have an experiment in which a response variable is repeatedly measured at 1-day intervals, except that after a plant becomes sick, it is measured every three days. We forgot to randomize on one of the important variables (soil pH) and we forgot to measure the soil pH. Plants that begin to respond to treatment are harvested and eaten (deep fried if they don’t look so good), but we want to make an inference about long-term responses.

There’s more, including a couple of helpful responses, so you know, read the whole thing. The message ends with this conclusion, which is actually fairly representative of a good number of frantic help-me posts: “I would appreciate receiving a few paragraphs of description of the analysis that I can include in my report, and I would like to receive R code to analyze the data no matter which variables I collect. I do value your time, so you will get my everlasting thanks.”

Take-home message: Read the posting guide, design your analysis carefully, and don’t look crossways at Frank Harrell in a dark alley.


Looking up some information on ordinal logit models in R today, I came across Zelig. Produced by Kosuke Imai, Gary King, and Olivia Lau, Zelig is a sort of R meta-package that wraps, for example, various functions of the MASS or VGAM libraries into a smaller set of commands. The authors subtitled Zelig “Everyone’s Statistical Software,” with the idea that it will make R more accessible while not cutting off any of its power. Zelig appears to be a couple of years old now, but I hadn’t run into it before; maybe I’m hanging around on the wrong mailing lists?

Nice find: Dataninja

Dataninja was just the right site to stumble across tonight. The production of an “economist and (future) economics PhD student,” Dataninja is packed full of good data and workflow stuff: Techniques to convert from spreadsheets to LaTeX code, tips for working with Stata, R pointers (including homemade reference cards), applescripts, programming tools, links to data sets, and more. As they say, read the whole thing.

Packed full. It’s a great resource.

About, the short version

I’m a sociologist-errant. This site is powered by Textpattern, Pair Networks and the sociological imagination. For more about me and this site, see the long version.
