schussman.com logo

Local culture revisited

Several years ago I stumbled over the Netflix “local favorites” list and had a good time exploring it. Well, the New York Times has gone and made a really cool presentation of that data, for 2009, for a dozen U.S. cities. Check it out. Good stuff.

Another year of photo data!

Following in the modest two-year tradition I’ve established (see 2007 and 2008 posts), here is my 2009 photo data from my Lightroom catalog!

[ quick howto: Lightroom 1 & 2 (and 3) databases are in sqlite3 format, which means that freely available tools can extract data from them. I use sqlite3, some shell scripting, and R (and occasionally Excel) to produce summaries of that data. Why? Data offers some insight into the kinds of photos I take. Mostly, though, it’s fun. I’d be happy to expand on the actual code that goes into these plots, if there’s interest. ]
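For anyone who wants a feel for the general approach, here is a minimal sketch in Python rather than the sqlite3/shell/R combination described above. It is an illustration only, not the code behind these plots. The `photos` table, its columns, and the sample rows are all hypothetical stand-ins; a real Lightroom catalog has its own table and column names, which you would need to look up in a database browser first.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only. A real Lightroom
# catalog uses its own table and column names, which are not reproduced here.
SCHEMA = """
CREATE TABLE photos (
    id INTEGER PRIMARY KEY,
    lens TEXT,
    focal_length REAL,
    capture_month TEXT,   -- e.g. '2009-07'
    pick INTEGER          -- 1 if flagged as a pick, else 0
);
"""

# Made-up rows so the sketch runs without a catalog file.
SAMPLE_ROWS = [
    ("FA 35mm F2", 35.0, "2009-06", 1),
    ("FA 35mm F2", 35.0, "2009-06", 0),
    ("FA 50mm F1.4", 50.0, "2009-07", 1),
    ("DA 21mm F3.2", 21.0, "2009-07", 0),
    ("FA 35mm F2", 35.0, "2009-07", 0),
]


def build_demo_catalog():
    """Create an in-memory database that stands in for a catalog file."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO photos (lens, focal_length, capture_month, pick) "
        "VALUES (?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def frames_per_lens(conn, picks_only=False):
    """Count frames per lens, most-used lens first, optionally picks only."""
    where = "WHERE pick = 1" if picks_only else ""
    query = (
        f"SELECT lens, COUNT(*) FROM photos {where} "
        "GROUP BY lens ORDER BY COUNT(*) DESC, lens"
    )
    return conn.execute(query).fetchall()


conn = build_demo_catalog()
for lens, n in frames_per_lens(conn):
    print(f"{lens:15s} {n}")
```

To try this against real data, you would swap the in-memory database for `sqlite3.connect()` on a copy of the catalog file (never the live one) and rewrite the query against whatever tables that catalog actually contains.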

Below is a set of plots that summarize some of this year’s data. Click through to flickr to see the larger version.

2009 photo data!

What’s interesting this year? Well, crop ratios looked pretty similar to last year, so this year, for the first time, I pulled out some information about aperture for each of the prime lenses that I shoot with. (The idea was suggested in a post by Keitha, whose photos I admire tremendously, and whose Pentax lens set I envy with the fire of a million anti-glare-coated nine-aperture-bladed all-metal suns.) You can see these four frequency plots (one each for the Pentax DA 70mm F2.4 ltd, FA 50mm F1.4, FA 35mm F2.0 and DA 21mm F3.2 ltd lenses) in the left-hand column of the image. Right off the bat you can see that I shot a lot with the FA 35mm this year (which is confirmed by the “overall lens use” plot in the right-hand column). In fact, I took that lens along as my sole lens on a few long weekend trips to Ventura, CA, and the San Juan Islands, and really loved its performance. It does great at large apertures, but I also used it a lot for street shooting at f/8 and smaller apertures.

Runner-up in frequency this year is the FA 50mm F/1.4, which ordinarily I would say is my favorite lens (and it very much still is; it just wasn’t as convenient a focal length to take as my only lens on those vacations). Its sweet spot [where it’s sharpest but still has a nice shallow depth of field] is about F/4, which is where I primarily use it.

Neither the DA 70mm F/2.4 nor the DA 21mm F/3.2 got as much use this year, but I really love some of the photos I took with those lenses. In fact, I carried these two lenses specifically for their light weight and trim size on the Flagstaff photowalk I organized in July.

Car / Cat Ranch house / wide Crow Pomegranite Backside Doorman

How did 2009 stack up to 2008? In terms of absolute frequency, nearly identical! I kept 1308 frames in 2009, compared to 1340 in 2008. Far fewer of those are picks, or posted to flickr, though a good number are waiting for me to come back to, to finish workup or to make a print.

And that’s it for the 2009 photo stats! I did re-work my keyword network code, so perhaps can follow up this post with a little more about keyword relationships.

If you’d like to know more about extracting and summarizing info from your own Lightroom catalog, please let me know (and check out my other lightroom-related posts).

And, as last year, I hope soon to follow up with a report on my 2009 photo goals, and to set a few for 2010.

Visibility

Energy visibility

Our electric utility recently replaced our meter box with a digital one that somewhat magically sends its data back to the mothership. The cool bonus of this is that we can track our energy usage along a number of metrics. The online monitoring application isn’t super-sophisticated and I’d like more flexibility in how it aggregates data, but it does give us a window into how we use power. Not surprisingly, our usage increases in the mornings and evenings and on weekends. But what we hadn’t expected to see was just how much the usage spiked when the hot tub timer switched on. Turns out it’s a whole lot easier to not be “hot tub people” when you can see just how much power the tub is using (and how much money it’s costing).

Coming around again

Matt Yglesias asks “What are Today’s Protests Missing?” Turns out he asked much the same question a few years ago, and I had some thoughts at the time about what seems to be a common feature of both the left and right: When compared to the protest of ye old days, contemporary mass mobilization is greeted by public intellectuals with a sigh and either one of a) regret that it isn’t ye old days anymore when protests were coherent and organized, or b) dismissive sneering about how the hippies have never been good for anything and still aren’t good for anything.

This time around, Matt makes a really important point, that coherence of movements often is really only sensible in hindsight:

Both Gandhi and King led movements that were committed to vaguely defined and quite sweeping visions of social change that, among other things, included opposition to capitalism and all forms of war. Their goals look well-defined in retrospect because they achieved a great deal so, in retrospect, MLK’s leadership resulted in the Civil Rights Act and the Voting Rights Act and Gandhi’s leadership led to independence for India. But all mass-movements are prone to ill-defined goals.

That’s a part of one of the key observations I made in response to this same thread a few years ago:

The single largest event of the period was a Washington, D.C., antiwar rally of November 15, 1969, attended by an estimated 250,000 people. A quick read of the coverage of that weekend—like yesterday’s march, it really was a series of events, not a single event—demonstrates that participants were there to take part for many reasons, although they all ended up under the anti-war banner: Students protested the draft; religious activists ranging from Catholic to Quaker participated; radical leftists were there, as were elderly women and parents with their children, as were small groups seeking violent confrontations; also present were African American organizers and advocates for the poor, protesting the war’s diversion of funds from domestic programs. This is still an oversimplified list of participants; it’s clear that while the war was the most tangible target of the protests, many grievances actually brought protesters out. Like this weekend’s march, officially organized by United for Peace and Justice, that series of events had a nominal set of organizers, but plenty of other groups also participated. In a sister protest across the country, where another 100,000 people demonstrated, Physicians for Social Responsibility and the Gay Liberation Front were among notable organizations represented.

This is not to say that the context for contemporary protest hasn’t changed: Political opportunity structure is different, modes and tools of mobilization are transforming, and movement organizations are functioning in some very different ways. But we need to be aware of the reality of the good old days of American protest in order to make sense of what has changed and what hasn’t changed.

Update: Brayden King, one of my old office-mates, has more thoughts on this topic. Typically for him, it’s good, smart, well-researched stuff.

Facebook network visualization

Quick ‘n dirty visualization of the clusters of relationships among my facebook friends:

Facebook network visualization

Data generated with Bernie Hogan’s My Online Social Network app on facebook, and visualized with GUESS. Good stuff, Bernie!

Thanks to Marc Smith — he’s one of the nodes up there — for the link to the flickr version of this image over at Connected Action.

Brayden King and Kieran Healy (they’re up there in my visualization, too) have posted their own plots over at orgtheory: one, two.

The year in Lightroom, by the numbers

I started last year to play with pulling data right out of my Lightroom catalog. How fun to combine interests in photography with my need to make data out of things. Last year about this time I posted some 2007 photo stats, and with the release of Lightroom 2 I came up with some keyword network maps of my flickr images.

Over at The Online Photographer, Marc Rochkind did some writing about meta metadata and released a tool for OS X that produces much more summary information than I had previously considered: His tool produces by-lens statistics on cropping and aspect ratio in addition to focal length usage. This generated some thoughtful conversation about composing in the viewfinder versus cropping, and Marc’s work spurred me to think more about my own stats. So I went back to my own Lightroom 2 catalog with the sqlite browser and R to see if I could reproduce for myself some of the more interesting data that Marc’s tool generated. After some tinkering, I think I have a functional, reusable set of R tools for generalized reporting of Lightroom image data.

Like Marc’s ImageReporter, I can filter by image type, picks, ratings, Quick Collection, camera model (though this matters less for me since I have one P&S and one DSLR) and time period. I added filtering by color label as well (hey, just for fun), even though I don’t use the color labels; I generally get rating fatigue using anything more than picks.

So, what do I have? First, a reproduction of the stats I checked out last year: Monthly photos and focal length:

The year in Lightroom

I continue to primarily use my prime lenses, and my picture-taking appears to have notched down dramatically as compared to 2007. This is partly because of work, of course, but also because I’ve become much more selective about what I actually keep in the catalog.

We can break out focal length a bit more. For the two zooms that I use on my K100D, what are the mean focal lengths?

	> lensFL
	 [1] 5.8-23.2 mm                         15
	 [3] 85.0 mm f/1.8                       85
	 [5] smc PENTAX-DA 18-55mm F3.5-5.6 AL   31
	 [7] smc PENTAX-DA 21mm F3.2 AL Limited  21
	 [9] smc PENTAX-DA 50-200mm F4-5.6 ED    121
	[11] smc PENTAX-DA 70mm F2.4 Limited     70
	[13] smc PENTAX-FA 35mm F2 AL            35
	[15] smc PENTAX-FA 50mm F1.4             50

So that’s kind of interesting, suggesting that I use the 50-200mm zoom at about the middle of its range (a mean of 121mm). But the mean isn’t necessarily informative. Here’s a plot of focal length for one of those zooms:

Focal lengths plot, DA 50-200mm lens, 2008

So, I use the 50-200mm lens primarily for shots at either extreme of its range, and I already have a 50mm fixed lens that takes better photos than the zoom at that focal length. Moreover, breaking out just picks with this lens shows a three-to-one preference for 200mm over 50mm. I think that means I need a long prime. Ka-ching!
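A quick way to see why the mean misleads here is to compare it against binned counts. The sketch below is in Python, not the R used for the plot, and the focal lengths are made up for illustration; they are not pulled from my catalog.

```python
from collections import Counter
from statistics import mean

# Made-up focal lengths (mm) for a 50-200mm zoom, clustered at both ends.
focal_lengths = [50, 50, 50, 55, 200, 200, 200, 190, 120, 200]


def bucket(fl, width=50):
    """Return the lower edge of the bin (of the given width) containing fl."""
    return int(fl // width) * width


avg = mean(focal_lengths)
histogram = Counter(bucket(fl) for fl in focal_lengths)

print(f"mean focal length: {avg:.1f} mm")
for lower in sorted(histogram):
    # One '#' per frame in the bin makes a rough text histogram.
    print(f"{lower:3d}-{lower + 49:3d} mm  {'#' * histogram[lower]}")
```

With data shaped like this, the mean lands in the middle of the range even though almost nothing was shot there, which is the same pattern the plot shows.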

I can also consider crop: How am I doing at composing in-camera? Here’s how often I crop, by lens, as a percentage:

	smc PENTAX-DA 18-55mm F3.5-5.6 AL   9.13 %
	smc PENTAX-DA 21mm F3.2 AL Limited 17.67 %
	smc PENTAX-DA 50-200mm F4-5.6 ED    6.93 %
	smc PENTAX-DA 70mm F2.4 Limited    23.78 %
	smc PENTAX-FA 35mm F2 AL           10.71 %
	smc PENTAX-FA 50mm F1.4            24.67 %

And, when I do crop, how much of the original composition do I keep?

	smc PENTAX-DA 18-55mm F3.5-5.6 AL  78.3 %
	smc PENTAX-DA 21mm F3.2 AL Limited 81.8 %
	smc PENTAX-DA 50-200mm F4-5.6 ED   81.6 %
	smc PENTAX-DA 70mm F2.4 Limited    80.9 %
	smc PENTAX-FA 35mm F2 AL           83.4 %
	smc PENTAX-FA 50mm F1.4            82.5 %

So, I’m cropping quite a bit. As Marc found in his exploration, these numbers go up when I filter by picks. I was surprised that I crop as much as I do with the DA 21mm in particular, since I think of my use of it as being mostly for wide landscapes; but even those often enough are a bit crooked, enough to warrant at least some adjustment of tilt, and Lightroom (fairly) calls that adjustment a crop.
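The two tables above boil down to two numbers per lens: the share of frames that were cropped at all, and the average share of the frame kept among the cropped ones. Here is a minimal Python sketch of that arithmetic. The record format (a crop rectangle as fractions of the frame, or `None` for uncropped) and the sample values are assumptions for illustration, not how Lightroom stores crops and not my R code.

```python
# Each record is (lens, crop). crop is None for an uncropped frame, or a
# hypothetical (left, top, right, bottom) rectangle in fractions of the frame.
frames = [
    ("DA 21mm", None),
    ("DA 21mm", (0.0, 0.1, 1.0, 0.9)),
    ("FA 50mm", (0.1, 0.1, 0.9, 0.9)),
    ("FA 50mm", None),
    ("FA 50mm", (0.0, 0.0, 0.5, 1.0)),
    ("FA 50mm", None),
]


def crop_summary(records):
    """Return {lens: (percent of frames cropped, mean percent of area kept)}."""
    by_lens = {}
    for lens, crop in records:
        stats = by_lens.setdefault(lens, {"total": 0, "kept": []})
        stats["total"] += 1
        if crop is not None:
            left, top, right, bottom = crop
            # Area of the crop rectangle as a fraction of the full frame.
            stats["kept"].append((right - left) * (bottom - top))

    summary = {}
    for lens, stats in by_lens.items():
        cropped = len(stats["kept"])
        pct_cropped = 100.0 * cropped / stats["total"]
        # Mean kept area is only defined when at least one frame was cropped.
        mean_kept = 100.0 * sum(stats["kept"]) / cropped if cropped else None
        summary[lens] = (pct_cropped, mean_kept)
    return summary


for lens, (pct, kept) in crop_summary(frames).items():
    print(f"{lens:10s} cropped {pct:5.1f} %   kept {kept:5.1f} %")
```

Note that a measure like this counts a small tilt correction as a crop, just as Lightroom does, so a high crop rate does not by itself mean heavy recomposition.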

Does cropping mean I do a poor job at composing in-camera? Possibly. I have to admit that knowing I can crop gives me a conscientious freedom when I’m shooting, but these numbers give me something to think about. Maybe careful composition will be something to work on as I go forward.

We can cut all this in a few other ways. I’d like to take a look at my common keywords during a given time period, for example, but that will wait for the follow-up post, I think. This is more than enough nerdery for one January 1st afternoon.

My Lightroom 2 Backup Strategy

Related: A more recent post about archive and backup in Lightroom.

With this morning’s comment from martie asking about a crashed hard drive, I got to thinking about making my own Lightroom 2 backup plan a bit more automated and reliable. My general approach is to periodically copy my catalog file and image directories to an external hard drive, but there’s been nothing systematic about it until now.

I’ve previously described a bit of my Lightroom file structure, noting that I import new photos into a single directory per import. As part of a strategy to save space on the MacBook where I do my actual work, I periodically move those folders to an external hard disk currently named Grundle. This is simply a matter of dragging the folder, in the left-hand directories pane of Lightroom, from one hard drive to another.

Lightroom display of multiple drives

While this copying step is manual, the rest of the system is now automated, thanks to this tutorial at MacResearch and a bash script by Aidan Clark. The bash script took just a bit of tinkering to work with Lightroom’s catalog file, whose default name contains rsync-breaking spaces, and to perform the second backup from the external volume to the iMac. I’ll post those specific and very minor modifications if there is interest.

Here’s the final result: Using OS X’s launchd tool, whenever I mount Grundle on my MacBook, whether via network or direct FireWire connection, my Lightroom 2 catalog file is copied to Grundle using rsync. And, whenever I mount Grundle on the upstairs iMac, a similar combination of launchd and rsync copies both the catalog file and the image directories from Grundle to the iMac. This means that in the course of regular use of my two Macs and that external drive, both my Lightroom catalog and folders full of images get backed up.
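To make the launchd half of this concrete, here is a rough Python sketch that generates a job definition of the general kind described. It is not the MacResearch tutorial’s setup or Aidan Clark’s script. The job label, both paths, and the catalog file name are assumptions for illustration. `StartOnMount` is launchd’s key for starting a job whenever a filesystem is mounted, and passing the paths as separate `ProgramArguments` entries means the spaces in the catalog name never go through shell word-splitting.

```python
import plistlib


def build_backup_job(label, source, destination):
    """Describe a launchd job that runs rsync whenever any volume mounts."""
    return {
        "Label": label,
        # Each argument is its own list item, so spaces in paths are safe.
        "ProgramArguments": ["/usr/bin/rsync", "-a", source, destination],
        "StartOnMount": True,
    }


# All three values below are hypothetical examples.
job = build_backup_job(
    "com.example.lightroom-backup",
    "/Users/me/Pictures/Lightroom/Lightroom 2 Catalog.lrcat",
    "/Volumes/Grundle/lightroom-backup/",
)

# Serialize to the XML property-list format that launchd reads.
plist_bytes = plistlib.dumps(job)
print(plist_bytes.decode("utf-8"))
```

Because `StartOnMount` fires for every mounted filesystem and not just one named drive, a real job would probably point at a small wrapper script that first checks that the backup volume is actually present before calling rsync.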

One caveat to this system is that the backup of the image folders still involves that manual step of moving them from the laptop to Grundle. I could automate this the same way the catalog backup is done, but that could potentially mean trying to back up a gig or more at a time over the wifi network, a time- and bandwidth-consuming process that isn’t really necessary. Now, the obvious downside is that the newest photos I’ve taken are always the ones most vulnerable to data loss, and that’s not a highly desirable thing. But I’m satisfied with my current workflow of moving folders to Grundle generally when I’m done working with that set of images. I’ll continue to think about this situation and may come up with some additional redundancy for that stage of processing.

Update: Okay, I buckled. A bit more tinkering and I now have my current folders of raw images copied to Grundle. After I relocate the folder using Lightroom, the folder will disappear from the backup directory, so I don’t have redundant backup files stacking up anywhere. Nice and clean, and everything’s safe.

Update the second: One item I neglected to mention in the original post was the automated backup feature built in to Lightroom. Available in the catalog settings menu (alt-cmd-,), this feature backs up your catalog file only, to a location you specify, on any of several schedules. My process above includes allowing that backup to run weekly; it never hurts to have a little more failsafe security. Automating the copy of that backup to another hard drive adds one more important layer of keeping your data safe.

mycrocosm beats me out of the gate

A little while back I had a fun idea: I bet I could use twitter to collect and store little, ad-hoc data statements; with a simple parser, those statements could be used to make data. A little ad-hoc database right inside twitter! I even got myself a domain name where I could tinker with it.
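Here is roughly what I had in mind, as a minimal Python sketch. The `#data <label> <number> [unit]` statement format is invented for this example; it is not mycrocosm’s syntax or anything twitter defines, and the sample tweets are made up.

```python
import re
from collections import defaultdict

# Hypothetical statement format, invented for this sketch:
#   "#data <label> <number> [unit]"
STATEMENT = re.compile(
    r"^#data\s+(?P<label>\w+)\s+(?P<value>-?\d+(?:\.\d+)?)(?:\s+(?P<unit>\w+))?\s*$"
)


def parse_statement(text):
    """Return (label, value, unit) for a data statement, else None."""
    match = STATEMENT.match(text.strip())
    if match is None:
        return None
    return match.group("label"), float(match.group("value")), match.group("unit")


def totals(messages):
    """Sum the values of all data statements, grouped by label."""
    sums = defaultdict(float)
    for message in messages:
        parsed = parse_statement(message)
        if parsed is not None:
            label, value, _unit = parsed
            sums[label] += value
    return dict(sums)


tweets = [
    "#data olympics 90 minutes",
    "Just a regular tweet about lunch",
    "#data coffee 2 cups",
    "#data olympics 45 minutes",
]
print(totals(tweets))
```

Ordinary tweets simply fail to match and are skipped, so the data statements can live in the same stream as everything else.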

Well, mycrocosm beat me to it. It’s cool. It makes graphs. Rad. Exhibit A, on my time spent engaged with the Olympics:

[ mycrocosm chart: time spent engaged with the Olympics ]

Also. Tinkering with mycrocosm, I found Google Charts. Holy smokes!

Update

And today I see daytum, another service of the same sort. It’s invitation-only, dammit. But it looks cool.

Lightroom 2: Related Keywords are Dreamy

As it happens, Lightroom 2.0 has just the thing I daydreamed about a handful of months ago. The new version’s data includes a table of keyword co-occurrences that makes it possible to produce things like this:

My flickr tag neighborhood

This graph shows keyword relationships that occur within a hop from my “flickr” keyword — which I use to keep track of photos that I upload there. In other words, it’s sort of a descriptive keyword neighborhood of what I’ve put up on flickr.
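The idea of a one-hop keyword neighborhood can be sketched in a few lines of Python. This version counts co-occurrences directly from per-photo keyword sets; it does not read Lightroom’s own co-occurrence table, and the keyword sets below are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets, one per photo.
photos = [
    {"flickr", "flagstaff", "street"},
    {"flickr", "street", "35mm"},
    {"flagstaff", "snow"},
    {"flickr", "cat"},
]


def cooccurrence(keyword_sets):
    """Count how often each unordered pair of keywords shares a photo."""
    counts = Counter()
    for keywords in keyword_sets:
        # Sorting gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(keywords), 2):
            counts[(a, b)] += 1
    return counts


def neighborhood(counts, focal):
    """Return {neighbor: shared-photo count} for keywords one hop from focal."""
    neighbors = {}
    for (a, b), n in counts.items():
        if a == focal:
            neighbors[b] = n
        elif b == focal:
            neighbors[a] = n
    return neighbors


pairs = cooccurrence(photos)
print(neighborhood(pairs, "flickr"))
```

The resulting neighbor counts are the edge list a graph tool needs; swapping in a different focal keyword gives a different neighborhood from the same pair counts.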

Color is a little subjective. The darker the blue, the higher the ratio between unique neighbors and total neighbors. That is, darker blue nodes are connected to relatively fewer unique other neighbors than the lighter blue nodes.

Of course, you could use any focal keyword for this kind of thing: Starting with a lens-specific keyword would produce a rough map of the neighborhood associated with that lens, and might reveal how I tend to use that lens. The possibilities are pretty endless — and totally a fun kick in the pants to tinker with.

Dots and lines

Sparklines are fun to tinker with and can provide quick glimpses of data. Here are some not-quite-realtime twitter sparklines, built with this small and useful tool and a bit of scripting.

30 days of twitter:

How about a change plot:

Or, if you like, the straight-up histogram:
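If you want to play along without any particular tool, a text-only sparkline takes very little code. This Python sketch uses Unicode block characters and is not the tool linked above; the daily counts are made up. The `changes` helper gives the day-over-day differences that a change plot would draw.

```python
# Eight block characters, shortest to tallest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to a block character scaled between the min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale, so draw the lowest bar.
        return BARS[0] * len(values)
    return "".join(
        BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values
    )


def changes(values):
    """Return the difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


# Made-up daily tweet counts.
daily_tweets = [0, 3, 7, 2, 5, 7, 1]
print(sparkline(daily_tweets))
print(sparkline(changes(daily_tweets)))
```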

Feelin' fine

Way cool:

(image page at flickr)

We Feel Fine aggregates and provides clicky-feely visualizations of expressions of emotions online, via text found in blogs, flickr pages and google.

I spent a good chunk of today trying to figure out why a single dumb plot was coming out all hinky; these guys have colored affect balls swirling apparently effortlessly around your mouse cursor. I feel inadequate, sure, but I feel wildly enthusiastic, as well. This is cool stuff.

(Via Chris at Ruminate.)

Nice find: Dataninja

Dataninja was just the right site to stumble across tonight. The production of an “economist and (future) economics PhD student,” Dataninja is packed full of good data and workflow stuff: Techniques to convert from spreadsheets to LaTeX code, tips for working with Stata, R pointers (including homemade reference cards), applescripts, programming tools, links to data sets, and more. As they say, read the whole thing.

Packed full. It’s a great resource.

Data collection

I very much enjoyed Drek’s thoughts about data today, and I am looking forward to his following up on this post with some discussion of important elements of research design: For example, the differences between collecting experimental data, conducting various sorts of field research, and performing simulations.

Easier done than said

Outrage fatigue has set in, making it hard to get steamed about stuff like this anymore. These guys just stand up and lie, with contrary evidence right in front of them. We get lies about the economy, lies about the tax cut, and lies about going to war.

The same continues to happen with regard to Tim Lambert’s ongoing whacking of John Lott with the honesty stick. On the efforts among Lott supporters to debunk a study that contradicts their own “research,” Lambert points out that, contrary to repeated claims otherwise, the study’s data is publicly available from ICPSR.

I checked, and yep, Lambert’s right. It took exactly seven seconds and a single click of a “search” button to find the study and a whole mess of downloadable data.

Evidence. Right there. Data. Available. How do people get away with this crap? Unfortunately, the ability to readily disprove an egregious lie (er, excuse me, “extension of the truth,” as I’m told we’re calling it now) seems to be easier done than said.

Edit: Oops. Accidentally dropped the “S” from ICP*S*R (to the joy of political scientists?).


About, the short version

I’m a sociologist-errant. This site is powered by Textpattern, TextDrive Joyent and the sociological imagination. For more about me and this site, see the long version.

Copyright and so forth: Commenters own their own posts, and linked or excerpted material is subject to whatever copyright covers the original. Everything else here is mine, rights reserved.
