Getting to the data in Adobe Lightroom–with or without PowerShell
Some Adobe software infuriates me (Flash), I don’t like their PDF reader and use Foxit instead, apps which use Adobe-Air always seem to leak memory. But I love Lightroom. It does things right – like installations – which other Adobe products get wrong. It maintains a “library” of pictures and creates virtual folders of images ( “collections” ) but it maintains metadata in the images files so data stays with pictures when they are copied somewhere else – something some other programs still get badly wrong. My workflow with Lightroom goes something like this.
- If I expect to manipulate the image at all I set the cameras to save in RAW, DNG format not JPG (with my scuba diving camera I use CHDK to get the ability to save in DNG)
- Shoot pictures – delete any where the camera was pointing at the floor, lens cap was on, studio flash didn’t fire etc. But otherwise don’t edit in the camera.
- Copy everything to the computer – usually I create a folder for a set of pictures and put DNG files into a “RAW” subfolder. I keep full memory cards in filing sleeves meant for 35mm slides..
- Using PowerShell I replace the IMG file-name prefix with something which tells me what the pictures are but keeps the camera assigned image number.
- Import Pictures into Lightroom – manipulate them and export to the parent folder of the “RAW” one. Make any prints from inside Lightroom. Delete “dud” images from the Lightroom catalog.
- Move dud images out of the RAW folder to their own folder. Backup everything. Twice. [I’ve only recently learnt to export the Lightroom catalog information to keep the manipulations with the files]
- Remove RAW images from my hard disk
There is one major pain. How do I know which files I have deleted in Lightroom ? I don’t want to delete them from the hard-disk I want to move them later. It turns out Lightroom uses a SQL Lite database and there is a free Windows ODBC driver for SQL Lite available for download. With this in place one can create a ODBC data source – point it at a Lightroom catalog and poke about with data. Want a complete listing of your Lightroom data in Excel? ODBC is the answer. But let me issue these warnings:
- Lightroom locks the database files exclusively – you can’t use the ODBC driver and Lightroom at the same time. If something else is holding the files open, Lightroom won’t start.
- The ODBC driver can run UPDATE queries to change the data: do I need to say that is dangerous ? Good.
- There’s no support for this. If it goes wrong, expect Adobe support to say “You did WHAT ?” and start asking about your backups. Don’t come to me either. You can work from a copy of the data if you don’t want to risk having to fall back to one of the backups Lightroom makes automatically
I was interested in 4 sets of data shown in the following diagrams. Below is image information with the Associated metadata, and file information. Lightroom stores images (Adobe_Images table) IPTC and EXIF metadata link to images – their “image” field joins to the “id_local” primary key in images. Images have a “root file” (in the AgLibraryFile table) which links to a library folder (AgLibraryFolder) which is expressed as a path from a root folder (AgLibraryRootFolder table). The link always goes to the “id_local” field I could get information about the folders imported into the catalog just by querying these last two tables (Outlined in red)
The SQL to fetch this data looks like this to just get the folders
SQLlite is one of the dialects of SQL which doesn’t accept AS in the FROM part of a SELECT statement . Since I run this in PowerShell I also put a where clause in, which inserts a parameter. To get all the metadata the query looks like this
Note that since some images don’t have a camera or lens logged the joins to those tables needs to be a LEFT join not an inner join. Again the version I use in PowerShell has a Where clause which inserts a parameter.
OK. so much for file data – the other data I wanted was about collections. The list of collections is in just one table (AgLibraryCollection) so very easy to query, and but I also wanted to know the images in each collection.
Since one image can be in many collections,and each collection holds many images AgLibraryCollectionImage is a table to provide a many to relationship. Different tables might be attached to AdobeImages depending on what information one wants from about the images in a collection, I’m interested only in mapping files on disk to collections in Lightroom, so I have linked to the file information and I have a query like this.
Once I have an ODBC driver (or an OLE DB driver) I have a ready-made PowerShell template for getting data from the data source. So I wrote functions to let me do :
Get-LightroomItem -ListFolders -include $pwd
To list folders, below the current one, which are in the Lightroom Library,Get-LightroomItem -include "dive"
To list files in Lightroom Library where the path contains “dive” in the folder or filename,Get-LightroomItem | Group-Object -no -Property "Lens" | sort count | ft -a count,name
To produce a summary of Lightroom items by lens used.Get-LightroomCollection
lists all collections andGet-LightroomCollectionItem -include musicians | copy -Destination e:\raw\musicians
will copy the original files in the “musicians” collection to another disk
And this:
Stores paths of Lightroom items in the current folder ending in .DNG in $paths; then gets files in the current folder and moves those which are not in $paths
(i.e. in Lightroom.) specifying -Whatif
allows the files to be confirmed before being moved.
I’ve shared the PowerShell code as a gist