Organize photos for many overlapping projects

Question

I am working with a large collection of photographs (~10,000) for my PhD thesis (in biology), and I am currently struggling to organize them. Sorry if this question is in the wrong Stack Exchange community; I just thought that if anyone knows about organizing photos, it would be the people here.

My collection:

To elaborate a bit more on what I would like to achieve: I am currently working on my PhD and need to take a lot of pictures to identify or document different plants. Since my project is divided into 4-5 different sub-projects (each with further sub-divisions), it is hard to keep an overview. Also, some pictures are of general interest for my project and so should be linked not only to the corresponding sub-project but also to the project as a whole. Additionally, I expect more photos to pile up, as this was only the first of three field seasons...

My question:

How would you organize such a photographic library?
What kind of folder structure would you suggest?

If I search this forum (e.g. here or here) or any search engine, mostly ideas for everyday or professional photography come up. But these use a very different kind of organization from what I need. For example, it is of little use to me to organize my pictures into people, places, events and dates, as is often suggested (e.g. here).

Regarding folder structure, I thought about sorting pictures by day (e.g. 2019-06-23). But I am afraid this would get too messy. Or maybe I could sort pictures into the sub-projects and use tags in the metadata to link them to other sub-projects?

Side notes:

I am running a Linux machine (Manjaro, specifically) and so far I have tried digiKam for organizing my library. But I guess the OS is only of secondary relevance here (i.e. each OS should have at least some software to deal with this kind of problem).
For now I do not need to sync my files between devices.
Usually, I do not need to edit my photos. Only maybe 1-2 for the cover later...

I don't see how anyone but you can really answer this question. I mean, we can offer suggestions, but there's not gonna be a general answer. Maybe take this to Photography Chat? — mattdm, Commented Jun 24, 2019 at 3:25
Why could you not use taxonomic rank as with other biological classification? — Stan, Commented Jun 24, 2019 at 6:04
In Linux you can have "links" (hard or soft), in essence the same file appearing in several directories. Otherwise see this — xenoid, Commented Jun 24, 2019 at 7:35
Are you sure that software meant for organizing photographs in specific is appropriate for your use case, as opposed to a more generalized DAM system? — mattdm, Commented Jun 24, 2019 at 14:21

TheLuckless · Accepted Answer · 2019-06-24 22:02:50Z

The first step should be to conduct a detailed review of the data and establish solid project expectations, so you can make these sorts of decisions with confidence.

The key questions we're looking to answer are:

How much overlap can an image have between sub-projects?
How much metadata do we want to embed in the file structure vs. an external storage/library file?
How much work do we want to do ourselves, vs. how much would we rather push off onto the computer to do for us?

Personally I would use Windows with Lightroom as an image library management solution, but this is less than ideal if you intend to remain within a Linux ecosystem.

However tools like Lightroom are kind of a bloated option with a lot of extra features that we probably don't need for this kind of project.

In a Linux environment we may be better off scripting much of the handling ourselves rather than relying on ready-made tools. [It is also an excellent skill-building task that gives highly useful experience in data management.]

Manually sorting images into folders is 'less than ideal', prone to error, and awkward to reliably correct. This is especially true if there are large overlaps in sub-projects that any given image is likely to be involved in, or if you decide a major change is required later on.

When dealing with data, we can waste our time, or we can waste a computer's time. Choose wisely.

Keywording and database interaction are a far more robust option than getting overly complicated with folders. Unless there is effectively zero overlap in the sub-projects that an image belongs to, it is far better to let the computer "do the sorting" for us.

Keep the core of the archive simple with a standard timestamp based file structure.

Project/Year/Month/Day/[timestamped_filename]

or even just

Project/Year_Month_Day/[timestamped_filename]
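As a rough illustration of the first layout, here is a minimal Python sketch that builds the destination path for one image. The function name `archive_path`, the `YYYYMMDD-HHMMSS` prefix, and the example project and file names are my own assumptions, not something the layout above prescribes; in practice the timestamp would come from the camera's capture time.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archive_path(project: str, taken_at: datetime, original_name: str) -> PurePosixPath:
    """Build Project/Year/Month/Day/<timestamp>_<original name> for one image."""
    stamp = taken_at.strftime("%Y%m%d-%H%M%S")  # assumed filename prefix format
    return PurePosixPath(
        project,
        f"{taken_at:%Y}",
        f"{taken_at:%m}",
        f"{taken_at:%d}",
        f"{stamp}_{original_name}",
    )


# Hypothetical example values, for illustration only.
print(archive_path("Thesis", datetime(2019, 6, 23, 14, 5, 9), "IMG_0042.jpg"))
# prints: Thesis/2019/06/23/20190623-140509_IMG_0042.jpg
```

Keeping the original camera name as a suffix is one way to avoid collisions when two frames share the same second.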

From here we are going to want either existing software that can act like Lightroom or another image cataloging tool, or scripts of our own to deal with things for us.

General workflow would be:

Import images [or their file names] into a database. [flagged as 'new']
Keyword and add meta-data as required by project. [Once finished, remove 'new' flag. Even in something like Lightroom we want to maintain a clear indicator of whether the entry for a given image is 'finished' or if it needs more work before it is ready to move forward within the project.]
Define 'Views' of the data based on the above keywording and metadata to select the specific images needed for a given state of the project.
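The three workflow steps above can be sketched with SQLite from Python's standard library. Treat this as one possible shape under assumptions, not a finished design: the table names, the `is_new` column, and the helper functions are all hypothetical, and a real catalog would likely store more metadata per image.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    id     INTEGER PRIMARY KEY,
    path   TEXT UNIQUE NOT NULL,       -- location inside the core image folder
    is_new INTEGER NOT NULL DEFAULT 1  -- 1 until keywording is finished
);
CREATE TABLE IF NOT EXISTS keywords (
    image_id INTEGER NOT NULL REFERENCES images(id),
    keyword  TEXT NOT NULL,
    PRIMARY KEY (image_id, keyword)
);
"""


def open_catalog(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def import_image(conn, path):
    """Step 1: register a file; it starts out flagged as new."""
    conn.execute("INSERT OR IGNORE INTO images (path) VALUES (?)", (path,))
    return conn.execute("SELECT id FROM images WHERE path = ?", (path,)).fetchone()[0]


def tag_image(conn, image_id, keywords, finished=False):
    """Step 2: attach keywords; clear the 'new' flag once the entry is complete."""
    conn.executemany(
        "INSERT OR IGNORE INTO keywords (image_id, keyword) VALUES (?, ?)",
        [(image_id, k) for k in keywords],
    )
    if finished:
        conn.execute("UPDATE images SET is_new = 0 WHERE id = ?", (image_id,))


def select_view(conn, keyword):
    """Step 3: a 'view' is just a query over finished entries."""
    rows = conn.execute(
        "SELECT i.path FROM images i JOIN keywords k ON k.image_id = i.id "
        "WHERE k.keyword = ? AND i.is_new = 0 ORDER BY i.path",
        (keyword,),
    )
    return [r[0] for r in rows]


conn = open_catalog()
a = import_image(conn, "2019/06/23/a.jpg")
b = import_image(conn, "2019/06/23/b.jpg")
tag_image(conn, a, ["subproject-1", "overview"], finished=True)
tag_image(conn, b, ["subproject-1"])  # still flagged as new, so excluded below
print(select_view(conn, "subproject-1"))
# prints: ['2019/06/23/a.jpg']
```

Because an image can carry any number of keywords, one photo can belong to several sub-projects and to the project as a whole without being copied.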

Project file structure then looks like:

\Project\
-\Core Image folder\[Subfolders]
-\Library, database, or Metadata\
-\Temporary Exports or 'views' folders\ {Flexible data generated on the fly as needed}

If you are comfortable with scripting and simple databases, it is fairly easy to build a basic toolchain yourself that generates view folders containing symlinks back to the original source images. This works well if you don't require the robust image review/editing tools of a more complex piece of software along the lines of Lightroom.

The specifics of how to implement something like this yourself can easily vary, but the heart of it would be to define your target for a specific grouping based on a database query, which you then pass through a file management script.

Select all images that had keyword Alpha to create a list, and use that list to pull copies/links into a Sub-Project View Folder without changing data stored in the Core Image Folder.

If requirements of a sub-project change, then the old folder can be deleted and a replacement generated on the fly without having to manually copy and paste files to or from folders.
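A minimal sketch of such a file management script, assuming the list of image paths has already been produced by some query. The function name `rebuild_view` and the folder names in the demo are hypothetical. It deletes the old view folder and regenerates it as symlinks, leaving the core image folder untouched.

```python
import os
import shutil
import tempfile
from pathlib import Path


def rebuild_view(view_dir, image_paths):
    """Replace view_dir with symlinks to the given images; originals are untouched."""
    view_dir = Path(view_dir)
    if view_dir.exists():
        shutil.rmtree(view_dir)  # removes the links only, never their targets
    view_dir.mkdir(parents=True)
    for src in map(Path, image_paths):
        # Assumes file names are unique across the archive (e.g. timestamped);
        # a duplicate name would raise FileExistsError here.
        os.symlink(src.resolve(), view_dir / src.name)
    return view_dir


# Demo in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as root:
    core = Path(root, "core", "2019_06_23")
    core.mkdir(parents=True)
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        (core / name).write_bytes(b"fake image data")
    alpha = [core / "a.jpg", core / "c.jpg"]  # stands in for a keyword query result
    view = rebuild_view(Path(root, "views", "Alpha"), alpha)
    print(sorted(p.name for p in view.iterdir()))
    # prints: ['a.jpg', 'c.jpg']
```

Swapping `os.symlink` for `shutil.copy2` would give real copies instead of links, for example when the view has to be sent to someone else.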

Remember to back up your core data and metadata! Exports/views can be regenerated on the fly as needed, but you want to make sure the original data is hard to lose or corrupt.

Can you explain how this database would be used if you needed to embed multiple images from multiple outings/organizational groups into a single document, including viewing and picking images and tracking the inclusion and removal of images from particular projects? — xiota, Commented Jun 24, 2019 at 20:58
@xiota Did that update help make the idea any clearer? - I'm trying to keep the answer fairly general without it drifting too far toward becoming a draft design doc for an overly specific project and work environment. - Figure "What the tool should do" is on topic here, and "how to actually do it" is best left to another stack exchange. — TheLuckless, Commented Jun 24, 2019 at 22:19
"Use a database" sounds like a good idea in principle, but image selection is inherently visual. If the database doesn't have built-in viewing and selection tools, there would be a lot of back and forth between the image viewer and database. Then how will the user embed the images in the document? Document creation tools support filesystem-based navigation, so there would be more back and forth between the word processor and database. Then what if users want to move some misplaced images? Do they have to reimport and tag from scratch? Do they end up creating an image manager from scratch? — xiota, Commented Jun 24, 2019 at 22:32

xiota · Accepted Answer · 2019-06-24 22:56:27Z

Consider using hard links in multiple folders with different methods of organization, as xenoid comments. That way, you can use whatever method is most convenient to find the images you want.

Organized by Date/location.

YYYYMMDD (City, ST) Location - Description/YYYYMMDD-HHMMSS Description.jpg

Organized by Taxonomy, as Stan ponders.

Organized by Project.

Projects/Title/Images/YYYYMMDD-HHMMSS Description.jpg

Any other organizational method you like.

To create the links, you can use cp -al src/ dst/ (the -l option alone does not recurse into directories, so it needs -r or -a alongside it). (ln is less convenient because it works on a per-file basis.) Then use your favorite file manager to move files from dst/ to wherever you want them. To preserve links when backing up to external drives, you can use rsync with the -H (--hard-links) option. If you make a mistake and copy some files instead of linking them, existing tools, such as hardlink, can find and relink them.
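If you would rather script the linking step, the following Python sketch does roughly what a recursive hard-link copy does: it recreates the directory layout and hard-links each file instead of copying it. The function name `link_tree` and the folder and file names in the demo are made up for illustration. Note that hard links only work within a single filesystem.

```python
import os
import tempfile
from pathlib import Path


def link_tree(src, dst):
    """Recreate src's directory layout under dst, hard-linking every file."""
    src, dst = Path(src), Path(dst)
    for dirpath, _dirnames, filenames in os.walk(src):
        target_dir = dst / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            os.link(Path(dirpath, name), target_dir / name)


# Demo in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as root:
    src = Path(root, "by-date", "20190623 Field site")
    src.mkdir(parents=True)
    original = src / "20190623-101500 Plant A.jpg"
    original.write_bytes(b"fake image data")
    link_tree(Path(root, "by-date"), Path(root, "by-project"))
    linked = Path(root, "by-project", "20190623 Field site", original.name)
    # Both names refer to the same data on disk, so the link count is 2.
    print(os.path.samefile(original, linked), original.stat().st_nlink)
```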

Since they are hard links, advantages include:

There is no need to keep track of special locations for images that are shared among multiple projects. Just create new links within each project folder using any existing link that is convenient.
You can name and organize files independently of each other as needed.
Changes to one file will affect all the others. File contents will be kept in sync. (Covers and specially edited images should be copied first to unlink them from the original.)
The amount of hard drive space will not significantly increase as a result of having multiple "copies" in multiple locations.
You don't have to worry about broken links, as you would with soft links.
When you copy or archive folders (eg, to send to colleagues), the files will be copied in their entirety. (No need to track down "originals".)
Any software that can access the filesystem can be used to work with the files.
Tags embedded in the file as XMP metadata will be synced across all copies and move with the file, as xenoid notes. However, some software may not respect such tags.
No external database to keep in sync when you move files.

Disadvantages:

If you want to delete an image, you'll have to track down and delete each of the hard links. There may be tools to assist the process. At least ls tells you how many links are left.
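One way to track the links down is to compare device and inode numbers, which are the same for all hard links to a file. Below is a minimal sketch; `find_links` is a hypothetical helper and the demo folder names are placeholders. It only reports matches, so deleting them stays a deliberate manual step.

```python
import os
import shutil
import tempfile
from pathlib import Path


def find_links(target, search_root):
    """Return every path under search_root that is a hard link to target."""
    wanted = os.stat(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = Path(dirpath, name)
            st = os.lstat(candidate)  # lstat: symlinks are not counted as matches
            if (st.st_dev, st.st_ino) == (wanted.st_dev, wanted.st_ino):
                matches.append(candidate)
    return sorted(matches)


# Demo in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as root:
    by_date = Path(root, "by-date")
    by_project = Path(root, "by-project")
    by_date.mkdir()
    by_project.mkdir()
    original = by_date / "a.jpg"
    original.write_bytes(b"fake image data")
    os.link(original, by_project / "a.jpg")         # a real hard link
    shutil.copy2(original, by_project / "a2.jpg")   # an independent copy, not a link
    for path in find_links(original, root):
        print(path.relative_to(root))
```

The demo lists the two linked names but not the copy, since the copy has its own inode.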

rob j crowe · Accepted Answer · 2019-06-24 03:56:06Z


I'd use Adobe Lightroom personally. You can use it to create keywords in a hierarchy to tag images for the various phases. It also lets you bulk-tag images with keywords, which I think would save you some time. It also has smart collections, which are generated automatically from those keywords, so you could have a smart collection for each phase/subphase. I'm sure you could do the same thing with free solutions, but it might turn into a kludge, and it might be worth paying for the Lightroom package, which is around $10/month in the US. You might be able to find a used copy of an older version, like version 4 or 5.


OP is using Linux. – xiota, Commented Jun 24, 2019 at 13:23
"But I guess the OS is only of secondary relevance here" – rob j crowe, Commented Jun 24, 2019 at 20:09
It's not clear if that statement refers to Linux (vs Windows vs Mac) or "Manjaro specifically" (vs other Linux distros). It could also be meant to steer away from recommending specific software toward explaining underlying organization and management with respect to using images across multiple projects. – xiota, Commented Jun 24, 2019 at 21:06
lol. okay whatever you say... – rob j crowe, Commented Jun 24, 2019 at 21:33
