Status and Cleanup
Hi everyone! Welcome back for part nine of our series dedicated to building a simple pipeline using Toolkit. Here are links to the previous posts, just in case you missed anything:
Introduction, Planning, & Toolkit Setup
Publishing from Maya to Nuke
Dataflow and Workflow
This week’s post expands on a few topics that we’ve already mentioned, and introduces some new ones. Our primary goal is to outline how the idea of tracking the status of published files can be combined with other concepts we’ve been exploring to aid in data flow and cleanup of old or unused files.
Published File Status
Like any entity in Shotgun, PublishedFile entities have a status field. While this field hasn’t been utilized heavily by Toolkit in the past, it provides some useful information that can be used to provide additional functionality in the pipeline.
One of the ways that we’ve used the status field in our simple pipeline is to define a “deprecated” status for published files that should no longer be used. Jesse Emond, the intern our team has been lucky to host, implemented this feature. He did this by adding a new status to the PublishedFile entity in Shotgun. Once the new status is available, a user with sufficient permissions can set the status of any published file to deprecated.
Jesse then put that status to work in our pipeline by modifying the tk-multi-loader2 app to filter deprecated published files out of its list of files available to import/reference. This way, when a published file is deprecated in Shotgun, users already making use of the file can continue to do so, but new references to the file will not be created. Jesse’s forked tk-multi-loader2 repository can be found here.
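The filtering idea can be sketched in a few lines of Python. This is not Jesse’s actual change to tk-multi-loader2; it is a minimal illustration that assumes publishes arrive as dictionaries shaped like Shotgun API results, and that the new status was given the short code "dep" (a made-up code; yours will depend on how the status was created).

```python
# Hypothetical short code for the custom "deprecated" status.
DEPRECATED_STATUS = "dep"


def filter_loadable(publishes):
    """Return only the publishes that have not been deprecated.

    Each publish is a dict with its status stored under the
    "sg_status_list" key, as Shotgun entities do.
    """
    return [p for p in publishes if p.get("sg_status_list") != DEPRECATED_STATUS]


# Example data; the names and the "cmpt" status are illustrative only.
publishes = [
    {"id": 1, "code": "hero_model.v001.ma", "sg_status_list": "dep"},
    {"id": 2, "code": "hero_model.v002.ma", "sg_status_list": "cmpt"},
    {"id": 3, "code": "hero_rig.v001.ma", "sg_status_list": None},
]

loadable = filter_loadable(publishes)
print([p["code"] for p in loadable])
```

Because the filter only affects what the loader offers, scenes that already reference a deprecated file are left untouched.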
Another concept that is handled well by a status is to mark a specific version of a published file as the “official” version. This acts as an indication to users that, unless they have a specific reason to use a different version of the published file, they should be referencing that version. In addition to being a visual indicator for users, it could also be used in apps like tk-multi-loader2 to pre-select the official version of each published file that is presented to the user. This would ensure that most users are making use of what is generally considered to be the “correct” version of a published file. Similarly, the tk-multi-breakdown app could be made to present users with the “official” version of each published file instead of assuming that the latest is always what should be used.
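As a rough sketch of how an app might pre-select a version, the function below prefers a version flagged as official and falls back to the latest one. The "ofcl" status code is an assumption made for illustration, and the input is a plain list of dictionaries standing in for the PublishedFile records of one file.

```python
# Hypothetical short code for the custom "official" status.
OFFICIAL_STATUS = "ofcl"


def pick_default_version(versions):
    """Pick the version to pre-select for one published file.

    Prefers a version marked official; if none is marked, falls back
    to the highest version number. Returns None for an empty list.
    """
    if not versions:
        return None
    official = [v for v in versions if v.get("sg_status_list") == OFFICIAL_STATUS]
    candidates = official or versions
    return max(candidates, key=lambda v: v["version_number"])


versions = [
    {"id": 10, "version_number": 1, "sg_status_list": "cmpt"},
    {"id": 11, "version_number": 2, "sg_status_list": "ofcl"},
    {"id": 12, "version_number": 3, "sg_status_list": "cmpt"},
]

print(pick_default_version(versions)["version_number"])
```

The same helper could back both a loader’s default selection and a breakdown’s "is this item up to date?" check, so the two apps agree on what "correct" means.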
There is a bit of a problem here with how Toolkit associates different versions of the same published file. The way it is structured out of the box is that PublishedFile entities each stand alone, and their association with other versions of the same file is handled at the code level.
PublishedFile entities. One way to handle that would be to associate each PublishedFile with a parent “BasePublishedFile” entity that would act as the representation of ALL versions of the file. This would provide a location to store version-independent information as well as data that’s exclusive to a single version of a published file.
Location-Specific Status
We alluded to this briefly in a previous post about multi-location workflows, but it is important to discuss it again, as it plays a big role in a multi-location pipeline and how data on disk is cleaned up when it’s no longer needed.
As discussed in the previous post, the idea is to maintain a PublishedFile status per location.
PublishedFile entities themselves, which are used to track the global status of the published file, such as whether it has been deprecated. For location-specific statuses, we have a different set of requirements to sort out, as well as different statuses to track. Below are a few examples of location-specific statuses.
Online:
The “online” status indicates that the published file exists on disk at that location and is available to be read or referenced. In a subscription-based workflow, the online status would indicate that the published file is ready to be subscribed to and imported/referenced without the need to transfer the data from another location.
Deleted:
The “deleted” status indicates that the published file was online at that location, but has since been deleted there. The fact that a published file has been deleted in one location does not mean that it has been deleted in any or all other locations, so it might still be possible to find another location that has the file online and transfer it back to the location where it was deleted, if need be.
Marked for Deletion:
The “marked for deletion” status, or “MFD” for short, is an indication that the file should be deleted in that location as soon as it is safe to do so. What’s considered “safe” to delete would be dictated by that file’s active subscriptions, which we discussed at length in last week’s post. Other benefits of marking a file for deletion rather than immediately deleting the data are speed and better balancing of file-server load. Because marking a file for deletion involves very little immediate processing, a large number of published files can be marked very quickly, which frees up the artist or TD performing the cleanup to move on to other tasks. It also allows the system to delete the data at some ideal time, and at a rate that’s healthy for the file servers.
Transferring:
The “transferring” status indicates that the published file is not yet online at that location, but is in the process of being transferred there. This will help resolve race conditions associated with multiple users or processes attempting to use a published file that isn’t yet online in quick succession. Rather than queue up the transfer of the file multiple times, apps can understand that they only need to wait for the existing transfer to complete before continuing their work.
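To make the transfer case concrete, here is a small sketch of how an app might decide between queuing a transfer and waiting on one that is already running. The function name, the in-memory status dictionary, and the queue list are all stand-ins invented for this example; in a real pipeline the status would live in Shotgun and the check-and-set would need to be atomic.

```python
def request_transfer(status_by_publish, publish_id, transfer_queue):
    """Ask for a published file to be made available at this location.

    Returns "ready" if the file is already online, "wait" if a transfer
    is already in progress, or "queued" if this call started a transfer.
    """
    status = status_by_publish.get(publish_id)
    if status == "online":
        return "ready"
    if status == "transferring":
        # Someone else already asked; do not queue the transfer twice.
        return "wait"
    status_by_publish[publish_id] = "transferring"
    transfer_queue.append(publish_id)
    return "queued"


statuses = {101: "online"}
queue = []

first = request_transfer(statuses, 202, queue)   # starts the transfer
second = request_transfer(statuses, 202, queue)  # sees it in progress
print(first, second, queue)
```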
Multiple Concurrent Statuses:
The types of statuses associated with locations will often need to coexist with one another. Using the statuses I’ve listed above as examples, it’s entirely reasonable for a published file to be considered both “online” and “marked for deletion.” Given how easy it is to add custom fields to entities in Shotgun, having a set of checkbox fields so that each can be checked on/off independent of the others is a great way to go. If there is a subset of statuses that are considered to be mutually exclusive, then a list field offering a choice of each could be used.
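On the code side, independent checkboxes map naturally onto a set of flags. The sketch below uses Python’s `enum.Flag` to hold several statuses at once and converts them to one boolean per checkbox. The field names it produces (such as "sg_online") are hypothetical; use whatever custom fields your site defines.

```python
from enum import Flag, auto


class LocationStatus(Flag):
    """Statuses a published file can hold at a single location."""
    ONLINE = auto()
    MARKED_FOR_DELETION = auto()
    TRANSFERRING = auto()
    DELETED = auto()


def to_checkbox_fields(status):
    """Map a combined status to one boolean per (hypothetical) checkbox field."""
    return {"sg_" + member.name.lower(): member in status for member in LocationStatus}


# A file can be online and marked for deletion at the same time.
status = LocationStatus.ONLINE | LocationStatus.MARKED_FOR_DELETION

print(LocationStatus.ONLINE in status)
print(to_checkbox_fields(status))
```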
Cleaning Up Published Files
Every studio has dealt with a lack of available disk space at some time or another, and many run on the ragged edge of running out of space on a daily basis. This means that it is very important to have a robust system in place for safely removing files when they are no longer in use. To do this, a lot of data from the pipeline about file usage is required, as we need to know both what is on disk and who is using it. Toolkit provides the former, but the latter will require tracking more than what’s provided out of the box, as Josh wrote about last week when he outlined the basics of a subscription-based pipeline.
First off, cleanup should be handled per location. Removing a file from disk in Los Angeles does not mean that the same file should also be removed in Vancouver. It is possible and entirely reasonable to know that a file isn’t being used in one location, remove the file in that location, and leave it online elsewhere because it’s still being actively subscribed to in those other locations.
As for how published files are deleted, we would provide two different approaches: on-demand deletion, and deferred deletion.
On-Demand Deletion:
This is the most straightforward of the two approaches, and involves a user telling the system that they want to delete a specific published file. There are a few questions that need to be asked of the system before it’s known whether the file CAN be deleted, and once the data has been removed from disk there is additional processing that needs to occur.
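A minimal sketch of an on-demand deletion routine might look like the following. It assumes a single safety question (does anything still subscribe to the file at this location?) and a single follow-up step (record the new status); a production version would ask more questions and do more bookkeeping. The record layout and the single "status" string are simplifications made for the example.

```python
import os
import tempfile


def delete_on_demand(record, active_subscriptions):
    """Delete one published file at one location if it is safe to do so.

    `record` is a dict with the file's "path" and its "status" at this
    location. Returns True if the file was deleted, False otherwise.
    """
    if record["status"] != "online":
        return False  # nothing on disk here to delete
    if active_subscriptions > 0:
        return False  # still in use at this location
    os.remove(record["path"])
    record["status"] = "deleted"  # follow-up bookkeeping
    return True


# Demonstrate against a throwaway temp file.
handle, path = tempfile.mkstemp(suffix=".ma")
os.close(handle)
record = {"path": path, "status": "online"}

refused = delete_on_demand(record, active_subscriptions=2)
deleted = delete_on_demand(record, active_subscriptions=0)
print(refused, deleted, record["status"])
```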
Deferred Deletion:
This is where the “marked for deletion” status discussed earlier comes into play. The process of marking a file for deletion will also need to be checked to make sure it’s allowed, but those rules are much less stringent than those exercised for on-demand deletion.
Once a published file is marked for deletion, some process needs to periodically check whether the file can be deleted and, if so, perform an on-demand deletion of the published file as described above. This process can be a script run as a cron job as often as is appropriate for your workflow. The script would ask the database for a list of PublishedFile records that have the marked for deletion status in that location and pass that list of files to the on-demand deletion routine. That routine checks whether it is safe to delete each published file; if it is, the file is deleted, and if it is not, nothing happens.
This cron job would run locally in each location and would only ever operate on published files that are marked for deletion in its location.
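The periodic sweep described above could be sketched like this. The record layout and the `try_delete` callable are assumptions for the sake of the example: in practice the records would come from a database query and `try_delete` would be the on-demand deletion routine, which decides for itself whether deleting is safe.

```python
def sweep_marked_for_deletion(records, location, try_delete):
    """Attempt to delete every file marked for deletion at `location`.

    `try_delete(record)` returns True if it deleted the file and False
    if deleting was not yet safe. Returns the ids that were deleted.
    """
    deleted_ids = []
    for record in records:
        if record["location"] != location:
            continue  # never touch another location's files
        if not record["marked_for_deletion"]:
            continue
        if try_delete(record):
            record["marked_for_deletion"] = False
            deleted_ids.append(record["id"])
        # Otherwise leave it marked; the next run will try again.
    return deleted_ids


# Stand-in data and deletion routine for illustration only.
records = [
    {"id": 1, "location": "LA", "marked_for_deletion": True, "subscribers": 0},
    {"id": 2, "location": "LA", "marked_for_deletion": True, "subscribers": 3},
    {"id": 3, "location": "Vancouver", "marked_for_deletion": True, "subscribers": 0},
]

swept = sweep_marked_for_deletion(records, "LA", lambda r: r["subscribers"] == 0)
print(swept)
```

Files that are not yet safe to delete simply stay marked, so no extra retry logic is needed beyond running the script again.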
Pipeline Rules
The two methods of deletion discussed above both mention checking a set of rules to see if the requested action is allowed. These types of rule checks will come up in many places in a pipeline and are not limited to cleanup systems. A few examples:
- May I make this published file “official”? The answer should be “no” if that published file has been deleted in all locations, as the file itself is no longer available for use. The same is true if the file has been taken off of frontline storage and put into archive, as users won’t be able to access the data.
- May I transfer this published file to my location? The answer should be “no” if it is already in the process of being transferred to that location.
- May I mark this published file for deferred deletion? The answer should be “no” if it is the “official” version. The same would be true for on-demand deletion.
There are many other questions that need to be asked for various actions that can be taken within a production pipeline, so it makes sense to provide a simple, centralized location to store this logic. Implementing a simple “Rules” API is a good way to abstract away the logic used to answer your tool’s questions about the pipeline. It also means that those rules can change over time without needing to update the tools/apps themselves, as the rules of the pipeline are centralized.
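As one possible shape for such a “Rules” API, the sketch below encodes the three example questions above as small functions over a shared state object. The `PublishState` class and all of the function names are invented for this illustration, and each rule captures only the conditions mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class PublishState:
    """What the rules need to know about one published file (hypothetical)."""
    is_official: bool = False
    archived: bool = False
    # Maps a location name to the file's status there, e.g. "online".
    locations: dict = field(default_factory=dict)


def may_make_official(state):
    """No if the file is archived or deleted in every location."""
    if state.archived:
        return False
    return any(status != "deleted" for status in state.locations.values())


def may_transfer(state, location):
    """No if the file is already being transferred to that location."""
    return state.locations.get(location) != "transferring"


def may_mark_for_deletion(state):
    """No if this is the official version."""
    return not state.is_official


state = PublishState(
    is_official=True,
    locations={"LA": "online", "Vancouver": "transferring"},
)
print(may_make_official(state))
print(may_transfer(state, "Vancouver"))
print(may_mark_for_deletion(state))
```

Tools call these functions instead of embedding the conditions themselves, so a rule can be tightened or relaxed in one place.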
Conclusion
That’s it for week 9! We hope you’ve enjoyed reading about statuses and how they can be used to aid in managing published files. As always, if you have any questions or suggestions, please add a comment below.
We will be back with week 10, but it will come two weeks later than normal, as the entire Shotgun team will be attending our annual summit next week and the week after is Thanksgiving in the USA. When we get back we will be hard at work putting together what will be the final post in the series! Even though we will be concluding with our next post, we’d like to invite everyone to offer topics for future pipeline-related blog entries. We are very open to the idea of writing more in the future, so please let us know if there is something you would like to see discussed!
About Jeff & Josh
Jeff was a Pipeline TD, Lead Pipeline TD, and Pipeline Supervisor at Rhythm and Hues Studios over a span of 9 years. After that, he spent 2+ years at Blur Studio as Pipeline Supervisor. Between R&H and Blur, he has experienced studios large and small, and a wide variety of both proprietary and third-party software. He also really enjoys writing about himself in the third person.
Josh followed the same career path as Jeff in the Pipeline department at R&H, going back to 2003. In 2010 he migrated to the Software group at R&H and helped develop the studio’s proprietary toolset. In 2014 he took a job as Senior Pipeline Engineer in the Digital Production Arts MFA program at Clemson University where he worked with students to develop an open source production pipeline framework.
Jeff & Josh joined the Toolkit team in August of 2015.