Troubleshooting Crashes in ArchivesSpace

If you've landed on this article, it's likely because you're either being proactively warned about crashing ASpace or you've just crashed it and are looking for troubleshooting tips. If you'd like to skip straight to those tips, go to Troubleshooting Crashes from Spreadsheet Ingests if ASpace crashed while running a spreadsheet import, or to Troubleshooting Crashes from Reporting if ASpace crashed while running a report. Otherwise, please read on to understand crashes in general.

What just happened?

Once ASpace crashes, you may experience one of a few things: you may get an error message like the one at the top of this article, you may see a frozen screen as if ASpace is churning on something and won't let go, or you may not realize there was a problem at all until Atlas contacts you to let you know your server has been restarted.

The most common cause of crashes is running background jobs. The two most likely jobs to crash ASpace are:

Large spreadsheet imports
Running reports, and especially custom reports

What happens now?

One of the benefits of hosting with Atlas is that we are monitoring your server 24/7. In fact, we often see a crash occur before you may even be aware of it. Since we are monitoring your servers for you, we usually act as fast as possible to restart your server. This may have just happened to you a few minutes ago, and that's why you're here! Happily, resolving a crash is simple: we restart your server and you return to work. But remember that re-trying what you just tried is likely to simply crash ASpace again. Read on for tips on avoiding your next crash.

What's the risk in crashing ASpace?

There are two principal risks:

That someone in your institution was in the middle of a task and had not gotten the chance to save before the server went down.
That the crash will corrupt your database. This is not specific to ArchivesSpace; there is always risk to databases when a crash occurs in the middle of an ongoing operation. If there are ever any severe repercussions from a crash, Atlas can roll back your database to the most recent backup. In all my years of working with ASpace, I have never seen this happen. Don't panic.

In the end, the biggest risk is likely one of inconvenience: you may need to contact us to restart your server, your colleagues may be confused why ASpace is suddenly down, there will be a delay while it comes back up, and you will lose whatever work was unsaved at the time. Our Hosting Team will work to get you back up and running as soon as possible.

What causes crashes?

Why one task will be successful and another will crash the server can be ambiguous. The intent of this section is to try to unpack what makes an operation complex enough to overwhelm (and crash) the server and how being aware of this (and the trends in your own data) may help you avoid crashes in the future.

First, what exactly is a crash?

Even after years of using computers and applications, it can be unclear what a crash really is. A crash in this context is when ArchivesSpace runs out of the memory (RAM) necessary to complete an operation. Without the memory to function, it cannot complete the job, cannot give that memory back to the other parts of the application, and so ASpace as a whole crashes. It's a bit like running out of gas (or charge!) while going up a hill.

What uses so much memory?

The short answer is: the complexity of the operation you have just tried to run. Exactly why an operation was complex depends on what in particular caused the crash; read on for more about complexity in ASpace in general.

Complexity in ArchivesSpace (in general)

Not everything below will apply to the circumstances of your crash, but it's worth understanding the biggest possible picture and then narrowing down to the factors that may have affected your most recent operation.

ASpace operations are generally going to fall into one of two categories: [creating or editing data], or, [searching and reporting on data]. One type of operation makes changes and triggers indexing, the other simply brings information together by querying the database.

In both cases, the biggest takeaway is this statement right here: Complexity in ArchivesSpace is best thought of not as the number of records in the database but as the number of links between those records. The creating, indexing, and reporting of linked records (especially deep linking, where things are linked to things that are linked to things) are the variables that can deplete system resources rapidly and comprehensively enough to cause a crash.

Links are probably not the first thing that comes to mind when you think of the descriptive depth of your holdings. You are probably more likely to think of the number of records, like "We have 345 manuscript collections." You are probably not inclined to think of a sentence like this: "We have 345 manuscript collections linked to 1,345 Agents, 13 Classifications, and 2,345 Subjects, plus 10,427 top containers linked to 345 locations."

Be careful not to conflate the number of records with the number of links, as that can lead to false narratives. For instance, you may know that you have 1,500 processed collections and think that is a lot, but if your institution only ever links to one or two agents per resource, that's probably not that bad from a reporting point of view. Meanwhile, you might only have 300 resources, and think your data is simple enough to pass through any report, but you may not consider that you undertook a really robust Agent linking or EAC-CPF project a few years ago and your 300 resources are linked to 300 Agents that themselves have 3,000 links between them.

Another example of a false narrative is in linking to Classifications. Let's say you have 1,500 Resources and 10 Classifications, and a peer institution also has 1,500 Resources but 100 Classifications. So far it seems like 100 Classifications are going to cause more problems for reporting than 10 will, but it may be the opposite: because you only have 10 Classifications, you have more links per record (1,500 / 10 = 150 links per Classification). Your peer institution has spread those 1,500 links out over a greater number of records (1,500 / 100 = 15 per Classification), and so that may represent less of a burden in reporting despite there being more of them.
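The arithmetic in the Classification example can be sketched in a few lines of Python. This is only an illustration: it assumes every Resource links to exactly one Classification and that links are spread evenly, which real data rarely is. The function name `links_per_record` is hypothetical and not part of ASpace.

```python
def links_per_record(total_links: int, linked_records: int) -> float:
    """Average number of links carried by each record on the shared side of a link."""
    if linked_records <= 0:
        raise ValueError("linked_records must be positive")
    return total_links / linked_records


# Figures from the example: 1,500 Resources, each assumed to link to one Classification.
yours = links_per_record(1500, 10)   # you have 10 Classifications
peer = links_per_record(1500, 100)   # your peer has 100 Classifications

print(f"Your institution: {yours:.0f} links per Classification")
print(f"Peer institution: {peer:.0f} links per Classification")
```

Fewer Classifications means each one carries more links, which is the side of the link that tends to weigh on reporting.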

For both data creation and reporting, you may also have to remind yourself, especially if you imagine the traditional finding aid when thinking of your data, that there are a lot of individual record types in ASpace that are linked together. For example, each of the following is its own record and represents at least one link every time it is used, or in the case of certain records, multiple links:

Every archival object is its own record which must be linked, along with...
Every digital object
Every top container
Every location
Every event

You may also need to expand your definition of linked records. It's relatively easy to understand that an Agent record is linked to a Resource, but less obvious that an Extent is also a linked record. Here is a list of linked record types that you may not expect:

Extents
Dates
Instances
User Defined Fields
Languages

Also, pay attention to your linking landscape. Ask yourself a few questions about your particular data and practice:

What kinds of records do you link to beyond the usual (Agents, Resources, Subjects, Accessions, Digital Objects)? Do you use events? Locations? Assessments?
Do you create associative relationships between Agents?
Do you link Agents and Subjects at only the Resource level or also the Archival Object level?
Do you use Classifications? If yes, do you have a lot or a few?
- And remember the prompt about false narratives, above! Here the answer is that having a few is actually potentially more complex than having a lot!
- It just depends on which side of the link you're thinking about: [one resource] linked to [one classification] is simple, but [one classification] linked to [500 resources] is complex.

Being aware of linking as a measure of complexity can help you unpack what makes one operation bigger than another if you begin to think of it more as "how many links did I just create" instead of "how many rows did I just upload."

Please read on for troubleshooting tips.

Troubleshooting Crashes from Spreadsheet Ingests

This section of the article talks specifically about troubleshooting tips for spreadsheet ingests, but we welcome you to read the entire article for understanding more about crashes in ASpace in general.

Remember that every row in your spreadsheet is creating and linking records, and that one row is not only creating and linking a single archival object, digital object, etc., but, depending on what other data is present, might be creating other records as well. The more rows and columns you fill in, and the more record types each represents, the larger the operation becomes, until you run out of system resources and crash.

The following are troubleshooting tips:

1. Split Your Spreadsheets

The most effective troubleshooting advice for spreadsheet ingests is very simple: split your spreadsheet into smaller spreadsheets and ingest them one at a time, ideally with some time in between to let ASpace catch up on indexing.

Unfortunately there is no way to accurately predict how many rows is too many rows for ingest because there are too many variables to account for, including the record this data is being added to, how many records and different links are being created in the ingest, and how many active users are in the application at once. Since it varies so much, row limits are something usually gleaned from trial and error.

One pro-tip that makes splitting spreadsheets a bit easier is to know that you can control where in the hierarchy a spreadsheet ingest should begin. This is helpful to know if you need to split the file.

For example, let's say you have a spreadsheet that you expect is too large for ingest, but splitting it means dividing your data in the middle of a series. You may be concerned that the second ingest is going to put that data in the wrong place (i.e. not in the series you already started). It's good to know that ASpace will begin the ingest at whatever record you are on when you hit Load Via Spreadsheet.

If you're uncertain of this, it's best to practice in a test environment (see below).
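If you would rather not split a large file by hand, the splitting step can be sketched in Python. This is a minimal sketch, not an Atlas or ArchivesSpace tool, and it rests on assumptions: the spreadsheet has been saved as CSV, the first `header_rows` rows are headers that every smaller file needs, and `rows_per_file` is a limit you found by trial and error. The function name `split_csv` and the output file naming are hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def split_csv(source: Path, out_dir: Path, rows_per_file: int, header_rows: int = 1) -> List[Path]:
    """Split one CSV into numbered smaller CSVs, repeating the header rows in each."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")

    # Read everything up front; the files at issue are small enough for this.
    with source.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    header, data = rows[:header_rows], rows[header_rows:]

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for part, start in enumerate(range(0, len(data), rows_per_file), start=1):
        target = out_dir / f"{source.stem}_part{part:02d}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerows(header)                              # same headers in every part
            writer.writerows(data[start:start + rows_per_file])   # next slice of data rows
        written.append(target)
    return written
```

For example, `split_csv(Path("ingest.csv"), Path("chunks"), rows_per_file=200)` would write `ingest_part01.csv`, `ingest_part02.csv`, and so on into a `chunks` folder. The sketch cuts purely by row count, so if a cut lands in the middle of a series, start that part's ingest from the series record as described above.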

2. Account for Indexing Strain

Note that a successful ingest is not the end of the story! When you upload a spreadsheet successfully, ArchivesSpace creates all the new records and the new links between them in only a few seconds, but that is not the end of the operation! ArchivesSpace can create those records that fast, but the follow-up operation is to index them. Indexing is invisible to you the user, but can be ongoing for minutes to hours after a large ingest completes.

If you have just successfully ingested a large spreadsheet and you immediately follow up with another large spreadsheet, the ongoing indexing operation from the first upload is already using RAM, making your second ingest more likely to fail. If you have the time and you feel you're pushing the limits on what ASpace can handle, allow at least 30 minutes (and up to three hours) between large uploads. See "Timing will help" below for more suggestions on timing.

Note that all this advice is for the larger end of spreadsheet ingests; smaller spreadsheets can be uploaded in quicker succession, but bear all these variables in mind as you proceed.
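To make the spacing concrete, here is a small Python sketch that pairs a list of files with start times separated by a fixed gap. It is only a planning aid under stated assumptions: the function name `plan_uploads`, the file names, and the date are all hypothetical, and the 30-minute default is simply the minimum suggested above, counted from the start of one upload to the start of the next.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def plan_uploads(files: List[str], first_start: datetime, gap_minutes: int = 30) -> List[Tuple[str, datetime]]:
    """Pair each file with a start time, leaving gap_minutes between uploads."""
    gap = timedelta(minutes=gap_minutes)
    return [(name, first_start + i * gap) for i, name in enumerate(files)]


schedule = plan_uploads(
    ["ingest_part01.csv", "ingest_part02.csv", "ingest_part03.csv"],
    first_start=datetime(2024, 1, 5, 16, 0),  # arbitrary example date and time
    gap_minutes=30,
)
for name, start in schedule:
    print(f"{start:%H:%M}  {name}")
```

If ASpace still struggles at this spacing, raise `gap_minutes` toward the three-hour end of the range.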

3. Test!

If you are an Enterprise customer hosting a sandbox at Atlas, and you are concerned about whether or not a spreadsheet upload will crash ASpace, test in your Sandbox first! Your Sandbox experience may be slightly different than Production because other users won't be logged in and doing simultaneous tasks, but it's certainly close enough for testing purposes. And if you crash the Sandbox, no problem!

If you are not an Enterprise customer, you can always use the ASpace Sandbox (login/password: admin/admin), but any data there is publicly visible (unless deleted) and that server will be provisioned differently than your server here at Atlas.

4. Timing will help

Since you likely share your instance of ArchivesSpace with other people working at your institution, the timing on running spreadsheet ingests may help prevent crashes. For example, if someone in your institution is also running a spreadsheet ingest at the same time you are, that will definitely have an impact on whether ASpace crashes, and if it does crash, it will also interrupt the unsaved work of your colleague.

Strategies for timing include:

Schedule your largest ingests for times when you know your colleagues aren't working, or aren't working heavily, like the early morning, right at the end of the day, or over lunch.
Announce that you are about to run a large ingest on your institutional messaging app or however you keep in touch with your colleagues throughout a work day to ensure others aren't performing large tasks at the same time.
Last thing on a Friday (or at the end of any work week) is a great time to run a larger than usual ingest because it gives ASpace time to complete indexing without affecting others.

Troubleshooting Crashes from Reporting

This section of the article talks specifically about troubleshooting tips for custom reports, but we welcome you to read the entire article for understanding more about crashes in ASpace in general.

The following are troubleshooting tips:

1. Understand what Makes a Report So Complex

Troubleshooting custom reports requires yet more of a deep dive into complexity. It's complex!

Here's why: when you generate a custom report, you are actually authoring a database query; you're just doing it from within the ASpace interface rather than in SQL.

For example, look at this one field available for limiting Accession reports:

By clicking this, you are saying you want to see all Accessions with the Acquisition Type of Purchase. That sounds very simple. But what this really looks like to the database is more like this, written in SQL:

    SELECT * FROM archivesspace.accession
    WHERE acquisition_type_id IN (
        SELECT id FROM archivesspace.enumeration_value
        WHERE enumeration_id LIKE '6' AND value LIKE 'purchase'
    );

Queries become complex when they relate to more than one table in the database. In the example above, the query wants data from a table called accession (one table) but relies on the enumeration value of "purchase" stored in enumeration_value (another table). This shows that even the simplest queries can get complex fast and in hidden or unexpected ways!
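To see the two-table dependency in action, the sketch below runs a query of the same shape against a throwaway in-memory SQLite database from Python. Everything here is an assumption for illustration: the two tables are toy stand-ins with only the columns named in the query above (plus a hypothetical `title` column), the sample rows are invented, and the `archivesspace.` prefix is dropped because the in-memory database has no such schema. It is not the real ArchivesSpace schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for the two tables named in the query (not the real schema).
    CREATE TABLE enumeration_value (id INTEGER PRIMARY KEY, enumeration_id INTEGER, value TEXT);
    CREATE TABLE accession (id INTEGER PRIMARY KEY, title TEXT, acquisition_type_id INTEGER);

    INSERT INTO enumeration_value VALUES (10, 6, 'gift'), (11, 6, 'purchase'), (12, 7, 'purchase');
    INSERT INTO accession VALUES (1, 'Smith papers', 10), (2, 'Jones letters', 11), (3, 'Lee diaries', 11);
""")

# Same shape as the report query: the outer table is filtered by ids
# that first have to be looked up in a second table.
rows = conn.execute("""
    SELECT id, title FROM accession
    WHERE acquisition_type_id IN (
        SELECT id FROM enumeration_value
        WHERE enumeration_id = 6 AND value = 'purchase'
    )
    ORDER BY id
""").fetchall()

print(rows)
conn.close()
```

Only the two accessions whose `acquisition_type_id` points at the "purchase" row come back, and the database could not answer without consulting both tables.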

Speaking of unexpected, did you know that Dates are in their own table? Extents? Instances? These things appear on only one screen in the interface, but they are stored in separate tables in the database itself:

This is of course hidden from the user, but links between tables are the ultimate reason why linking is the true measure of complexity in ASpace and, consequently, in ASpace custom reports.

2. Start slow, aim low

The best advice for getting started with custom reports or to prevent crashing is to start slow and with low row counts. Incremental testing will allow you to understand this feature and slowly work your way up to seeing if The Ultimate Report You Have Always Wanted will crash ArchivesSpace.

For example, if you already have a report in mind, start by aiming low on row counts. Run the report to see if it's returning the type of data you want, and once you've perfected it, incrementally increase the report Limit until you reach the lowest return that is still viable.
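One way to plan the incremental increases is to write down a ladder of row limits before you start. The Python sketch below generates such a ladder; the starting value, the doubling factor, and the function name `limit_ladder` are arbitrary assumptions for illustration, since no specific numbers are prescribed here.

```python
from typing import Iterator


def limit_ladder(start: int, target: int, factor: int = 2) -> Iterator[int]:
    """Yield increasing row limits from start up to and including target."""
    if start < 1 or target < start or factor < 2:
        raise ValueError("need 1 <= start <= target and factor >= 2")
    limit = start
    while limit < target:
        yield limit
        limit *= factor
    yield target  # always finish on the limit you actually want


print(list(limit_ladder(25, 500)))
```

Run the report once at each limit in turn, and stop climbing as soon as the results are sufficient or ASpace starts to slow down.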

3. Avoid linked records if you don't need them

The more linked records you include in a custom report, the greater your risk of crashing ASpace. Only select the linked records you know you need. If you're experimenting, reduce the row Limit on the report as you try adding more linked records; then, once you have all the linked records you want, increase the row Limit incrementally.

4. Timing will help

Since you likely share your instance of ArchivesSpace with other people working at your institution, the timing on running custom reports may help prevent crashes. For example, if someone in your institution is also running a large spreadsheet ingest when you click Run for a custom report, that will definitely have an impact on whether the custom report crashes ASpace, and if it does crash, it will also interrupt the unsaved work of your colleague.

Strategies for timing include:

Schedule your largest reports for times when you know your colleagues aren't working, or aren't working heavily, like the early morning, right at the end of the day, or over lunch.
Announce that you are about to run a large report on your institutional messaging app or however you keep in touch with your colleagues throughout a work day to ensure others aren't performing large tasks at the same time.
