Data management on eventbuilders
Created by: jorana
Yesterday, runs 9236, 9237 and 9238 were irrecoverably lost from the event builders. The big question is how we could, at some point, have lost the data stored on /data/xenonnt_processed, or somehow have passed the check on line 586 in Ajax.
The reason turned out to be faulty logic in this line, which caused data less than two hours old to be deleted.
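To make the failure mode concrete, here is a minimal sketch of a two-hour grace-period check and how inverting it deletes exactly the data it should protect. The function names and the `GRACE_PERIOD` constant are hypothetical; the actual Ajax code is not reproduced on this page, and the 19:00 check time is an assumed value, since the timeline only gives "19:..".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant; the page only states "two hours".
GRACE_PERIOD = timedelta(hours=2)


def is_too_young_to_touch(data_written_at: datetime, now: datetime) -> bool:
    """True if the data was written less than GRACE_PERIOD ago."""
    return now - data_written_at < GRACE_PERIOD


def may_delete_unregistered_buggy(data_written_at: datetime, now: datetime) -> bool:
    # Faulty logic: data younger than two hours (which may simply not be
    # registered yet) is precisely what gets selected for deletion.
    return is_too_young_to_touch(data_written_at, now)


def may_delete_unregistered_fixed(data_written_at: datetime, now: datetime) -> bool:
    # Intended logic: only data older than the grace period is a candidate.
    return not is_too_young_to_touch(data_written_at, now)


finished = datetime(2020, 8, 31, 18, 5, 47, tzinfo=timezone.utc)  # processing finished
checked = datetime(2020, 8, 31, 19, 0, 0, tzinfo=timezone.utc)    # assumed check time
print(may_delete_unregistered_buggy(finished, checked))  # True
print(may_delete_unregistered_fixed(finished, checked))  # False
```

Keeping the age test in one helper and negating it at the call site makes the direction of the comparison explicit, which is where this kind of bug tends to hide.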
RECONSTRUCTION
These are the events that happened (focusing on run 9236).
"2020-08-31T18:05:47.776Z" - Processing finished
Bootstrax successfully processed the run and then deleted the live_data at this time. This is only done after all the data has been stored successfully (there is a check in set_status_finished that makes sure the data has been written to disk). Furthermore, one can see from the deleted entries that we had saved the data.
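The ordering argument above (live_data is removed only after the processed output is confirmed on disk) can be sketched as follows. This is not the real set_status_finished; `all_outputs_on_disk` and `finish_run` are hypothetical helpers, and "non-empty directory" is an assumed stand-in for whatever check Bootstrax actually performs.

```python
import os
import shutil
import tempfile


def all_outputs_on_disk(output_dir, expected_dirs):
    """Hypothetical check: every expected output directory exists and is non-empty."""
    for name in expected_dirs:
        path = os.path.join(output_dir, name)
        if not os.path.isdir(path) or not os.listdir(path):
            return False
    return True


def finish_run(output_dir, expected_dirs, live_data_dir):
    """Delete live_data only once all processed output is confirmed on disk."""
    if not all_outputs_on_disk(output_dir, expected_dirs):
        return False  # keep live_data so the run can still be reprocessed
    shutil.rmtree(live_data_dir)
    return True


with tempfile.TemporaryDirectory() as root:
    processed = os.path.join(root, "processed")
    live = os.path.join(root, "live_data")
    name = "009236-raw_records_aqmon-rfzvpzj4mf"
    os.makedirs(os.path.join(processed, name))
    os.makedirs(live)
    # Output directory exists but is still empty: live_data is kept.
    print(finish_run(processed, [name], live))  # False
    open(os.path.join(processed, name, "chunk-000000"), "w").close()
    print(finish_run(processed, [name], live))  # True
    print(os.path.exists(live))                 # False
```

Under this ordering, the absence of live_data is itself evidence that the processed data had been written, which is the inference made in the timeline.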
"2020-08-31T19:.." - Ajax deletes the 'unregistered' data
The bug was here:
7208899 MainThread root clean_unregistered:: found 398 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
7209768 MainThread root remove_if_unregistered:: run 9236 is NOT registered in the runDB
7209768 MainThread root No data for 009236 found! Double checking /data/xenonnt_processed/!
7209769 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_nv-rfzvpzj4mf
7209770 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf
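The log above shows the shape of the clean_unregistered pass: list the directories under /data/xenonnt_processed, map each to a run, and remove those not found in the runs-database. A minimal sketch with the grace period applied correctly is given below. The directory-name layout (`run-datatype-hash`) is read off the log lines; `parse_data_dir`, `dirs_to_clean`, the `written_at` mapping and the 19:30 check time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=2)  # assumption: the "two hours" discussed above


def parse_data_dir(name):
    """Split '009236-raw_records_nv-rfzvpzj4mf' into (run, data type, hash)."""
    run, rest = name.split("-", 1)
    data_type, lineage_hash = rest.rsplit("-", 1)
    return int(run), data_type, lineage_hash


def dirs_to_clean(dir_names, registered_runs, written_at, now):
    """Return unregistered directories that are old enough to delete.

    registered_runs: set of run numbers known to the runs-database.
    written_at: mapping from directory name to the time it was written.
    """
    to_clean = []
    for name in dir_names:
        run, _, _ = parse_data_dir(name)
        if run in registered_runs:
            continue
        if now - written_at[name] < GRACE_PERIOD:
            continue  # too recent: registration may still be pending
        to_clean.append(name)
    return to_clean


now = datetime(2020, 8, 31, 19, 30, tzinfo=timezone.utc)         # assumed check time
written = datetime(2020, 8, 31, 18, 5, 47, tzinfo=timezone.utc)  # processing finished
dirs = ["009236-raw_records_nv-rfzvpzj4mf", "009236-raw_records_aqmon-rfzvpzj4mf"]
print(dirs_to_clean(dirs, set(), {d: written for d in dirs}, now))  # []
```

With the grace period honoured, run 9236 would have been skipped on this pass even though the lookup reported it as not registered.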
"2020-08-31T20:18:53.934Z" - Ajax removes entries from the runs-database
In the clean_database routine, Ajax notices that this run has been stored for more than 2 h and that processing has finished. It therefore checks, on line 586, whether the data is actually stored on this host. The corresponding output from Ajax is shown below:
10812030 MainThread root Loop finished, take a 3600 s nap
14412139 MainThread root clean_unregistered:: found 396 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
14412978 MainThread root clean_abandoned:: No more matches in rundoc
14413442 MainThread root clean_database:: delete entry of data from 9236 at /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf as it does not exist
14413442 MainThread root deleting /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf finished
14413442 MainThread root changing data field in rundoc
14413442 MainThread root update with {'host': 'eb5.xenon.local', 'type': 'raw_records_aqmon', 'file_count': 36, 'at': datetime.datetime(2020, 8, 31, 20, 18, 53, 934849, tzinfo=<UTC>), 'by': 'eb5.xenon.local.ajax'}
...
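The clean_database step can be sketched as a filter over the data entries of a run document: once the run is finished and older than 2 h, any entry for this host whose path no longer exists is flagged for removal. All rundoc field names here (`status`, `finished_at`, `data`, `host`, `location`) and the `stale_entries` helper are hypothetical; the real document layout may differ.

```python
import os
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=2)  # assumption: the ">2 h" condition described above


def stale_entries(rundoc, host, now, path_exists=os.path.exists):
    """Return data entries for `host` whose recorded path no longer exists."""
    if rundoc.get("status") != "done":
        return []  # processing not finished: leave entries alone
    if now - rundoc["finished_at"] < MIN_AGE:
        return []  # run too recent to judge
    return [
        entry for entry in rundoc["data"]
        if entry["host"] == host and not path_exists(entry["location"])
    ]


rundoc = {
    "status": "done",
    "finished_at": datetime(2020, 8, 31, 18, 5, 47, tzinfo=timezone.utc),
    "data": [{
        "host": "eb5.xenon.local",
        "type": "raw_records_aqmon",
        "location": "/data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf",
    }],
}
now = datetime(2020, 8, 31, 20, 18, 53, tzinfo=timezone.utc)
# The directory had already been removed on eb5, so the entry is flagged.
print(len(stale_entries(rundoc, "eb5.xenon.local", now, path_exists=lambda p: False)))  # 1
```

Read this way, the check on line 586 behaves as intended: it only records in the runs-database a deletion that had already happened in clean_unregistered.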
"2020-08-31T20:19:05" - Bootstrax notices that all processed data has now been removed and fails the run.
Note that this further substantiates that the processing did occur as needed.