
Data management on eventbuilders

Created by: jorana

Yesterday, runs 9236, 9237 and 9238 were irrecoverably lost from the event builders. The big question is how we could at some point have lost the data stored on /data/xenonnt_processed, or how we could otherwise have passed line 586 in Ajax.

The reason turned out to be faulty logic in this line, which said that data less than two hours old should be deleted.
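The intended direction of such an age check can be sketched as follows. This is a minimal illustration, not the Ajax code: the function name `old_enough_to_clean` and its arguments are hypothetical, and it assumes the intent was to leave data younger than two hours untouched.

```python
from datetime import datetime, timedelta, timezone

# Two-hour threshold as mentioned in the text; the constant name is hypothetical.
MIN_AGE = timedelta(hours=2)


def old_enough_to_clean(last_modified, now=None, min_age=MIN_AGE):
    """Return True only if the data is OLDER than min_age.

    Assumption: recent data must be protected, so anything younger
    than min_age is never a candidate for deletion.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_modified) > min_age
```

With this direction of the comparison, data written one hour ago is skipped, and only data older than two hours is considered at all.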

RECONSTRUCTION

These are the events that happened (focusing on run 9236).

"2020-08-31T18:05:47.776Z" - Processing finished

Bootstrax successfully processed the run and then deleted the live_data at this time. This is only done after all the data has been stored successfully (a check in set_status_finished makes sure the data has been written to disk). Furthermore, one can see from the deleted entries what we had saved: [image]

"2020-08-31T19:.." - Ajax deletes the 'unregistered' data

The bug was here:

7208899 MainThread root clean_unregistered::    found 398 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
7209768 MainThread root remove_if_unregistered::        run 9236 is NOT registered in the runDB
7209768 MainThread root No data for 009236 found! Double checking /data/xenonnt_processed/!
7209769 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_nv-rfzvpzj4mf
7209770 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf
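The folder names in this log appear to follow a `<run number>-<data type>-<hash>` pattern. Under that assumption, the registration check can be sketched as below. The helper names are hypothetical and this is not the actual clean_unregistered implementation; in particular, the set of registered runs stands in for a query to the runs-database.

```python
def split_folder_name(name):
    """Split e.g. '009236-raw_records_nv-rfzvpzj4mf' into its three parts.

    Assumes exactly two '-' separators, as in the folders logged above.
    """
    parts = name.split('-')
    if len(parts) != 3:
        raise ValueError(f'unexpected folder name: {name}')
    run_id, data_type, lineage_hash = parts
    return int(run_id), data_type, lineage_hash


def unregistered_folders(folder_names, registered_runs):
    """Return the folders whose run number is not in registered_runs."""
    return [
        name for name in folder_names
        if split_folder_name(name)[0] not in registered_runs
    ]
```

Returning the list of candidates, rather than deleting inside the loop, makes it possible to log or inspect them before anything is removed.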

"2020-08-31T20:18:53.934Z" - Ajax removes entries from runs-database

In the clean_database routine, ajax notices that this run has been stored for >2 h and that processing has finished. For this, we check on line 586 whether the data is actually stored on this host. The corresponding output from ajax is added below:

10812030 MainThread root Loop finished, take a 3600 s nap
14412139 MainThread root clean_unregistered::   found 396 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
14412978 MainThread root clean_abandoned::      No more matches in rundoc
14413442 MainThread root clean_database::       delete entry of data from 9236 at /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf as it does not exist
14413442 MainThread root deleting /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf finished
14413442 MainThread root changing data field in rundoc
14413442 MainThread root update with {'host': 'eb5.xenon.local', 'type': 'raw_records_aqmon', 'file_count': 36, 'at': datetime.datetime(2020, 8, 31, 20, 18, 53, 934849, tzinfo=<UTC>), 'by': 'eb5.xenon.local.ajax'}
...
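The kind of existence check described for clean_database can be sketched like this. It is an illustration under assumptions, not the code on line 586: the function name `missing_entries` and the `'location'` key are hypothetical, while `'host'` and `'type'` are the keys visible in the log output above.

```python
import os


def missing_entries(data_entries, this_host, exists=os.path.exists):
    """Return entries registered on this_host whose folder is not on disk.

    `exists` can be swapped out for testing; entries without a
    'location' key (hypothetical field name) are left alone.
    """
    return [
        entry for entry in data_entries
        if entry.get('host') == this_host
        and 'location' in entry
        and not exists(entry['location'])
    ]
```

Note that a check like this cannot tell "never written" apart from "deleted by an earlier cleanup step", which is what happened to run 9236 in the timeline here.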

"2020-08-31T20:19:05" - Bootstrax notices that all processed data is now removed and fails the run

Note that this further substantiates that the processing did occur as needed.
