Gap-size-based clustering in two steps (peaklets)

Luca Scotto Lavina requested to merge github/fork/jhowl01/twostepgap into master

Created by: jhowl01

Adds an intermediate data kind called "peaklets", with the same dtype as peaks. Peaks are then constructed from peaklets via "merging". This is performed by a new function that builds a peak from the attributes of its constituent peaklets, rather than going back to the hits.
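To illustrate what "merging" from peaklet attributes could look like, here is a minimal sketch. It is not the function added in this MR: the dtype is a simplified, hypothetical stand-in for the real peak dtype, and `merge_peaklets` is an illustrative name.

```python
import numpy as np

# Hypothetical, simplified peaklet dtype (NOT the real strax peak dtype).
peaklet_dtype = np.dtype([
    ("time", np.int64),     # start time in ns
    ("endtime", np.int64),  # end time in ns
    ("area", np.float64),   # integrated area
])


def merge_peaklets(peaklets):
    """Build one peak from a group of peaklets using only their attributes."""
    peak = np.zeros(1, dtype=peaklet_dtype)
    peak["time"] = peaklets["time"].min()        # earliest start
    peak["endtime"] = peaklets["endtime"].max()  # latest end
    peak["area"] = peaklets["area"].sum()        # areas add up
    return peak[0]


peaklets = np.array(
    [(100, 150, 2.0), (180, 260, 5.5), (300, 340, 1.5)],
    dtype=peaklet_dtype,
)
peak = merge_peaklets(peaklets)
```

The point of the sketch is only that nothing here touches the hits: start, end and area of the peak follow directly from the peaklets. The sum waveform needs more care, since the peaklets may have different sampling.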

Using two data structures allows clustering in two steps, with some pseudo-classification performed in between. Thus, we can apply different clustering criteria to S1-like and S2-like waveforms. This is beneficial, since these waveforms look very different: we really want to cut S1s off quickly at the tail, but small S2s can be sparse and stochastic. This method was used in pax, but breaking it up into multiple strax plugins gives us more flexibility in the intermediate pseudo-classification and merging (e.g. using the sum waveform).
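A rough sketch of the two-step idea, under stated assumptions: the function names, the gap thresholds and the width-based `is_s2_like` check are all hypothetical placeholders, not the plugins or the pseudo-classification in this MR.

```python
import numpy as np


def split_by_gap(times, endtimes, gap_threshold):
    """Return (start, stop) index pairs of clusters of time-sorted intervals.

    A new cluster starts wherever the gap between one interval's end and
    the next interval's start exceeds gap_threshold.
    """
    if len(times) == 0:
        return []
    gaps = times[1:] - endtimes[:-1]
    breaks = np.flatnonzero(gaps > gap_threshold) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [len(times)]))
    return list(zip(starts.tolist(), stops.tolist()))


def two_step_cluster(hit_times, hit_endtimes, tight_gap, loose_gap, is_s2_like):
    # Step 1: tight gap threshold, hits -> peaklets (cuts S1 tails quickly).
    peaklets = [
        (int(hit_times[a]), int(hit_endtimes[a:b].max()))
        for a, b in split_by_gap(hit_times, hit_endtimes, tight_gap)
    ]
    # Step 2: pseudo-classify, then merge neighbouring S2-like peaklets
    # using a looser gap threshold. S1-like peaklets are left untouched.
    peaks = []  # entries are (start, end, s2_like)
    for start, end in peaklets:
        s2_like = is_s2_like(start, end)
        if (peaks and s2_like and peaks[-1][2]
                and start - peaks[-1][1] <= loose_gap):
            prev_start, prev_end, _ = peaks[-1]
            peaks[-1] = (prev_start, max(prev_end, end), True)
        else:
            peaks.append((start, end, s2_like))
    return [(s, e) for s, e, _ in peaks]


hit_times = np.array([0, 10, 500, 700, 900, 5000])
hit_endtimes = np.array([5, 20, 600, 800, 1000, 5010])
peaks = two_step_cluster(
    hit_times, hit_endtimes,
    tight_gap=50, loose_gap=300,
    # Placeholder classifier: call anything at least 100 ns wide "S2-like".
    is_s2_like=lambda start, end: end - start >= 100,
)
# peaks == [(0, 20), (500, 1000), (5000, 5010)]
```

With the tight threshold alone, the three wide peaklets between 500 and 1000 ns stay separate; the looser second pass joins them, while the narrow peaklets keep their sharp boundaries.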

Before this MR is merged, a few issues remain that I wasn't quite sure how to handle (see inline comments):

  • L60: Making a smarter choice for the down-sampling threshold during the merging.
  • L64: Not assuming the standard time-per-sample to start with.
  • L57: It's pretty essential to numba-fy this, but my numba implementation wound up slower than the current one. I think that's because np.repeat() isn't supported, so I had to write something uglier.

Potential other issues:

  • L28: Vectorize to avoid the for-loop, or numba-fy a double loop? The latter was much slower for me, and I'm not sure how to do the former. Currently, profiling shows merge_peaks() is far more dominant, but this might change once it's numba'ed.
  • L89: Do we want to do something smarter (e.g. interpolate) than just repeating downsampled samples?
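For the last point, a small sketch of the two options, assuming the samples are plain amplitudes and ignoring any normalization of downsampled data. The function names are illustrative and linear interpolation is just one possible choice.

```python
import numpy as np


def upsample_repeat(samples, factor):
    """Each downsampled sample is simply repeated `factor` times."""
    return np.repeat(samples, factor)


def upsample_interp(samples, factor):
    """Linear interpolation between the centers of the coarse samples.

    Values before the first and after the last coarse center are held
    constant, which is how np.interp treats points outside its range.
    """
    coarse_centers = (np.arange(len(samples)) + 0.5) * factor
    fine_centers = np.arange(len(samples) * factor) + 0.5
    return np.interp(fine_centers, coarse_centers, samples)


samples = np.array([0.0, 4.0, 0.0])
repeated = upsample_repeat(samples, 2)
interpolated = upsample_interp(samples, 2)
```

Repeating always preserves the sum of the samples up to the factor. Linear interpolation gives a smoother shape but does not conserve the sum in general, so the area would have to be handled separately.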
