Outline of Algorithms

Each waveform is processed to collect a number of summary values:

First, the maximum bin of the waveform is located, and the counts in the waveform are histogrammed. Then, a Gaussian is fit to the count histogram to gauge the magnitude of the noise within the waveform. If the fit succeeds, the values of μ and σ it provides are used to find the first and last samples with values more than 5σ away from μ. During this computation the algorithm also tries to eliminate single spikes in the waveform by removing from consideration whichever bin has the largest value if that bin's value is more than twice the average of the next several highest bins.
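The spike rejection and 5σ search described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: the function name `threshold_crossings` is hypothetical, μ and σ are taken as inputs rather than obtained from a fit, "the next several highest bins" is assumed to mean three bins, and the "value" of a bin is assumed to be its absolute deviation from μ.

```python
def threshold_crossings(values, mu, sigma, n_sigma=5.0, n_next=3):
    """Return (first_bin, last_bin, count) of bins more than n_sigma*sigma from mu.

    A single spike is ignored if the largest deviation is more than twice
    the average of the next n_next largest deviations (n_next=3 is an
    assumption; the text only says "several").
    """
    dev = [abs(v - mu) for v in values]

    # Rank bins by deviation, largest first.
    order = sorted(range(len(dev)), key=dev.__getitem__, reverse=True)

    spike = None
    if len(order) > n_next:
        next_avg = sum(dev[i] for i in order[1:1 + n_next]) / n_next
        if dev[order[0]] > 2 * next_avg:
            spike = order[0]  # exclude this bin from consideration

    over = [i for i, d in enumerate(dev)
            if i != spike and d > n_sigma * sigma]
    if not over:
        return None, None, 0
    return over[0], over[-1], len(over)
```

In this sketch a waveform whose only excursion is one isolated bin yields no crossings at all, while a group of comparable high bins is kept intact.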

Lastly, the waveform is processed to try to find the start time of a hit contained within it. The algorithm first rectifies the waveform and then smooths it using a moving average, currently with a window width of ten bins. The maximum point of the smoothed wave is located and then the algorithm searches backwards from it for a point where the wave has fallen to 10% of the maximum. When such a point is found, the average value within a window before it (currently 50 bins) is compared with the average value within a window of the same width after the maximum bin, offset additionally by a number of bins equal to the smoothing window, so that the maximum bin itself does not contaminate the average. If the pre-candidate average is less than half of the post-maximum average, the candidate bin is taken to be the start of the hit. Otherwise the search continues down the waveform until either a suitable hit start is found or the end of the waveform is reached, at which point the algorithm concludes that there is no hit.
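A rough sketch of this hit search is given here to make the sequence of steps concrete. Several details are assumptions rather than facts about the real code: the names `moving_average` and `find_hit_start` are hypothetical, the moving average is taken to be centered, both averages are computed on the smoothed rectified wave, and a failed candidate causes the search to continue toward earlier bins.

```python
def moving_average(values, width):
    """Centered moving average, truncated at the waveform edges (assumption)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i - half + width]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_hit_start(waveform, smooth_width=10, fraction=0.10, avg_window=50):
    """Return (max_bin, start_bin); start_bin is None if no hit is found."""
    if not waveform:
        return None, None

    rectified = [abs(v) for v in waveform]
    smooth = moving_average(rectified, smooth_width)

    max_bin = max(range(len(smooth)), key=smooth.__getitem__)
    threshold = fraction * smooth[max_bin]

    # Reference average after the maximum, offset by the smoothing width
    # so that the maximum itself does not contaminate it.
    post_start = max_bin + smooth_width
    post = smooth[post_start:post_start + avg_window]
    if not post:
        return max_bin, None
    post_avg = sum(post) / len(post)

    # Walk backwards from the maximum looking for a bin at or below 10%.
    for cand in range(max_bin - 1, -1, -1):
        if smooth[cand] > threshold:
            continue
        pre = smooth[max(0, cand - avg_window):cand]
        if not pre:
            break  # reached the edge of the waveform
        if sum(pre) / len(pre) < 0.5 * post_avg:
            return max_bin, cand
    return max_bin, None
```

For a waveform that is quiet for 100 bins and then oscillates at constant amplitude, this sketch places the hit start a few bins before bin 100, because the smoothing spreads the rising edge over the window width.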

In the figure below the black wave is the level 1 processed waveform and the green is the smoothed waveform. The red marker indicates the maximum point of the smoothed wave and the blue marker indicates the chosen hit start bin. In the fourth waveform the maximum was located, but no suitable hit start was found.

Combined version with both plots overlaid

Output Format

The file now begins with a version number line which currently looks like:

Version 3 8 15

The first number (3) is the file format version; the number in and of itself carries no particular meaning, but will be increased each time the file format is revised. The latter two numbers allow older reader code to read newer files, as follows: the second number (8) is the number of fields which are general to the entire trigger, and the third number (15) is the number of fields specific to each channel. That is, each of the data lines is arranged in blocks, as shown schematically here, and the numbers given in the version line are the widths of the blocks, in data items.

The intention is that any additional fields added in future will be placed at the ends of the blocks, so that readers can read as many fields as they understand within a block, then skip any which remain.
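A reader following this convention might look like the sketch below. It is illustrative only: the names `parse_version_line`, `split_blocks`, `KNOWN_GENERAL` and `KNOWN_CHANNEL` are hypothetical, and the fields are assumed to be separated by whitespace with no embedded spaces.

```python
# Block widths this (hypothetical) reader understands.
KNOWN_GENERAL = 8
KNOWN_CHANNEL = 15


def parse_version_line(line):
    """Parse 'Version <v> <n_general> <n_channel>' into three integers."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "Version":
        raise ValueError("not a version line: %r" % line)
    return int(parts[1]), int(parts[2]), int(parts[3])


def split_blocks(tokens, n_general, n_channel):
    """Split one data line into a general block and per-channel blocks.

    Only the fields this reader knows are kept; any extra fields that a
    newer file appends to the end of a block are skipped.
    """
    general = tokens[:n_general][:KNOWN_GENERAL]
    rest = tokens[n_general:]
    if n_channel <= 0 or len(rest) % n_channel != 0:
        raise ValueError("line length does not match the block widths")
    channels = [rest[i:i + n_channel][:KNOWN_CHANNEL]
                for i in range(0, len(rest), n_channel)]
    return general, channels
```

If a later file declared, say, 9 general and 16 per-channel fields, this reader would still return 8 and 15 fields per block and ignore the trailing additions.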

Following the version line, each data line corresponds to one trigger.

The fields within the general block are:

  1. The trigger time-stamp, in tenths of nanoseconds since the start of the year. This information is duplicated in the next four entries.
  2. The time-stamp day
  3. The time-stamp hour
  4. The time-stamp minute
  5. The time-stamp seconds, to 10 decimal places
  6. The mainboard ID in hexadecimal
  7. The daq software name, currently always "mdaq"
  8. The name of the waveform file belonging to the trigger

And the fields within each channel block are:

  1. The number of the bin with the largest absolute value
  2. The absolute value of the maximum bin
  3. The squared error per bin of the Gaussian fit to the count histogram
  4. The mean computed by the fit (0.0 if the fit failed)
  5. The standard deviation computed by the fit (-1.0 if the fit failed)
  6. The time in nanoseconds of the first bin whose value, with the fit mean subtracted, is over 5 times the fit standard deviation (-1000 if no such bin was found)
  7. The value of that bin, divided by the fit standard deviation (0 if no bin was found)
  8. The time in nanoseconds of the last bin whose value, with the fit mean subtracted, is over 5 times the fit standard deviation (-1000 if no such bin was found)
  9. The value of that bin, divided by the fit standard deviation (0 if no bin was found)
  10. The time in nanoseconds of the signal start as reported by the HitFinder algorithm (-1000 if no signal was found)
  11. The number of bins which were over 5 times the standard deviation from the mean
  12. The sum of the squares of the values of all bins which were not omitted as spikes, divided by the number of such bins
  13. The time in nanoseconds of the signal start as reported by the MD Leading Edge algorithm
  14. The quality parameter reported by the HitFinder algorithm, in arbitrary units
  15. The quality parameter reported by the MD Leading Edge algorithm, in arbitrary units
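Since several of these fields use sentinel values (-1.0 for a failed fit, -1000 for a missing time), reader code may want to decode them explicitly. The following sketch shows one way to do so; the field names in `CHANNEL_FIELDS` and the function `decode_channel` are invented for illustration and do not appear in the file format itself.

```python
# Hypothetical names for the 15 per-channel fields, in file order.
CHANNEL_FIELDS = [
    "max_bin", "max_abs_value", "fit_sq_error", "fit_mean", "fit_sigma",
    "first_over_ns", "first_over_sigmas", "last_over_ns", "last_over_sigmas",
    "hitfinder_start_ns", "n_bins_over", "mean_square",
    "leading_edge_start_ns", "hitfinder_quality", "leading_edge_quality",
]
INTEGER_FIELDS = {"max_bin", "n_bins_over"}
# Fields documented as using -1000 to mean "nothing found".
SENTINEL_TIME_FIELDS = {"first_over_ns", "last_over_ns", "hitfinder_start_ns"}
NOT_FOUND_NS = -1000.0


def decode_channel(tokens):
    """Turn one channel block (a list of strings) into a dict."""
    if len(tokens) < len(CHANNEL_FIELDS):
        raise ValueError("channel block is too short")
    record = {}
    for name, token in zip(CHANNEL_FIELDS, tokens):
        record[name] = int(token) if name in INTEGER_FIELDS else float(token)

    # A failed fit is reported as a standard deviation of -1.0.
    record["fit_ok"] = record["fit_sigma"] >= 0.0

    # Replace the -1000 sentinel with None so it cannot be mistaken for a time.
    for name in SENTINEL_TIME_FIELDS:
        if record[name] == NOT_FOUND_NS:
            record[name] = None
    return record
```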

There are, then, currently a total of 68 items per line: 8 general fields plus 15 fields for each of 4 channels. An example line, with line breaks and spacing added here for clarity, looks like:

124613788970587772 144  5 29 38.8970587772  31492cf3354 mdaq     031492cf3354.47.A_binary.split.149.Run6581.csv
    438   111  1.60e+01 -4.95e+01  1.86e+01   214.62   5.97   215.57   5.48   133.77     2 4.758e+02   207.91   2.41 9797.73
    438   107  7.18e+01 -4.78e+01  1.67e+01   213.64   5.16   215.57   5.70   164.64     3 4.009e+02   203.51   2.02 8505.88
    508   137  3.89e+00 -5.10e+01  2.51e+01   246.85   5.46   246.85   5.46   179.34     1 9.766e+02   206.24   2.05 18573.13
    415  4157  3.35e+01 -6.26e+01  7.99e+01   173.46   8.42   247.31  13.82     0.00   111 3.459e+05   172.48   0.00 16663.22
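The duplication between the first general field and the following four can be checked with integer arithmetic, as in this sketch. The function name `decompose_timestamp` is hypothetical. The code simply inverts the stated units (tenths of nanoseconds) and reproduces the day, hour, minute and seconds values of the example line above; whether the day field is meant as a zero-based count is not stated here, so treat that as an observation about this one example.

```python
TICKS_PER_SECOND = 10**10  # one tick is a tenth of a nanosecond


def decompose_timestamp(ticks):
    """Split a tick count into (day, hour, minute, seconds_string).

    The seconds are returned as a string with 10 decimal places so that
    no precision is lost to floating point.
    """
    seconds_total, frac = divmod(ticks, TICKS_PER_SECOND)
    minutes_total, sec = divmod(seconds_total, 60)
    hours_total, minute = divmod(minutes_total, 60)
    day, hour = divmod(hours_total, 24)
    return day, hour, minute, "%d.%010d" % (sec, frac)


print(decompose_timestamp(124613788970587772))
# → (144, 5, 29, '38.8970587772')
```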

Plots of Output

Compared here are plots of some of the output fields for two runs: a forced trigger run (7622, shown in blue), and a transmitter data run (7642, shown in black).