Analysis of Locust Data Set
Table of Contents
- 1. How to proceed?
- 2. Preliminary Analysis
- 3. Data renormalization
- 4. Interactive data exploration
- 5. Spike detection
- 6. Cuts
- 7. Getting "clean" events
- 8. Dimension reduction
- 9. Clustering
- 10. "Brute force" superposition resolution
- 11. Getting the spike trains
- 12. Custom developed code
- 12.1. Individual function definitions
  - 12.1.1. cstValueSgts definition
  - 12.1.2. Definitions of generic function and time-series specific method for data exploration
  - 12.1.3. peaks and associated methods definition
  - 12.1.4. Definition of an explore method for time series and eventsPos objects
  - 12.1.5. Function cutSglEvt
  - 12.1.6. Function mkEvents
  - 12.1.7. events methods
  - 12.1.8. mkNoise definition
  - 12.1.9. Definition of an explore method for pca results
  - 12.1.10. Definition of get_jitter
  - 12.1.11. Definition of mk_aligned_events
  - 12.1.12. Definition of mk_center_list
  - 12.1.13. Definition of classify_and_align_evt
  - 12.1.14. Definition of predict_data
Sys.setlocale(category="LC_MESSAGES",locale="C")
[1] "C"
1 How to proceed?
The locust data set is located at the following web site: http://xtof.disque.math.cnrs.fr/data. So you should start by downloading the four files Locust_x.dat.gz (where x = 1, 2, 3, 4) into some directory where you will perform your analysis (see the second sub-section).
1.1 Loading the data
The data are simply loaded, once we know where to find them and if we do not forget that they have been compressed with gzip. The data are stored with one file per recording channel in double format. They were sampled at 15 kHz and the recording lasts 20 s. Since they are compressed, we cannot read them directly into R from the repository and we must download them first:
reposName <-"http://xtof.disque.math.cnrs.fr/data/" dN <- paste("Locust_",1:4,".dat.gz",sep="") sapply(1:4, function(i) download.file(paste(reposName,dN[i],sep=""), dN[i],mode="wb") )
Once the data are in our working directory, we can load them into our R workspace:
dN <- paste("Locust_",1:4,".dat.gz",sep="")
nb <- 20*15000
lD <- sapply(dN, function(n) {
    mC <- gzfile(n,open="rb")
    x <- readBin(mC,what="double",n=nb)
    close(mC)
    x
})
colnames(lD) <- paste("site",1:4)
We can check that our lD object has the correct dimension:
dim(lD)
[1] 300000 4
1.2 Getting the code
The individual functions developed for this kind of analysis are defined at the end of this document (Sec. Individual function definitions). They can also be downloaded as a single file sorting_with_r.R which must then be sourced with:
source("sorting_with_r.R")
where it is assumed that the working directory of your R session is the directory where sorting_with_r.R was downloaded.
2 Preliminary Analysis
We are going to start our analysis by some "sanity checks" to make sure that nothing "weird" happened during the recording.
2.1 Five number summary
We should start by getting an overall picture of the data like the one provided by the summary method of R, which outputs a five-number summary plus the mean. The five numbers are the minimum, the first quartile, the median, the third quartile and the maximum:
summary(lD,digits=2)
     site 1            site 2            site 3            site 4      
 Min.   :-9.074    Min.   :-8.229    Min.   :-6.890    Min.   :-7.35  
 1st Qu.:-0.371    1st Qu.:-0.450    1st Qu.:-0.530    1st Qu.:-0.49  
 Median :-0.029    Median :-0.036    Median :-0.042    Median :-0.04  
 Mean   : 0.000    Mean   : 0.000    Mean   : 0.000    Mean   : 0.00  
 3rd Qu.: 0.326    3rd Qu.: 0.396    3rd Qu.: 0.469    3rd Qu.: 0.43  
 Max.   :10.626    Max.   :11.742    Max.   : 9.849    Max.   :10.56  
We see that the data range (maximum - minimum) is similar (close to 20) on the four recording sites. The inter-quartile ranges are also similar.
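To make this statement concrete, the ranges themselves can be computed directly (a simple check, not part of the original analysis):
apply(lD,2,function(x) diff(range(x))) ## data range (max - min) on each site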
2.2 Were the data normalized?
We can check next if some processing like a division by the standard deviation (SD) has been applied:
apply(lD,2,sd)
site 1 site 2 site 3 site 4 
     1      1      1      1 
2.3 Discretization step amplitude
We clearly see that these data have been scaled, that is, normalized to have an SD of 1. Since the data have been digitized we can easily obtain the apparent size of the digitization step:
apply(lD,2, function(x) min(diff(sort(unique(x)))))
     site 1      site 2      site 3      site 4 
0.006709845 0.009194500 0.011888433 0.009614042 
2.4 Detecting saturation
Before embarking on a comprehensive analysis of data that we did not record ourselves (or that we recorded so long ago that we do not remember any "remarkable" event concerning them), it can be wise to check that no amplifier or A/D card saturation occurred. We can quickly check for that by looking at the length of the longest segment of constant value. When saturation occurs the recorded value stays for many sampling points at the same upper or lower saturating level. We use function cstValueSgts, which does this job.
ndL <- lapply(1:4,function(i) cstValueSgts(lD[,i]))
sapply(ndL, function(l) max(l[2,]))
[1] 2 2 2 2
We see that for each recording site, the longest segment of constant value is two sampling points long, that is 2/15 ms. There is no ground to worry about saturation here.
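A rough cross-check can be done with base R's rle(), counting the longest run of exactly repeated values on each site; this is not identical to cstValueSgts (which uses a smoothed derivative) but it should flag saturation just as well:
sapply(1:4, function(i) max(rle(lD[,i])$lengths)) ## longest run of identical consecutive values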
2.5 Plot the data
We are going to profit from the time series (ts, and mts for multiple time series) objects of R by redefining our lD matrix as:
lD <- ts(lD,start=0,freq=15e3)
It is then straightforward to plot the whole data set:
plot(lD)
Figure 1: The whole (20 s) locust data set.
It is also good to "zoom in" and look at the data with a finer time scale:
plot(window(lD,start=0,end=0.2))
Figure 2: First 200 ms of the locust data set.
3 Data renormalization
We are going to use a median absolute deviation (MAD) based renormalization. The goal of the procedure is to scale the raw data such that the noise SD is approximately 1. Since it is not straightforward to obtain a noise SD on data where both signal (i.e., spikes) and noise are present, we use this robust type of statistic for the SD. Luckily this is simply obtained in R:
lD.mad <- apply(lD,2,mad)
lD <- t((t(lD)-apply(lD,2,median))/lD.mad)
lD <- ts(lD,start=0,freq=15e3)
where the last line of code ensures that lD is still an mts object. We can check on a plot how the MAD and the SD compare:
plot(window(lD[,1],0,0.2))
abline(h=c(-1,1),col=2)
abline(h=c(-1,1)*sd(lD[,1]),col=4,lty=2,lwd=2)
Figure 3: First 200 ms on site 1 of the locust data set. In red: +/- the MAD; in dashed blue: +/- the SD.
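Before moving on, we can verify that the renormalization did what we expect: the MAD of each trace should now be exactly 1 and the median 0 (a quick sanity check, not in the original text):
apply(lD,2,mad)    ## should all be 1 after renormalization
apply(lD,2,median) ## should all be 0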
3.1 A quick check that the MAD "does its job"
We can check that the MAD does its job as a robust estimate of the noise standard deviation by looking at Q-Q plots of the whole traces normalized with the MAD and normalized with the "classical" SD.
lDQ <- apply(lD,2,quantile, probs=seq(0.01,0.99,0.01))
lDnormSD <- apply(lD,2,function(x) x/sd(x))
lDnormSDQ <- apply(lDnormSD,2,quantile, probs=seq(0.01,0.99,0.01))
qq <- qnorm(seq(0.01,0.99,0.01))
matplot(qq,lDQ,type="n",xlab="Normal quantiles",ylab="Empirical quantiles")
abline(0,1,col="grey70",lwd=3)
col=c("black","orange","blue","red")
matlines(qq,lDnormSDQ,lty=2,col=col)
matlines(qq,lDQ,lty=1,col=col)
rm(lDnormSD,lDnormSDQ)
Figure 4: Performances of MAD based vs SD based normalizations. After normalizing the data of each recording site by its MAD (plain colored curves) or its SD (dashed colored curves), Q-Q plots against a standard normal distribution were constructed. Colors: site 1, black; site 2, orange; site 3, blue; site 4, red.
We see that the behavior of the "away from normal" fraction is much more homogeneous, for small as well as for large quantile values, with the MAD normalized traces than with the SD normalized ones. If we consider automatic rules like the three sigmas, we are going to reject fewer events (i.e., get fewer putative spikes) with the SD based normalization than with the MAD based one.
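We can make this concrete by counting, for each site, how many sampling points exceed 3 in absolute value under each normalization (a quick illustration, not in the original text; the MAD normalized traces should yield more threshold crossings):
apply(abs(lD) > 3, 2, sum)                           ## MAD normalized (current lD)
apply(lD, 2, function(x) sum(abs(x/sd(x)) > 3))      ## SD normalized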
4 Interactive data exploration
Although we can't properly illustrate this key step in a "static" document, it is absolutely necessary to look at the data in detail using the explore method (the corresponding function definition can be found in Sec. Definitions of generic function and time-series specific method for data exploration):
explore(lD,col=c("black","grey70"))
Upon using this command the user is invited to move forward (typing "n" + RETURN or simply RETURN), backward (typing "f" + RETURN), to change the abscissa or ordinate scale, etc.
5 Spike detection
We are going to filter the data slightly using a "box" filter of length 5. That is, the data points of the original trace are going to be replaced by the average of themselves with their two nearest neighbors. We will then scale the filtered traces such that the MAD is one on each recording site and keep only the parts of the signal which are above 4:
lDf <- filter(lD,rep(1,5)/5)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
thrs <- c(4,4,4,4)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
We can see the difference between the raw trace and the filtered and rectified one on which spikes are going to be detected with:
plot(window(lD[,1],0,0.2))
abline(h=4,col=4,lty=2,lwd=2)
lines(window(ts(lDfr[,1],start=0,freq=15e3),0,0.2),col=2)
Figure 5: First 200 ms on site 1 of data set lD. The raw data are shown in black, the detection threshold appears in dashed blue and the filtered and rectified trace on which spike detection is going to be performed appears in red.
Spikes are then detected as local maxima on the summed, filtered and rectified traces. To do that we use a peak detecting function defined in Sec. peaks and associated methods definition.
sp0 <- peaks(apply(lDfr,1,sum),15)
The returned object, sp0, is essentially a vector of integers containing the indexes of the detected spikes. To facilitate handling, it is in addition defined as an object of class eventsPos, meaning that entering its name on the command line and typing return, that is, calling the print method on the object, gives a short description of it:
sp0
eventsPos object with indexes of 1795 events. 
  Mean inter event interval: 166.81 sampling points, corresponding SD: 143.79 sampling points 
  Smallest and largest inter event intervals: 16 and 1157 sampling points.
We see that 1795 (length(sp0)) events were detected. Since the mean inter event interval is very close to the SD, the "compound process" (since it's likely to be the sum of the activities of many neurons) is essentially Poisson.
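One informal way to look at this Poisson-like character is to compare the empirical inter-event intervals with an exponential distribution of the same mean (a quick sketch, not part of the original analysis):
isi <- diff(unclass(sp0)) ## inter event intervals in sampling points
qqplot(qexp(ppoints(length(isi)), rate=1/mean(isi)), isi,
       xlab="Exponential quantiles", ylab="Empirical ISI quantiles")
abline(0,1,col=2)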
5.1 Interactive spike detection check
We can interactively check the detection quality with the tailored explore method (defined in Sec. Definition of an explore method for time series and eventsPos objects):
explore(sp0,lD,col=c("black","grey50"))
That leads to a display very similar to the one previously obtained with explore(lD), except that the detected events appear superposed on the raw data as red dots.
5.2 Remove useless objects
Since we are not going to use lDfr anymore we can save memory by removing it:
remove(lDfr)
5.3 Data set split
In order to get stronger checks for our procedure and to illustrate better how it works, we are going to split our data set in two parts, establish our model on the first and use this model on both parts:
(sp0E <- as.eventsPos(sp0[sp0 <= dim(lD)[1]/2]))
(sp0L <- as.eventsPos(sp0[sp0 > dim(lD)[1]/2]))
eventsPos object with indexes of 908 events. 
  Mean inter event interval: 164.88 sampling points, corresponding SD: 139.06 sampling points 
  Smallest and largest inter event intervals: 16 and 931 sampling points. 

eventsPos object with indexes of 887 events. 
  Mean inter event interval: 168.72 sampling points, corresponding SD: 148.59 sampling points 
  Smallest and largest inter event intervals: 16 and 1157 sampling points.
We see that eventsPos objects can be sub-set like classical vectors. We also see that the sub-setting based on total time results in sets with roughly the same number of events.
6 Cuts
6.1 Getting the "right" length for the cuts
After detecting our spikes, we must make our cuts in order to create our events' sample. That is, for each detected event we literally cut a piece of data, and we do that on the four recording sites. To this end we use function mkEvents which, in addition to an eventsPos argument (sp0E) and a "raw data" argument (lD), takes an integer argument (before) stating how many sampling points we want to keep within the cut before the reference time, as well as another integer argument (after) stating how many sampling points we want to keep within the cut after the reference time. The function returns essentially a matrix where each event is a column. The cuts on the different recording sites are put one after the other when the event is built. The obvious question we must first address is: How long should our cuts be? The pragmatic way to get an answer is:
- Make cuts much longer than what we think is necessary, like 50 sampling points on both sides of the detected event's time.
- Compute robust estimates of the "central" event (with the median) and of the dispersion of the sample around this central event (with the MAD).
- Plot the two together and check when the MAD trace reaches the background noise level (at 1 since we have normalized the data).
- Having the central event allows us to see if it outlasts significantly the region where the MAD is above the background noise level.
Clearly, cutting beyond the time at which the MAD hits back the noise level should not bring any useful information as far as classifying the spikes is concerned. So here we perform this task as follows:
evtsE <- mkEvents(sp0E,lD,49,50)
evtsE.med <- median(evtsE)
evtsE.mad <- apply(evtsE,1,mad)
plot(evtsE.med,type="n",ylab="Amplitude")
abline(v=seq(0,400,10),col="grey")
abline(h=c(0,1),col="grey")
lines(evtsE.med,lwd=2)
lines(evtsE.mad,col=2,lwd=2)
Figure 6: Robust estimates of the central event (black) and of the sample's dispersion around the central event (red) obtained with "long" (100 sampling points) cuts. We see clearly that the dispersion is back to noise level 15 points before the peak and 30 points after the peak (on all sites). We also see that the median event is not back to zero 50 points after the peak; we will have to keep this information in mind when we look for superpositions.
6.2 Events
Once we are satisfied with our spike detection, at least in a provisory way, and have decided on the length of our cuts, we proceed by making cuts around the detected events:
evtsE <- mkEvents(sp0E,lD,14,30)
Here we have decided to keep 14 points before and 30 points after our reference times. evtsE is a bit more than a matrix; it is an object of class events, meaning that a summary method is available:
summary(evtsE)
events object deriving from data set: lD.
 Events defined as cuts of 45 sampling points on each of the 4 recording sites.
 The 'reference' time of each event is located at point 15 of the cut.
 There are 908 events in the object.
A print method, which calls the plot method, is also available, giving:
evtsE[,1:200]
Figure 7: First 200 events of evtsE. Cuts from the four recording sites appear one after the other. The background (white / grey) changes with the site. In red, robust estimate of the "central" event obtained by computing the pointwise median. In blue, robust estimate of the scale (SD) obtained by computing the pointwise MAD.
Like eventsPos objects, events objects can be sub-set like usual matrices. Notice that a rather sophisticated plot was obtained with an extremely simple command: the beauty of the R class / method mechanism in action.
6.3 Noise
Getting an estimate of the noise statistical properties is an essential ingredient to build respectable goodness of fit tests. In our approach, "noise events" are essentially anything that is not an "event" in the sense of the previous section. I wrote essentially and not exactly since there is a little twist here, which is the minimal distance we are willing to accept between the reference time of a noise event and the reference time of the last preceding and of the first following "event". We could think that keeping a cut length on each side would be enough. That would indeed be the case if all events were starting from and returning to zero within a cut. But this is not the case with the cut parameters we took previously (that will become clear soon). You might wonder why we chose so short a cut length then. Simply to avoid having to deal with too many superposed events, which are the really bothersome events for anyone wanting to do proper sorting.
To obtain our noise events we are going to use function mkNoise (defined in Sec. mkNoise definition), which takes the same arguments as function mkEvents plus two numbers: safetyFactor, a number by which the cut length is multiplied and which sets the minimal distance between the reference times discussed in the previous paragraph, and size, the maximal number of noise events one wants to cut (the actual number obtained might be smaller depending on the data length, the cut length, the safety factor and the number of events).
We cut next noise events with a rather large safety factor:
noiseE <- mkNoise(sp0E,lD,14,30,safetyFactor=2.5,2000)
Here noiseE is also an events object and its summary is:
summary(noiseE)
events object deriving from data set: lD.
 Events defined as cuts of 45 sampling points on each of the 4 recording sites.
 The 'reference' time of each event is located at point 15 of the cut.
 There are 1330 events in the object.
The reader interested in checking the effect of the safetyFactor argument is invited to try something like:
noiseElowSF <- mkNoise(sp0E,lD,14,30,safetyFactor=1,2000)
plot(mean(noiseElowSF),type="l")
lines(mean(noiseE),col=2)
7 Getting "clean" events
Our spike sorting has two main stages: the first one consists in estimating a generative model and the second one consists in using this model to build a classifier before applying it to the data. Our generative model will include superposed events but it is going to be built out of reasonably "clean" ones. Here by clean we mean events which are not due to a nearly simultaneous firing of two or more neurons; and simultaneity is defined on the time scale of one of our cuts.
In order to eliminate the most obvious superpositions we are going to use a rather brute force approach, looking at the sides of the central peak of our median event and checking that individual events are not too large there, that is, do not exhibit extra peaks. We first define a function doing this job:
goodEvtsFct <- function(samp,thr=3) {
    samp.med <- apply(samp,1,median)
    samp.mad <- apply(samp,1,mad)
    above <- samp.med > 0
    samp.r <- apply(samp,2,function(x) {x[above] <- 0;x})
    apply(samp.r,2,function(x) all(abs(x-samp.med) < thr*samp.mad))
}
We then apply our new function to our sample of events:
goodEvts <- goodEvtsFct(evtsE,8)
Here goodEvts is a vector of logicals with as many elements as there are events in evtsE. Elements of goodEvts are TRUE if the corresponding event of evtsE is "good" (i.e., not a superposition) and FALSE otherwise. We can look at the first 200 good events easily with:
evtsE[,goodEvts][,1:200]
Figure 8: First 200 good events of evtsE.
We see that a few superpositions are left but the most obvious ones of our previous events figure are gone. We can also look at the 38 (sum(!goodEvts)) "bad" events with:
evtsE[,!goodEvts]
Figure 9: Bad events of evtsE.
8 Dimension reduction
8.1 Principal component analysis
Our events are living right now in a 180-dimensional space (our cuts are 45 sampling points long and we are working with 4 recording sites simultaneously). It turns out that it is hard for most humans to perceive structures in such spaces. It is also hard, not to say impossible with a realistic sample size, to estimate probability densities (which is what some clustering algorithms are actually doing) in such spaces, unless one is ready to make strong assumptions about these densities. It is therefore usually a good practice to try to reduce the dimension of the sample space used to represent the data. We are going to do that with principal component analysis (PCA), using it on our "good" events.
evtsE.pc <- prcomp(t(evtsE[,goodEvts]))
We have to be careful here since function prcomp assumes that the data matrix is built by stacking the events / observations as rows and not as columns like we did in our events object. We therefore apply the function to the transpose (t()) of our events.
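As a quick orientation before looking at individual components, we can inspect the fraction of variance captured by the leading PCs using the standard summary of a prcomp result (a small addition to the original text):
summary(evtsE.pc)$importance[,1:8] ## SD, proportion and cumulative proportion of variance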
8.2 Exploring PCA results
PCA is a rather abstract procedure to most of its users, at least when they start using it. But one way to grasp what it does is to plot the mean event plus or minus, say, twice each principal component, like:
layout(matrix(1:4,nr=2))
explore(evtsE.pc,1,5)
explore(evtsE.pc,2,5)
explore(evtsE.pc,3,5)
explore(evtsE.pc,4,5)
Figure 10: PCA of evtsE (for "good" events) exploration (PC 1 to 4). Each of the 4 graphs shows the mean waveform (black), the mean waveform + 5 x PC (red), the mean - 5 x PC (blue) for each of the first 4 PCs. The fraction of the total variance "explained" by the component appears between parentheses in the title of each graph.
We can see that the first 3 PCs correspond to pure amplitude variations. An event with a large projection (score) on the first PC is smaller than the average event on recording sites 1, 2 and 3, but not on 4. An event with a large projection on PC 2 is larger than average on site 1, smaller than average on sites 2 and 3 and identical to the average on site 4. An event with a large projection on PC 3 is larger than the average on site 4 only. PC 4 is the first principal component corresponding to a change in shape as opposed to amplitude. A large projection on PC 4 means that the event has a shallower first valley and a deeper second valley than the average event on all recording sites.
We now look at the next 4 principal components:
layout(matrix(1:4,nr=2))
explore(evtsE.pc,5,5)
explore(evtsE.pc,6,5)
explore(evtsE.pc,7,5)
explore(evtsE.pc,8,5)
Figure 11: PCA of evtsE (for "good" events) exploration (PC 5 to 8). Each of the 4 graphs shows the mean waveform (black), the mean waveform + 5 x PC (red), the mean - 5 x PC (blue). The fraction of the total variance "explained" by the component appears between parentheses in the title of each graph.
An event with a large projection on PC 5 tends to be "slower" than the average event. An event with a large projection on PC 6 exhibits a slower kinetics of its second valley than the average event. PC 5 and 6 correspond to effects shared among recording sites. PC 7 also corresponds to a "change of shape" effect, on all sites except the first. Events with a large projection on PC 8 rise slightly faster and decay slightly slower than the average event on all recording sites. Notice also that PC 8 has a "noisier" aspect than the others, suggesting that we are reaching the limit of the "events extra variability" compared to the variability present in the background noise.
This guess can be confirmed by comparing the variance of the "good" events sample with the one of the noise sample to which the variance contributed by the first 8 PCs is added:
sapply(1:15, function(n) sum(diag(cov(t(noiseE))))+sum(evtsE.pc$sdev[1:n]^2)-sum(evtsE.pc$sdev^2))
 [1] -277.4651543 -187.5634116 -128.0390777  -91.3186691  -58.8398876
 [6]  -36.3630674  -21.5437224   -8.2644952    0.2848893    6.9067336
[11]   13.3415488   19.4720891   25.2553356   29.1021047   32.5634359
This near equality, reached around the 8th component, means that we should not include components beyond the 8th one in our analysis; it even leaves room to use fewer components.
8.3 Static representation of the projected data
We can build a scatter plot matrix showing the projections of our "good" events sample onto the planes defined by pairs of the first few PCs:
panel.dens <- function(x,...) {
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5))
    d <- density(x, adjust=0.5)
    x <- d$x
    y <- d$y
    y <- y/max(y)
    lines(x, y, col="grey50", ...)
}
pairs(evtsE.pc$x[,1:4],pch=".",gap=0,diag.panel=panel.dens)
Figure 12: Scatter plot matrix of the projections of the good events in evtsE onto the planes defined by the first 4 PCs. The diagonal shows a smooth (Gaussian kernel based) density estimate of the projection of the sample on the corresponding PC. Using the first 8 PCs does not make finer structure visible.
8.4 Dynamic representation of the projected data
The best way to discern structures in "high dimensional" data is to dynamically visualize them. To this end, the tool of choice is GGobi, an open source software available on Linux, Windows and MacOS. It is in addition interfaced to R thanks to the rggobi package. We therefore have two ways to use it: as a stand-alone program after exporting the data from R, or directly within R. We are going to use it in its stand-alone version here. We therefore start by exporting our data in csv format to our disk:
write.csv(evtsE.pc$x[,1:8],file="evtsE.csv")
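As mentioned above, GGobi can also be driven directly from R through the rggobi package; if that package is installed, something along the following lines should open the same projections without going through a csv file (not run here, shown only as an alternative):
## library(rggobi)                              ## requires a working GGobi + rggobi install
## ggobi(as.data.frame(evtsE.pc$x[,1:8]))       ## opens the data in a GGobi session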
What comes next is not part of this document but here is a brief description of how to get it:
- Launch GGobi.
- In menu: File -> Open, select evtsE.csv.
- Since the glyphs are rather large, start by changing them for smaller ones:
  - Go to menu: Interaction -> Brush.
  - On the Brush panel which appeared, check the Persistent box.
  - Click on Choose color & glyph....
  - On the chooser which pops out, click on the small dot on the upper left of the left panel.
  - Go back to the window with the data points.
  - Right click on the lower right corner of the rectangle which appeared on the figure after you selected Brush.
  - Drag the rectangle corner in order to cover the whole set of points.
  - Go back to the Interaction menu and select the first row to go back where you were at the start.
- Go to menu: View -> Rotation.
- Adjust the speed of the rotation in order to see things properly.
You should easily discern 10 rather well separated clusters, meaning that an automatic clustering with 10 clusters on the first 3 principal components should do the job.
9 Clustering
9.1 k-means clustering
Since our dynamic visualization shows 10 well separated clusters in 3 dimensions, a simple k-means should do the job:
set.seed(20061001,kind="Mersenne-Twister")
km10 <- kmeans(evtsE.pc$x[,1:3],centers=10,iter.max=100,nstart=100)
c10 <- km10$cluster
Since function kmeans of R uses a random initialization, we set the seed (as well as the kind) of our pseudo random number generator in order to ensure full reproducibility. In order to ensure reproducibility even if another seed is used, as well as to facilitate the interpretation of the results, we "order" the clusters by "size", using the integrated absolute value of the central / median event of each cluster as a measure of its size.
cluster.med <- sapply(1:10, function(cIdx) median(evtsE[,goodEvts][,c10==cIdx]))
sizeC <- sapply(1:10,function(cIdx) sum(abs(cluster.med[,cIdx])))
newOrder <- sort.int(sizeC,decreasing=TRUE,index.return=TRUE)$ix
cluster.mad <- sapply(1:10, function(cIdx) {ce <- t(evtsE)[goodEvts,];ce <- ce[c10==cIdx,];apply(ce,2,mad)})
cluster.med <- cluster.med[,newOrder]
cluster.mad <- cluster.mad[,newOrder]
c10b <- sapply(1:10, function(idx) (1:10)[newOrder==idx])[c10]
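A quick look at the number of events assigned to each (reordered) cluster can be obtained with:
table(c10b) ## events per cluster after reordering by size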
9.2 Results inspection with GGobi
We start by checking our clustering quality with GGobi. To this end we export the data and the labels of each event:
write.csv(cbind(evtsE.pc$x[,1:3],c10b),file="evtsEsorted.csv")
Again the dynamic visualization is not part of this document, but here is how to get it:
- Load the new data into GGobi like before.
- In menu: Display -> New Scatterplot Display, select evtsEsorted.csv.
- Change the glyphs like before.
- In menu: Tools -> Color Schemes, select a scheme with 10 colors, like Spectral, Spectral 10.
- In menu: Tools -> Automatic Brushing, select the evtsEsorted.csv tab and, within this tab, select variable c10b. Then click on Apply.
- Select View -> Rotation like before and see your result.
9.3 Cluster specific plots
Another way to inspect the clustering results is to look at cluster specific events plots:
layout(matrix(1:4,nr=4))
par(mar=c(1,1,1,1))
plot(evtsE[,goodEvts][,c10b==1],y.bar=5)
plot(evtsE[,goodEvts][,c10b==2],y.bar=5)
plot(evtsE[,goodEvts][,c10b==3],y.bar=5)
plot(evtsE[,goodEvts][,c10b==4],y.bar=5)
Figure 13: First 4 clusters. Cluster 1 at the top, cluster 4 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
Notice the increased MAD on the rising phase of cluster 2 on the first recording site: a sign of misalignment of the events of this cluster.
layout(matrix(1:4,nr=4))
par(mar=c(1,1,1,1))
plot(evtsE[,goodEvts][,c10b==5],y.bar=5)
plot(evtsE[,goodEvts][,c10b==6],y.bar=5)
plot(evtsE[,goodEvts][,c10b==7],y.bar=5)
plot(evtsE[,goodEvts][,c10b==8],y.bar=5)
Figure 14: Next 4 clusters. Cluster 5 at the top, cluster 8 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
Cluster 5 has few events while some "subtle" superpositions are present in cluster 7.
layout(matrix(1:2,nr=2))
par(mar=c(1,1,1,1))
plot(evtsE[,goodEvts][,c10b==9],y.bar=5)
plot(evtsE[,goodEvts][,c10b==10],y.bar=5)
Figure 15: Last 2 clusters. Cluster 9 at the top, cluster 10 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
Cluster 10 exhibits an extra variability on sites 1 and 4 around its first valley and its peak.
9.4 Dealing with the jitter
9.4.1 A worked out example
Cluster 2 is the cluster exhibiting the largest sampling jitter effects, since it has the largest time derivative, in absolute value, of its median event. This is clearly seen when we superpose the 51st event from this cluster with the median event:
c2_median = apply(evtsE[,goodEvts][,c10b==2],1,median)
plot(c2_median,col='red',lwd=2,type="l")
lines(unclass(evtsE[,goodEvts][,c10b==2][,51]),lwd=2)
Figure 16: The median event of cluster 2 (red) together with event 51 of the same cluster (black).
A Taylor expansion shows that if we write g(t) the observed 51st event, δ the sampling jitter and f(t) the actual waveform of the event then: \[g(t) = f(t+δ) + ε(t) \approx f(t) + δ \, f'(t) + δ^2/2 \, f''(t) + ε(t) \, ;\] where ε is a Gaussian process and where \(f'\) and \(f''\) stand for the first and second time derivatives of \(f\). Therefore, if we can get estimates of \(f'\) and \(f''\) we should be able to estimate δ by linear regression (if we neglect the \(δ^2\) term as well as the potentially non null correlation in ε) or by non linear regression (if we keep the latter).
We start by getting the derivatives estimates:
dataD = apply(lD,2,function(x) c(0,diff(x,2)/2,0))
evtsED = mkEvents(sp0E,dataD,14,30)
dataDD = apply(dataD,2,function(x) c(0,diff(x,2)/2,0))
evtsEDD = mkEvents(sp0E,dataDD,14,30)
c2D_median = apply(evtsED[,goodEvts][,c10b==2],1,median)
c2DD_median = apply(evtsEDD[,goodEvts][,c10b==2],1,median)
We then get something like:
plot(unclass(evtsE[,goodEvts][,c10b==2][,51])-c2_median,type="l",col='red',lwd=2)
lines(1.5*c2D_median,col='blue',lwd=2)
lines(1.5*c2D_median+1.5^2/2*c2DD_median,col='black',lwd=2)
Figure 17: The median event of cluster 2 subtracted from event 51 of the same cluster (red); 1.5 times the first derivative of the median event (blue), corresponding to δ=1.5; 1.5 times the first derivative + 1.5^2/2 times the second derivative (black), again corresponding to δ=1.5.
If we neglect the \(δ^2\) term we quickly arrive at: \[\hat{δ} = \frac{\mathbf{f'} \cdot (\mathbf{g} -\mathbf{f})}{\| \mathbf{f'} \|^2} \, ;\]
where the 'vectorial' notation like \(\mathbf{a} \cdot \mathbf{b}\) stands here for: \[\sum_{i=0}^{179} a_i b_i \, .\]
For the 51st event of the cluster we get:
(delta_hat = sum(c2D_median*(unclass(evtsE[,goodEvts][,c10b==2][,51])-c2_median))/sum(c2D_median^2))
1.4917182304327
We can use this estimated value of delta_hat as an initial guess for a procedure refining the estimate using also the \(δ^2\) term. The obvious quantity we should try to minimize is the residual sum of squares, RSS, defined by:
\[\mathrm{RSS}(δ) = \| \mathbf{g} - \mathbf{f} - δ \, \mathbf{f'} - δ^2/2 \, \mathbf{f''} \|^2 \; .\]
We can define a function returning the RSS for a given value of δ, taking as further arguments an event evt, a cluster center (median event of the cluster) center and its first two derivatives, centerD and centerDD:
rss_for_alignment <- function(delta, evt=unclass(evtsE[,goodEvts][,c10b==2][,51]), center=c2_median, centerD=c2D_median, centerDD=c2DD_median) {
    sapply(delta, function(d) sum((evt - center - d*centerD - d^2/2*centerDD)^2))
}
We then get the graph with:
layout(matrix(1:2,nc=2))
dd <- seq(-5,5,0.05)
plot(dd,rss_for_alignment(dd),type='l',col='black',lwd=2)
dd_fine <- seq(delta_hat-0.5,delta_hat+0.5,len=501)
plot(dd_fine,rss_for_alignment(dd_fine),type='l',col='black',lwd=2)
abline(v=delta_hat,col='red')
Figure 18: The RSS as a function of δ for event 51 of cluster 2. Left, \(δ \in [-5,5]\); right, \(δ \in [\hat{δ}-0.5,\hat{δ}+0.5]\), where the red vertical line shows \(\hat{δ}\).
The left panel of the above figure shows that our initial guess for \(\hat{δ}\) is not bad but still approximately 0.2 units away from the actual minimum. The classical way to refine our δ estimate, in 'nice situations' where the function we are trying to minimize is locally convex, is to use the Newton-Raphson algorithm, which consists in approximating locally the 'target function' (here our RSS function) by a parabola having locally the same first and second derivatives, before jumping to the minimum of this approximating parabola. If we develop our previous expression of \(\mathrm{RSS}(δ)\) we get:
\[\mathrm{RSS}(δ) = \| \mathbf{h} \|^2 - 2\, δ \, \mathbf{h} \cdot \mathbf{f'} + δ^2 \, \left( \|\mathbf{f'}\|^2 - \mathbf{h} \cdot \mathbf{f''}\right) + δ^3 \, \mathbf{f'} \cdot \mathbf{f''} + \frac{δ^4}{4} \|\mathbf{f''}\|^2 \, ;\]
where \(\mathbf{h}\) stands for \(\mathbf{g} - \mathbf{f}\). By differentiation with respect to δ we get:
\[\mathrm{RSS}'(δ) = - 2\, \mathbf{h} \cdot \mathbf{f'} + 2 \, δ \, \left( \|\mathbf{f'}\|^2 - \mathbf{h} \cdot \mathbf{f''}\right) + 3 \, δ^2 \, \mathbf{f'} \cdot \mathbf{f''} + δ^3 \|\mathbf{f''}\|^2 \, .\]
And a second differentiation leads to:
\[\mathrm{RSS}''(δ) = 2 \, \left( \|\mathbf{f'}\|^2 - \mathbf{h} \cdot \mathbf{f''}\right) + 6 \, δ \, \mathbf{f'} \cdot \mathbf{f''} + 3 \, δ^2 \|\mathbf{f''}\|^2 \, .\]
The equation of the approximating parabola at \(δ^{(k)}\) is then:
\[\mathrm{RSS}(δ^{(k)} + η) \approx \mathrm{RSS}(δ^{(k)}) + η \, \mathrm{RSS}'(δ^{(k)}) + \frac{η^2}{2} \, \mathrm{RSS}''(δ^{(k)})\; ,\]
and its minimum—if \(\mathrm{RSS}''(δ)\) > 0—is located at:
\[δ^{(k+1)} = δ^{(k)} - \frac{\mathrm{RSS}'(δ^{(k)})}{\mathrm{RSS}''(δ^{(k)})} \; .\]
Defining functions returning the required derivatives:
rssD_for_alignment <- function(delta, evt=unclass(evtsE[,goodEvts][,c10b==2][,51]), center=c2_median, centerD=c2D_median, centerDD=c2DD_median){
    h = evt - center
    -2*sum(h*centerD) + 2*delta*(sum(centerD^2) - sum(h*centerDD)) + 3*delta^2*sum(centerD*centerDD) + delta^3*sum(centerDD^2)
}
rssDD_for_alignment <- function(delta, evt=unclass(evtsE[,goodEvts][,c10b==2][,51]), center=c2_median, centerD=c2D_median, centerDD=c2DD_median){
    h = evt - center
    2*(sum(centerD^2) - sum(h*centerDD)) + 6*delta*sum(centerD*centerDD) + 3*delta^2*sum(centerDD^2)
}
we can get a graphical representation of a single step of the Newton-Raphson algorithm:
rss_at_delta_0 = rss_for_alignment(delta_hat)
rssD_at_delta_0 = rssD_for_alignment(delta_hat)
rssDD_at_delta_0 = rssDD_for_alignment(delta_hat)
delta_1 = delta_hat - rssD_at_delta_0/rssDD_at_delta_0
plot(dd_fine,rss_for_alignment(dd_fine),type="l",col='black',lwd=2)
abline(v=delta_hat,col='red')
lines(dd_fine, rss_at_delta_0 + (dd_fine-delta_hat)*rssD_at_delta_0 + (dd_fine-delta_hat)^2/2*rssDD_at_delta_0,col='blue',lwd=2)
abline(v=delta_1,col='grey')
Figure 19: The RSS as a function of δ for event 51 of cluster 2 (black); the red vertical line shows \(\hat{δ}\). In blue, the approximating parabola at \(\hat{δ}\). The grey vertical line shows the minimum of the approximating parabola.
Subtracting the second order in δ approximation of f(t+δ) from the observed 51st event of cluster 2 we get:
plot(evtsE[,goodEvts][,c10b==2][,51],type="l",col='black',lwd=2)
lines(evtsE[,goodEvts][,c10b==2][,51]-c2_median-delta_1*c2D_median-delta_1^2/2*c2DD_median,col='red',lwd=2)
lines(c2_median+delta_1*c2D_median+delta_1^2/2*c2DD_median,col='blue',lwd=1)
Figure 20: Event 51 of cluster 2 (black), second order approximation of f(t+δ) (blue) and residual (red) for δ equal to 1.3748 (the value of delta_1), obtained by a succession of a linear regression (order 1) and a single Newton-Raphson step.
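For readers who would rather iterate the Newton-Raphson update until it stabilizes instead of stopping after a single step, a minimal sketch reusing the functions defined above could be:
delta_nr <- delta_hat
for (k in 1:5) { ## a few iterations are more than enough on such a smooth RSS
    delta_nr <- delta_nr - rssD_for_alignment(delta_nr)/rssDD_for_alignment(delta_nr)
}
delta_nr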
9.4.2 Making the procedure systematic
Function get_jitter is defined in Sec. Definition of get_jitter. If we try it on the events attributed to cluster 2:
c2_jitter = get_jitter(evtsE[,goodEvts][,c10b==2],c2_median,c2D_median,c2DD_median)
we get the following summary statistic of the estimated jitters:
summary(c2_jitter,digits=2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.900  -0.380  -0.044   0.041   0.660   1.800 
We then get an estimate of the jitter-corrected events of cluster 2 with:
c2_noj = unclass(evtsE[,goodEvts][,c10b==2]) - c2D_median %o% c2_jitter - c2DD_median %o% c2_jitter^2/2
attributes(c2_noj) = attributes(evtsE[,goodEvts][,c10b==2])
A plot allows us to compare the "raw version" of the events with the jitter corrected version:
layout(matrix(1:2,nr=2))
par(mar=c(1,1,1,1))
evtsE[,goodEvts][,c10b==2]
c2_noj
Figure 21: Top: cluster 2 events before jitter correction. Bottom: cluster 2 events after second order jitter correction.
9.4.3 Applying the procedure systematically
We now use function mk_aligned_events, defined in Sec. Definition of mk_aligned_events, on each of the 10 clusters:
evtsE_noj = lapply(1:10, function(i) mk_aligned_events(sp0E[goodEvts][c10b==i],lD))
Looking at the first 5 clusters we get:
layout(matrix(1:5,nr=5))
evtsE_noj[[1]]
evtsE_noj[[2]]
evtsE_noj[[3]]
evtsE_noj[[4]]
evtsE_noj[[5]]
Figure 22: First 5 clusters after jitter cancellation. Cluster 1 at the top, cluster 5 at the bottom. Red, cluster specific central / median event. Blue, cluster specific MAD
.
Looking at the last 5 clusters we get:
layout(matrix(1:5,nr=5))
evtsE_noj[[6]]
evtsE_noj[[7]]
evtsE_noj[[8]]
evtsE_noj[[9]]
evtsE_noj[[10]]
Figure 23: Last 5 clusters after jitter cancellation. Cluster 6 at the top, cluster 10 at the bottom. Red, cluster specific central / median event. Blue, cluster specific MAD. Notice the vertical scale change compared to the previous figure.
The five numbers summary of the remaining jitters for each of the 10 clusters are:
t(sapply(evtsE_noj, function(l) summary(attr(l,"delta"),digits=3)))
        Min. 1st Qu.   Median     Mean 3rd Qu.  Max.
 [1,] -0.493  -0.253  0.02120 -0.01690   0.171 0.465
 [2,] -0.534  -0.225 -0.03800  0.00695   0.286 0.509
 [3,] -0.450  -0.271  0.02930  0.01750   0.270 0.496
 [4,] -0.475  -0.193  0.01990  0.01360   0.227 0.468
 [5,] -0.627  -0.320  0.04860 -0.05690   0.216 0.337
 [6,] -0.530  -0.235  0.01760  0.01860   0.320 0.473
 [7,] -0.513  -0.249 -0.00277 -0.01360   0.234 0.560
 [8,] -0.514  -0.233  0.00456  0.00753   0.256 0.613
 [9,] -0.574  -0.258  0.02530  0.08350   0.256 4.570
[10,] -0.955  -0.275  0.01010 -0.00546   0.255 2.960
10 "Brute force" superposition resolution
We are going to resolve (the most "obvious") superpositions by a "recursive peeling method":
- Events are detected and cut from the raw data or from an already peeled version of the data.
- The closest center (in terms of Euclidean distance) to the event is found; the jitter is always evaluated and compensated for when the distances are computed.
- If the residual sum of squares (RSS), that is: (actual data - best center)\(^2\), is smaller than the squared norm of a cut, the best center is subtracted from the data on which detection was performed; jitter is again compensated for at this stage.
- Go back to step 1 or stop.
We get a named list of cluster centers lists with:
centers = lapply(1:10, function(i) mk_center_list(attr(evtsE_noj[[i]],"positions"),lD))
names(centers) = paste("Cluster",1:10)
We then apply function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round0 = lapply(as.vector(sp0),classify_and_align_evt, data=lD,centers=centers)
We can see how many events (out of a total of 1795, length(sp0)) were left unclassified with:
sum(sapply(round0, function(l) l[[1]] == '?'))
28
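To see how the classified events distribute over the clusters, we can also tabulate the assigned label (the first element of each result, '?' marking the unclassified events); a quick check, not in the original text:
table(sapply(round0, function(l) l[[1]])) ## events per cluster plus '?'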
10.1 First peeling
Using predict_data we get:
pred0 = predict_data(round0,centers)
We subtract pred0 from lD:
lD1 = lD - pred0
We can compare the original data with the result of the "first peeling":
ii = 1:1500 + 0.9*15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD1[ii,1], col='red')
lines(tt, lD[ii,2]-15, col='black')
lines(tt, lD1[ii,2]-15, col='red')
lines(tt, lD[ii,3]-25, col='black')
lines(tt, lD1[ii,3]-25, col='red')
lines(tt, lD[ii,4]-40, col='black')
lines(tt, lD1[ii,4]-40, col='red')
Figure 24: 100 ms (between 0.9 and 1.0 s) of the locust data set. Black, original data; red, after first peeling.
10.2 Second peeling
We then take lD1 as our former lD and we repeat the procedure, detecting only on recording site 1 (and using a slightly shorter box filter):
lDf <- filter(lD1,rep(1,3)/3)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
(sp1 <- peaks(lDfr[,1],15))
eventsPos object with indexes of 247 events. 
  Mean inter event interval: 1208.87 sampling points, corresponding SD: 1386.01 sampling points 
  Smallest and largest inter event intervals: 18 and 9140 sampling points.
We then apply again function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round1 = lapply(as.vector(sp1),classify_and_align_evt, data=lD1,centers=centers)
We can see how many events (out of a total of 247, length(sp1)) were left unclassified with:
sum(sapply(round1, function(l) l[[1]] == '?'))
60
We then get our prediction:
pred1 = predict_data(round1,centers)
We subtract pred1 from lD1:
lD2 = lD1 - pred1
We can compare the first peeling with the second one:
ii = 1:1500 + 0.9*15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD1[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD2[ii,1], col='red')
lines(tt, lD1[ii,2]-15, col='black')
lines(tt, lD2[ii,2]-15, col='red')
lines(tt, lD1[ii,3]-25, col='black')
lines(tt, lD2[ii,3]-25, col='red')
lines(tt, lD1[ii,4]-40, col='black')
lines(tt, lD2[ii,4]-40, col='red')
Figure 25: 100 ms (between 0.9 and 1.0 s) of the locust data set. Black, first peeling; red, second peeling.
10.3 Third peeling
We then take lD2 as our former lD1 and we repeat the procedure, detecting only on recording site 2:
lDf <- filter(lD2,rep(1,3)/3)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
(sp2 <- peaks(lDfr[,2],15))
eventsPos object with indexes of 124 events. 
  Mean inter event interval: 2404.16 sampling points, corresponding SD: 2381.06 sampling points 
  Smallest and largest inter event intervals: 16 and 10567 sampling points.
We then apply again function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round2 = lapply(as.vector(sp2),classify_and_align_evt, data=lD2,centers=centers)
We can see how many events (out of a total of 124, length(sp2)) were left unclassified with:
sum(sapply(round2, function(l) l[[1]] == '?'))
22
We then get our prediction:
pred2 = predict_data(round2,centers)
We subtract pred2 from lD2:
lD3 = lD2 - pred2
We can compare the second peeling with the third one:
ii = 1:1500 + 0.9*15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD2[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD3[ii,1], col='red')
lines(tt, lD2[ii,2]-15, col='black')
lines(tt, lD3[ii,2]-15, col='red')
lines(tt, lD2[ii,3]-25, col='black')
lines(tt, lD3[ii,3]-25, col='red')
lines(tt, lD2[ii,4]-40, col='black')
lines(tt, lD3[ii,4]-40, col='red')
Figure 26: 100 ms (between 0.9 and 1.0 s) of the locust data set. Black, second peeling; red, third peeling.
10.4 Fourth peeling
We then take lD3 as our former lD2 and we repeat the procedure, detecting only on recording site 3:
lDf <- filter(lD3,rep(1,3)/3)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
(sp3 <- peaks(lDfr[,3],15))
eventsPos object with indexes of 99 events. 
  Mean inter event interval: 2993.43 sampling points, corresponding SD: 3686.95 sampling points 
  Smallest and largest inter event intervals: 34 and 21217 sampling points.
We then apply again function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round3 = lapply(as.vector(sp3),classify_and_align_evt, data=lD3,centers=centers)
We can see how many events (out of a total of 99, length(sp3)) were left unclassified with:
sum(sapply(round3, function(l) l[[1]] == '?'))
21
We then get our prediction:
pred3 = predict_data(round3,centers)
We subtract pred3 from lD3:
lD4 = lD3 - pred3
We can compare the third peeling with the fourth one:
ii = 1:1500 + 3.9*15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD3[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD4[ii,1], col='red')
lines(tt, lD3[ii,2]-15, col='black')
lines(tt, lD4[ii,2]-15, col='red')
lines(tt, lD3[ii,3]-25, col='black')
lines(tt, lD4[ii,3]-25, col='red')
lines(tt, lD3[ii,4]-40, col='black')
lines(tt, lD4[ii,4]-40, col='red')
Figure 27: 100 ms (between 3.9 and 4.0 s) of the locust data set. Black, third peeling; red, fourth peeling. Different time frame than on the previous plots.
10.5 Fifth peeling
We then take lD4 as our former lD3 and we repeat the procedure, detecting only on recording site 4:
lDf <- filter(lD4,rep(1,3)/3)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
(sp4 <- peaks(lDfr[,4],15))
eventsPos object with indexes of 164 events. 
  Mean inter event interval: 1806.91 sampling points, corresponding SD: 1902.64 sampling points 
  Smallest and largest inter event intervals: 25 and 9089 sampling points.
We then apply again function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round4 = lapply(as.vector(sp4),classify_and_align_evt, data=lD4,centers=centers)
We can see how many events (out of a total of 164, length(sp4)) were left unclassified with:
sum(sapply(round4, function(l) l[[1]] == '?'))
60
We then get our prediction:
pred4 = predict_data(round4,centers)
We subtract pred4 from lD4:
lD5 = lD4 - pred4
We can compare the fourth peeling with the fifth one:
ii = 1:1500 + 3.9*15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD4[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD5[ii,1], col='red')
lines(tt, lD4[ii,2]-15, col='black')
lines(tt, lD5[ii,2]-15, col='red')
lines(tt, lD4[ii,3]-25, col='black')
lines(tt, lD5[ii,3]-25, col='red')
lines(tt, lD4[ii,4]-40, col='black')
lines(tt, lD5[ii,4]-40, col='red')
Figure 28: 100 ms (between 3.9 and 4.0 s) of the locust data set. Black, fourth peeling; red, fifth peeling.
10.6 General comparison
We can compare the raw data with the fifth peeling on the first second:
ii = 1:15000
tt = ii/15000
par(mar=c(1,1,1,1))
plot(tt, lD[ii,1], axes = FALSE, type="l",ylim=c(-50,20), xlab="",ylab="")
lines(tt, lD5[ii,1], col='red')
lines(tt, lD[ii,2]-15, col='black')
lines(tt, lD5[ii,2]-15, col='red')
lines(tt, lD[ii,3]-25, col='black')
lines(tt, lD5[ii,3]-25, col='red')
lines(tt, lD[ii,4]-40, col='black')
lines(tt, lD5[ii,4]-40, col='red')
Figure 29: The first second of the locust data set. Black, raw data; red, fifth peeling.
We can also look at the remaining unclassified events (that don't look like any of our templates):
bad_ones = sapply(round4, function(l) l[[1]] == '?')
r4BE = mkEvents(sp4[bad_ones], lD4)
r4BE
Figure 30: The 60 remaining bad events after the fifth peeling.
10.7 Sixth peeling
We can push things one peeling step further detecting on all sites simultaneously like we did the first time:
lDf <- filter(lD5,rep(1,3)/3)
lDf[is.na(lDf)] <- 0
lDf.mad <- apply(lDf,2,mad)
lDf <- t(t(lDf)/lDf.mad)
bellow.thrs <- t(t(lDf) < thrs)
lDfr <- lDf
lDfr[bellow.thrs] <- 0
remove(lDf)
(sp5 <- peaks(apply(lDfr,1,sum),15))
eventsPos object with indexes of 234 events. 
  Mean inter event interval: 1283.21 sampling points, corresponding SD: 1325.54 sampling points 
  Smallest and largest inter event intervals: 17 and 8484 sampling points.
We then apply again function classify_and_align_evt, defined in Sec. Definition of classify_and_align_evt, to every detected event:
round5 = lapply(as.vector(sp5),classify_and_align_evt, data=lD5,centers=centers)
We can see how many events (out of a total of 234, length(sp5)) were left unclassified with:
sum(sapply(round5, function(l) l[[1]] == '?'))
184
This is indeed a large fraction of the total number of events. We can look at the whole sample of events:
mkEvents(sp5, lD5)
Figure 31: Detected events after fifth peeling (detection done on all sites simultaneously).
They also do not look like any of our templates.
11 Getting the spike trains
Once we have decided to stop the peeling iterations we can extract our spike trains with:
round_all = c(round0,round1,round2,round3,round4,round5)
spike_trains = lapply(paste("Cluster",1:10), function(cn) sapply(round_all[sapply(round_all, function(l) l[[1]]==cn)], function(l) l[[2]]+l[[3]]))
names(spike_trains) = paste("Cluster",1:10)
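As a quick sanity check of the outcome, we can count the number of spikes attributed to each neuron / cluster and, using the 15 kHz sampling rate, convert the first train from sampling points to seconds (a small addition, not in the original text):
sapply(spike_trains, length)              ## spikes per cluster
head(spike_trains[["Cluster 1"]]/15000)   ## first spike times of cluster 1 in seconds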
12 Custom developed code
12.1 Individual function definitions
12.1.1 cstValueSgts definition
Formal parameters:
- x: a numeric vector.
Returns: an integer matrix with as many columns as there are segments where the derivative is null; the first row contains the index of the beginning of each segment, the second row contains the segments' lengths.
cstValueSgts <- function(x) {
    dx <- as.integer(abs(diff(x,2))/2 <= .Machine$double.eps) ## use "order 2" derivative estimates
    ## ddx is zero where the derivative did not change,
    ## it is one if the derivative becomes null and -1
    ## if the derivative stops being null
    ddx <- diff(dx)
    ## s contains the indexes of the first
    ## points of segments where the
    ## derivative is null
    s <- (1:length(ddx))[ddx==1]
    sapply(s, function(b) {
        n <- ddx[(b+1):length(ddx)]
        c(b+1,min((1:length(n))[n==-1]))
    })
}
12.1.2 Definitions of generic function and time-series specific method for data exploration
Definition of the generic function:
explore <- function(x,...) { UseMethod("explore") }
Definition of the corresponding method for time series. Formal arguments:
- x: a ts (time series) object.
- offsetFactor: a numeric controlling the spacing between the recording sites on the plot. Smaller values lead to closer spacing.
- ...: additional arguments passed to the matplot and plot functions called internally.
explore.ts <- function(x, offsetFactor=0.5, ...) {
    stopifnot(inherits(x,"ts"))
    yRange <- range(x,na.rm=TRUE)
    plotPara <- list(tlim = start(x)[1] + c(0,0.1),
                     ylim = yRange,
                     yMin = yRange[1],
                     yMax = yRange[2],
                     firstTime = start(x)[1],
                     lastTime = end(x)[1],
                     keepGoing = TRUE)
    nFct <- function() {
        leftTime <- plotPara$tlim[1]
        rightTime <- plotPara$tlim[2]
        timeRange <- rightTime - leftTime
        rightTime <- rightTime + timeRange
        if (rightTime > plotPara$lastTime) {
            cat("Recording end reached.\n ")
            rightTime <- plotPara$lastTime
        }
        plotPara$tlim <- c(rightTime - timeRange, rightTime)
        plotPara
    }
    fFct <- function() {
        leftTime <- plotPara$tlim[1]
        rightTime <- plotPara$tlim[2]
        timeRange <- rightTime - leftTime
        leftTime <- leftTime - timeRange
        if (leftTime < plotPara$firstTime) {
            cat("Recording end reached.\n ")
            leftTime <- plotPara$firstTime
        }
        plotPara$tlim <- c(leftTime, leftTime + timeRange)
        plotPara
    }
    qFct <- function() {
        plotPara$keepGoing <- FALSE
        plotPara
    }
    ## Function tFct definition
    ## Allows the user to change the recording duration displayed on the window
    ## The user is invited to enter a factor which will be used to multiply the
    ## present duration displayed.
    ## If the resulting duration is too long a warning is given and the whole
    ## recording is shown.
    ## If possible the center of the displayed window is conserved.
    tFct <- function() {
        presentWindowLength <- diff(range(plotPara$tlim))
        tMessage <- paste("Present duration displayed: ", presentWindowLength, " \n", sep = "")
        tMessage <- paste(tMessage, "By what factor do you want to multiply it? \n", sep = "")
        theFactor <- as.numeric(readline(tMessage))
        if (theFactor <= 0) {
            cat("A negative or null factor does not make sense.\n")
            return(plotPara)
        } ## End of conditional on theFactor <= 0
        ## Check that the new display length is reasonable
        totalLength <- plotPara$lastTime - plotPara$firstTime
        if (theFactor * presentWindowLength >= totalLength) {
            cat("Cannot show more data than available but only the entire record.\n ")
            plotPara$tlim[1] <- plotPara$firstTime
            plotPara$tlim[2] <- plotPara$lastTime
            return(plotPara)
        }
        windowCenter <- plotPara$tlim[1] + presentWindowLength / 2
        newLeft <- windowCenter - theFactor * presentWindowLength / 2
        newRight <- windowCenter + theFactor * presentWindowLength / 2
        if (!(newLeft >= plotPara$firstTime & newRight <= plotPara$lastTime)) {
            if (newLeft <= plotPara$firstTime) {
                cat("Cannot show data before the recording started, the displayed center wont be conserved.\n ")
                plotPara$tlim[1] <- plotPara$firstTime
                plotPara$tlim[2] <- plotPara$tlim[1] + theFactor * presentWindowLength
            }
            if (newRight >= plotPara$lastTime) {
                cat("Cannot show data after the recording ended, the displayed center wont be conserved.\n ")
                plotPara$tlim[2] <- plotPara$lastTime
                plotPara$tlim[1] <- plotPara$tlim[2] - theFactor * presentWindowLength
            }
            return(plotPara)
        } ## End of conditional on !(newLeft >= plotPara$firstTime & newRight <= plotPara$lastTime)
        plotPara$tlim[1] <- newLeft
        plotPara$tlim[2] <- newRight
        return(plotPara)
    } ## End of function tFct definition
    ## Function rFct definition
    ## Allows the user to change the maximal value displayed on the abscissa
    ## The user is invited to enter a value.
    rFct <- function() {
        leftTime <- plotPara$tlim[1]
        rightTime <- plotPara$tlim[2]
        timeRange <- rightTime - leftTime
        tMessage <- paste("Present latest time displayed: ", rightTime, "\n", sep = "")
        tMessage <- paste(tMessage, "What new latest time do want (return leaves things unchanged)? \n", sep = "")
        theNewTime <- as.numeric(readline(tMessage))
        if (is.na(theNewTime)) {
            ## Nothing entered, leave things unchanged
            return(plotPara)
        } ## End of conditional on is.na(theFactor)
        if (theNewTime <= plotPara$firstTime) {
            ## This choice does not make sense
            cat("Cannot display data before recording started.\n")
            return(plotPara)
        }
        if (theNewTime > plotPara$lastTime) {
            cat("Recording end reached.\n ")
            rightTime <- plotPara$lastTime
        } else {
            if (theNewTime <= leftTime) {
                ## The new latest time entered is smaller that the earliest time displayed
                cat("The new latest time is smaller than the earliest, adjustement will be made.\n")
                leftTime <- theNewTime - timeRange
                if (leftTime < plotPara$firstTime) {
                    cat("Adjustment requires a change in displayed duration.\n")
                    leftTime <- plotPara$firstTime
                }
            } ## End of conditional on theNewTime <= leftTime
            rightTime <- theNewTime
        } ## End of conditional on theNewTime > plotPara$lastTime
        plotPara$tlim <- c(leftTime, rightTime)
        plotPara
    }
    ## Function lFct definition
    ## Allows the user to change the minimal value displayed on the abscissa
    ## The user is invited to enter a value.
    lFct <- function() {
        leftTime <- plotPara$tlim[1]
        rightTime <- plotPara$tlim[2]
        timeRange <- rightTime - leftTime
        tMessage <- paste("Present earliest time displayed: ", leftTime, "\n", sep = "")
        tMessage <- paste(tMessage, "What new earliest time do want (return leaves things unchanged)? \n", sep = "")
        theNewTime <- as.numeric(readline(tMessage))
        if (is.na(theNewTime)) {
            ## Nothing entered, leave things unchanged
            return(plotPara)
        } ## End of conditional on is.na(theFactor)
        if (theNewTime >= plotPara$lastTime) {
            ## This choice does not make sense
            cat("Cannot display data after recording ended.\n")
            return(plotPara)
        }
        if (theNewTime < plotPara$firstTime) {
            cat("Recording start reached.\n ")
            leftTime <- plotPara$firstTime
        } else {
            if (theNewTime >= rightTime) {
                ## The new earliest time entered is larger that the latest time displayed
                cat("The new earliest time is larger than the latest, adjustement will be made.\n")
                rightTime <- theNewTime + timeRange
                if (rightTime > plotPara$lastTime) {
                    cat("Adjustment requires a change in displayed duration.\n")
                    rightTime <- plotPara$lastTime
                }
            } ## End of conditional on theNewTime <= leftTime
            leftTime <- theNewTime
        } ## End of conditional on theNewTime > plotPara$lastTime
        plotPara$tlim <- c(leftTime, rightTime)
        plotPara
    }
    ## Function yMaxFct definition
    ## Allows the user to change the maximal value displayed on the ordinate
    ## The user is invited to enter a value.
    yMaxFct <- function() {
        presentWindowRange <- range(plotPara$ylim)
        tMessage <- paste("Present range displayed: [", paste(presentWindowRange, collapse = ","), "] \n", sep = "")
        tMessage <- paste(tMessage, "What new maximal ordinate value do want (return goes back to maximum)? \n", sep = "")
        theFactor <- as.numeric(readline(tMessage))
        if (is.na(theFactor)) {
            plotPara$ylim <- c(presentWindowRange[1],plotPara$yMax)
            return(plotPara)
        } ## End of conditional on is.na(theFactor)
        if (theFactor <= plotPara$ylim[1]) {
            cat("The maximum should be larger than the minimum.\n")
            return(plotPara)
        } ## End of conditional on theFactor <= plotPara$ylim[1]
        plotPara$ylim <- c(presentWindowRange[1],theFactor)
        return(plotPara)
    } ## End of function yMaxFct definition
    ## Function yMinFct definition
    ## Allows the user to change the minimal value displayed on the ordinate
    ## The user is invited to enter a value.
    yMinFct <- function() {
        presentWindowRange <- range(plotPara$ylim)
        tMessage <- paste("Present range displayed: [", paste(presentWindowRange, collapse = ","), "] \n", sep = "")
        tMessage <- paste(tMessage, "What new minimal ordinate value do want (return goes back to minimum)? \n", sep = "")
        theFactor <- as.numeric(readline(tMessage))
        if (is.na(theFactor)) {
            plotPara$ylim <- c(plotPara$yMin, presentWindowRange[2])
            return(plotPara)
        } ## End of conditional on is.na(theFactor)
        if (theFactor >= plotPara$ylim[2]) {
            cat("The minimum should be smaller than the maximum.\n")
            return(plotPara)
        } ## End of conditional on theFactor >= plotPara$ylim[2]
        plotPara$ylim <- c(theFactor, presentWindowRange[2])
        return(plotPara)
    } ## End of function yMinFct definition
    show <- function(x, plotPara, ...) {
        s <- plotPara$tlim[1]
        e <- plotPara$tlim[2]
        y.m <- plotPara$ylim[1]
        y.M <- plotPara$ylim[2]
        m <- unclass(window(x,start=s,end=e))
        if (class(m) == "matrix") {
            m <- apply(m,2,function(x) ifelse(x < y.m, y.m,x))
            m <- apply(m,2,function(x) ifelse(x > y.M, y.M,x))
            ns <- dim(m)[2]
            offset <- c(0,-(1:(ns-1))*(y.M-y.m))
            m <- t(t(m)+offset*offsetFactor)
            matplot(m,type="l",lty=1,axes=FALSE,xlab="",ylab="",...)
        } else {
            m[m<y.m] <- y.m
            m[m>y.M] <- y.M
            plot(m,type="l",lty=1,axes=FALSE,xlab="",ylab="",ylim=c(y.m,y.M),...)
        }
    }
    plot.new()
    par(mar=c(0.5,0.5,0.5,0.5))
    show(x,plotPara,...)
    myMessage <- "Make a choice:\n n or 'return' (next); f (former); l (lower abscissa limit); r (upper abscissa limit) \n t (time scale); Y (upper ordinate limit); y (lower ordinate limit); q (quit) \n "
    while(plotPara$keepGoing) {
        myChoice <- readline(myMessage)
        plotPara <- switch(myChoice,
                           n = nFct(),
                           f = fFct(),
                           l = lFct(),
                           r = rFct(),
                           t = tFct(),
                           Y = yMaxFct(),
                           y = yMinFct(),
                           q = qFct(),
                           nFct()
                           )
        show(x,plotPara,...)
    } ## End of while loop on keepGoing
    dev.off()
    invisible()
}
12.1.3 peaks and associated methods definition
Function peaks returns an object (essentially a vector of indices) of class eventsPos:
peaks <- function(x, minimalDist=15, notZero=1e-3) { dx <- c(0,diff(x,2)/2,0) dx[abs(dx) < notZero] <- 0 dx <- diff(sign(dx)) res <- (1:length(dx))[dx < 0] res <- res[-length(res)][diff(res) > minimalDist] attr(res,"call") <- match.call() attr(res,"nIDx") <- length(x) class(res) <- "eventsPos" res }
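A minimal usage sketch on synthetic data (in the actual analysis peaks is applied to a filtered and thresholded version of the normalized traces; the toy vector x_toy below is invented purely for illustration):

x_toy <- sin(seq(0, 20*pi, length.out=3000)) ## a toy oscillation, roughly one maximum per period
sp_toy <- peaks(x_toy, minimalDist=15)       ## local maxima at least 15 sampling points apart
sp_toy                                       ## displayed with the print.eventsPos method defined below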
Method as.eventsPos transforms a vector into an eventsPos object; its formal arguments are:
- x: an integer vector with strictly increasing elements.
- start: an integer, the sampling point at which observation started.
- end: an integer, the sampling point at which observation ended.
as.eventsPos <- function(x, start, end ) { x <- as.integer(x) stopifnot(all(diff(x)>0)) if (missing(start)) start <- floor(x) if (missing(end)) end <- ceiling(x) stopifnot(all(x>=start)) stopifnot(all(x<=end)) attr(x,"call") <- match.call() attr(x,"nIDx") <- end-start+1 class(x) <- "eventsPos" x }
Method print.eventsPos prints an eventsPos object:
print.eventsPos <- function(x, ...) { cat("\neventsPos object with indexes of ", length(x)," events. \n", sep = "") cat(" Mean inter event interval: ", round(mean(diff(x)),digits=2), " sampling points, corresponding SD: ", round(sd(diff(x)),digits=2), " sampling points \n", sep = "") cat(" Smallest and largest inter event intervals: ", paste(range(diff(x)),collapse=" and "), " sampling points. \n\n",sep= "") }
12.1.4 Definition of an explore method for time series and eventsPos objects
The formal parameters of the method are:
- x: an eventsPos object.
- y: a ts (time series) object.
- offsetFactor: a numeric controlling the spacing between the recording sites on the plot. Smaller values lead to closer spacing.
- events.pch: an integer or a character, the plotting character used to indicate events.
- events.col: an integer or a character string coding the color used to indicate the events.
- ...: additional arguments passed to the matplot and plot functions called internally.
explore.eventsPos <- function(x,y, offsetFactor=0.5, events.pch=16, events.col=2, ...) { stopifnot(inherits(y,"ts")) yRange <- range(y,na.rm=TRUE) plotPara <- list(tlim = start(y)[1] + c(0,0.1), ylim = yRange, yMin = yRange[1], yMax = yRange[2], firstTime = start(y)[1], lastTime = end(y)[1], keepGoing = TRUE) nFct <- function() { leftTime <- plotPara$tlim[1] rightTime <- plotPara$tlim[2] timeRange <- rightTime - leftTime rightTime <- rightTime + timeRange if (rightTime > plotPara$lastTime) { cat("Recording end reached.\n ") rightTime <- plotPara$lastTime } plotPara$tlim <- c(rightTime - timeRange, rightTime) plotPara } fFct <- function() { leftTime <- plotPara$tlim[1] rightTime <- plotPara$tlim[2] timeRange <- rightTime - leftTime leftTime <- leftTime - timeRange if (leftTime < plotPara$firstTime) { cat("Recording end reached.\n ") leftTime <- plotPara$firstTime } plotPara$tlim <- c(leftTime, leftTime + timeRange) plotPara } qFct <- function() { plotPara$keepGoing <- FALSE plotPara } ## Function tFct definition ## Allows the user to change the recording duration displayed on the window ## The user is invited to enter a factor which will be used to multiply the ## present duration displayed. ## If the resulting duration is too long a warning is given and the whole ## recording is shown. ## If possible the center of the displayed window is conserved. tFct <- function() { presentWindowLength <- diff(range(plotPara$tlim)) tMessage <- paste("Present duration displayed: ", presentWindowLength, " \n", sep = "") tMessage <- paste(tMessage, "By what factor do you want to multiply it? \n", sep = "") theFactor <- as.numeric(readline(tMessage)) if (theFactor <= 0) { cat("A negative or null factor does not make sense.\n") return(plotPara) } ## End of conditional on theFactor <= 0 ## Check that the new display length is reasonable totalLength <- plotPara$lastTime - plotPara$firstTime if (theFactor * presentWindowLength >= totalLength) { cat("Cannot show more data than available but only the entire record.\n ") plotPara$tlim[1] <- plotPara$firstTime plotPara$tlim[2] <- plotPara$lastTime return(plotPara) } windowCenter <- plotPara$tlim[1] + presentWindowLength / 2 newLeft <- windowCenter - theFactor * presentWindowLength / 2 newRight <- windowCenter + theFactor * presentWindowLength / 2 if (!(newLeft >= plotPara$firstTime & newRight <= plotPara$lastTime)) { if (newLeft <= plotPara$firstTime) { cat("Cannot show data before the recording started, the displayed center wont be conserved.\n ") plotPara$tlim[1] <- plotPara$firstTime plotPara$tlim[2] <- plotPara$tlim[1] + theFactor * presentWindowLength } if (newRight >= plotPara$lastTime) { cat("Cannot show data after the recording ended, the displayed center wont be conserved.\n ") plotPara$tlim[2] <- plotPara$lastTime plotPara$tlim[1] <- plotPara$tlim[2] - theFactor * presentWindowLength } return(plotPara) } ## End of conditional on !(newLeft >= plotPara$firstTime & newRight <= plotPara$lastTime) plotPara$tlim[1] <- newLeft plotPara$tlim[2] <- newRight return(plotPara) } ## End of function tFct definition ## Function rFct definition ## Allows the user to change the maximal value displayed on the abscissa ## The user is invited to enter a value. rFct <- function() { leftTime <- plotPara$tlim[1] rightTime <- plotPara$tlim[2] timeRange <- rightTime - leftTime tMessage <- paste("Present latest time displayed: ", rightTime, "\n", sep = "") tMessage <- paste(tMessage, "What new latest time do want (return leaves things unchanged)? 
\n", sep = "") theNewTime <- as.numeric(readline(tMessage)) if (is.na(theNewTime)) { ## Nothing entered, leave things unchanged return(plotPara) } ## End of conditional on is.na(theFactor) if (theNewTime <= plotPara$firstTime) { ## This choice does not make sense cat("Cannot display data before recording started.\n") return(plotPara) } if (theNewTime > plotPara$lastTime) { cat("Recording end reached.\n ") rightTime <- plotPara$lastTime } else { if (theNewTime <= leftTime) { ## The new latest time entered is smaller that the earliest time displayed cat("The new latest time is smaller than the earliest, adjustement will be made.\n") leftTime <- theNewTime - timeRange if (leftTime < plotPara$firstTime) { cat("Adjustment requires a change in displayed duration.\n") leftTime <- plotPara$firstTime } } ## End of conditional on theNewTime <= leftTime rightTime <- theNewTime } ## End of conditional on theNewTime > plotPara$lastTime plotPara$tlim <- c(leftTime, rightTime) plotPara } ## Function lFct definition ## Allows the user to change the minimal value displayed on the abscissa ## The user is invited to enter a value. lFct <- function() { leftTime <- plotPara$tlim[1] rightTime <- plotPara$tlim[2] timeRange <- rightTime - leftTime tMessage <- paste("Present earliest time displayed: ", leftTime, "\n", sep = "") tMessage <- paste(tMessage, "What new earliest time do want (return leaves things unchanged)? \n", sep = "") theNewTime <- as.numeric(readline(tMessage)) if (is.na(theNewTime)) { ## Nothing entered, leave things unchanged return(plotPara) } ## End of conditional on is.na(theFactor) if (theNewTime >= plotPara$lastTime) { ## This choice does not make sense cat("Cannot display data after recording ended.\n") return(plotPara) } if (theNewTime < plotPara$firstTime) { cat("Recording start reached.\n ") leftTime <- plotPara$firstTime } else { if (theNewTime >= rightTime) { ## The new earliest time entered is larger that the latest time displayed cat("The new earliest time is larger than the latest, adjustement will be made.\n") rightTime <- theNewTime + timeRange if (rightTime > plotPara$lastTime) { cat("Adjustment requires a change in displayed duration.\n") rightTime <- plotPara$lastTime } } ## End of conditional on theNewTime <= leftTime leftTime <- theNewTime } ## End of conditional on theNewTime > plotPara$lastTime plotPara$tlim <- c(leftTime, rightTime) plotPara } ## Function yMaxFct definition ## Allows the user to change the maximal value displayed on the ordinate ## The user is invited to enter a value. yMaxFct <- function() { presentWindowRange <- range(plotPara$ylim) tMessage <- paste("Present range displayed: [", paste(presentWindowRange, collapse = ","), "] \n", sep = "") tMessage <- paste(tMessage, "What new maximal ordinate value do want (return goes back to maximum)? \n", sep = "") theFactor <- as.numeric(readline(tMessage)) if (is.na(theFactor)) { plotPara$ylim <- c(presentWindowRange[1],plotPara$yMax) return(plotPara) } ## End of conditional on is.na(theFactor) if (theFactor <= plotPara$ylim[1]) { cat("The maximum should be larger than the minimum.\n") return(plotPara) } ## End of conditional on theFactor <= plotPara$ylim[1] plotPara$ylim <- c(presentWindowRange[1],theFactor) return(plotPara) } ## End of function yMaxFct definition ## Function yMinFct definition ## Allows the user to change the minimal value displayed on the ordinate ## The user is invited to enter a value. 
yMinFct <- function() { presentWindowRange <- range(plotPara$ylim) tMessage <- paste("Present range displayed: [", paste(presentWindowRange, collapse = ","), "] \n", sep = "") tMessage <- paste(tMessage, "What new minimal ordinate value do want (return goes back to minimum)? \n", sep = "") theFactor <- as.numeric(readline(tMessage)) if (is.na(theFactor)) { plotPara$ylim <- c(plotPara$yMin, presentWindowRange[2]) return(plotPara) } ## End of conditional on is.na(theFactor) if (theFactor >= plotPara$ylim[2]) { cat("The minimum should be smaller than the maximum.\n") return(plotPara) } ## End of conditional on theFactor >= plotPara$ylim[2] plotPara$ylim <- c(theFactor, presentWindowRange[2]) return(plotPara) } ## End of function yMinFct definition show <- function(x, y, plotPara, ...) { s <- plotPara$tlim[1] e <- plotPara$tlim[2] y.m <- plotPara$ylim[1] y.M <- plotPara$ylim[2] firstIdx <- round(max(1,s*frequency(y))) lastIdx <- round(min(end(y)[1]*frequency(y),e*frequency(y))) ii <- firstIdx:lastIdx xx <- x[firstIdx <= x & x <= lastIdx] if (class(y)[1] == "mts") { m <- y[ii,] if (length(xx) > 0) mAtx <- as.matrix(y)[xx,,drop=FALSE] } else { m <- y[ii] if (length(xx) > 0) mAtx <- y[xx] } if (class(m) == "matrix") { m <- apply(m,2,function(x) ifelse(x < y.m, y.m,x)) m <- apply(m,2,function(x) ifelse(x > y.M, y.M,x)) ns <- dim(m)[2] offset <- c(0,-(1:(ns-1))*(y.M-y.m)) m <- t(t(m)+offset*offsetFactor) matplot(m,type="l",lty=1,axes=FALSE,xlab="",ylab="",...) if (length(xx) > 0) { mAtx <- t(t(mAtx)+offset*offsetFactor) matpoints(xx-ii[1]+1,mAtx,pch=events.pch,col=events.col) } } else { m[m<y.m] <- y.m m[m>y.M] <- y.M plot(m,type="l",lty=1,axes=FALSE,xlab="",ylab="",ylim=c(y.m,y.M),...) if (length(xx) > 0) points(xx-ii[1]+1,mAtx,pch=events.pch,col=events.col) } } plot.new() par(mar=c(0.5,0.5,0.5,0.5)) show(x,y,plotPara,...) myMessage <- "Make a choice:\n n or 'return' (next); f (former); l (lower abscissa limit); r (upper abscissa limit) \n t (time scale); Y (upper ordinate limit); y (lower ordinate limit); q (quit) \n " while(plotPara$keepGoing) { myChoice <- readline(myMessage) plotPara <- switch(myChoice, n = nFct(), f = fFct(), l = lFct(), r = rFct(), t = tFct(), Y = yMaxFct(), y = yMinFct(), q = qFct(), nFct() ) show(x,y,plotPara,...) } ## End of while loop on keepGoing dev.off() invisible() }
12.1.5 Function cutSglEvt
Its formal parameters are:
- evtPos: a numeric or an integer interpretable as an index, the position at which the cuts will be produced.
- data: a numeric vector or matrix containing the data. If a vector, the argument is converted internally into a single-column matrix. The matrix rows are indexed by sampling points and its columns by recording sites / channels.
- before: an integer, the number of sampling points within the cut before the reference time given by evtPos.
- after: an integer, the number of sampling points after the reference time.
It returns:
- A numeric vector with the cut(s). When several recording sites are used, the cuts from each individual site are placed one after the other.
cutSglEvt <- function(evtPos, data, before=14, after=30 ) { evtPos <- as.integer(evtPos) ## make sure evtPos is an integer before <- as.integer(before) ## make sure before is an integer stopifnot(0 <= before) ## make sure before is positive or null after <- as.integer(after) stopifnot(0 <= after) ## make sure after is positive or null if (is.vector(data)) data <- matrix(data,nc=1) ns <- dim(data)[2] dl <- dim(data)[1] stopifnot(0<evtPos, evtPos<=dl) ## make sure evtPos is within range sl <- before+after+1 ## the length of the cut keep <- -before:after + evtPos within <- 1 <= keep & keep <= dl kw <- keep[within] res <- sapply(1:ns, function(idx) { v <- numeric(sl) v[within] <- data[kw,idx] v } ) as.vector(res) }
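A self-contained toy illustration (the objects toy, spike_shape and pos_true are invented for this sketch and do not correspond to anything in the locust analysis): a stereotyped "spike" is added to two channels of white noise and a single cut is extracted around one of its occurrences.

set.seed(20110928)
toy <- matrix(rnorm(2*3000, sd=0.1), ncol=2)    ## two channels, 3000 sampling points
spike_shape <- 5*exp(-(-14:30)^2/18)            ## a 45-point 'spike' waveform
pos_true <- c(500, 1000, 1500, 2000)            ## where the 'spikes' are inserted
for (p in pos_true) toy[p+(-14:30),] <- toy[p+(-14:30),] + spike_shape
evt1 <- cutSglEvt(pos_true[1], data=toy, before=14, after=30)
length(evt1)                                    ## (14+30+1) x 2 sites = 90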
12.1.6 Function mkEvents
Its formal parameters are:
- positions: an integer vector with the events' positions given as indices / sampling points, or an eventsPos object.
- data: a numeric vector or matrix containing the data, or a 'ts' or 'mts' object. If a vector, the argument is converted internally into a single-column matrix. The matrix rows are indexed by sampling points and its columns by recording sites / channels.
- before: an integer, the number of sampling points within the cut before the reference times given by positions.
- after: an integer, the number of sampling points within the cut after the reference times given by positions.
It returns a matrix with (before + after + 1) rows per recording site, that is, (before + after + 1) times the number of sites rows in total, and as many columns as elements in positions. Each column is an "event", that is, a set of cuts on the data. Attribute "positions" contains the value of the argument with the same name, attribute "data" contains the name of the corresponding argument, attributes "before" and "after" contain the values of the arguments with the same names, and attribute "numberOfSites" contains the number of recording sites. Attribute "delta" is used when events are realigned on the median waveform (during "jitter cancellation"). The returned matrix is given an "events" class.
mkEvents <- function(positions, data, before=14, after=30 ) { positions <- unclass(positions) data <- unclass(data) res <- sapply(positions, cutSglEvt, data, before, after) the.call <- match.call() attr(res,"positions") <- positions attr(res,"delta") <- NULL attr(res,"data") <- the.call[["data"]] attr(res,"before") <- before attr(res,"after") <- after attr(res,"numberOfSites") <- ifelse(is.matrix(data),dim(data)[2],1) attr(res,"call") <- match.call() class(res) <- "events" res }
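Continuing the toy example introduced with cutSglEvt (toy and pos_true are invented objects), cuts are made around all four positions at once:

evts_toy <- mkEvents(pos_true, data=toy, before=14, after=30)
dim(evts_toy)      ## 90 rows (45 sampling points x 2 sites) and 4 columns, one per event
summary(evts_toy)  ## uses the summary.events method defined in the next sub-section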
12.1.7 events methods
summary.events <- function(object, ...) { b <- attr(object,"before") a <- attr(object,"after") ns <- attr(object,"numberOfSites") cat("\nevents object deriving from data set: ",attr(object,"data"),".\n",sep="") cat(" Events defined as cuts of ", a+b+1, " sampling points on each of the ",ns, " recording sites.\n",sep="") cat(" The 'reference' time of each event is located at point ", b+1, " of the cut.\n",sep="") if (!is.null(attr(object,"delta"))) { cat(" Events were realigned on median event.\n",sep="") } cat(" There are ", length(attr(object,"positions")), " events in the object.\n\n",sep="") }
"[.events" <- function(x,i,j,drop = FALSE) { y <- NextMethod("[") if (!missing(i)) return(NULL) if (is.matrix(y) && dim(y)[2] > 1) { attr(y,"positions") <- attr(x,"positions")[j] attr(y,"delta") <- attr(x,"delta") attr(y,"data") <- attr(x,"data") attr(y,"before") <- attr(x,"before") attr(y,"after") <- attr(x,"after") attr(y,"numberOfSites") <- attr(x,"numberOfSites") attr(y,"call") <- match.call() class(y) <- "events" } y }
t.events <- function(x) { t(unclass(x)) }
mean.events <- function(x,...) { apply(unclass(x),1,mean,...) } median.events <- function(x,na.rm = FALSE) { apply(unclass(x),1,median,na.rm) }
'-.events' <- function(e1,e2) { stopifnot(length(e2) == dim(e1)[1]) res <- unclass(e1)-e2 attr(res,"positions") <- attr(e1,"positions") attr(res,"delta") <- attr(e1,"delta") attr(res,"data") <- attr(e1,"data") attr(res,"before") <- attr(e1,"before") attr(res,"after") <- attr(e1,"after") attr(res,"numberOfSites") <- attr(e1,"numberOfSites") attr(res,"call") <- match.call() class(res) <- "events" res }
plot.events <- function(x, y=NULL, evts.lwd = 0.1, medAndMad = TRUE, evts.col = "black", med.col = "red", mad.col = "blue", x.bar = NULL, y.bar = NULL) { nsites <- attr(x,"numberOfSites") ne <- dim(x)[2] cl <- dim(x)[1]/nsites ylim <- range(x) matplot(x,type="n",xlab="",ylab="",axes=FALSE,ylim=ylim) if (nsites > 1) { ii <- 2*(1:(nsites %/% 2)) rect((ii-1)*cl,ylim[1],ii*cl,ylim[2],col="grey80",border=NA) } matlines(x,col=evts.col,lty=1,lwd=evts.lwd) if (medAndMad) { med <- apply(x,1,median) mad <- apply(x,1,mad) lines(med,col=med.col) lines(mad,col=mad.col) } if (!is.null(x.bar)) segments(x0=0,y0=ylim[1]+0.1*diff(ylim),x1=x.bar) if (!is.null(y.bar)) segments(x0=0,y0=ylim[1]+0.1*diff(ylim),y1=ylim[1]+0.1*diff(ylim)+y.bar) }
lines.events <- function(x, evts.lwd = 0.1, evts.col = "black", ... ) { matlines(x,col=evts.col,lty=1,lwd=evts.lwd,...) }
print.events <- function(x, ... ) { plot.events(x,...) }
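These methods let events objects be manipulated like ordinary matrices while keeping their attributes. Continuing the toy example (evts_toy is the invented object built above):

centered_toy <- evts_toy - median(evts_toy)  ## '-.events': subtract the median waveform
plot(centered_toy, evts.lwd=1)               ## plot.events: residuals around the median
plot(centered_toy[,1:2], evts.lwd=1)         ## '[.events': keep the first two events only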
12.1.8 mkNoise definition
mkNoise <- function(positions, data, before=14, after=30, safetyFactor=2, size=2000) { positions <- unclass(positions) data <- unclass(data) if (!is.matrix(data)) data <- matrix(data,nc=1) size <- as.integer(size) stopifnot(0 < size) ## make sure size is a positive integer sl <- before+after+1 ns <- dim(data)[2] i1 <- diff(positions) ## inter events intervals nbI <- (i1-round(safetyFactor*sl))%/%sl ## number of noise sweeps ## one can cut from each ## interval nbPossible <- min(size, sum((nbI)[nbI>0]) ) ## allocate next the memory for the noise events noiseMatrix <- matrix(0, nr=ns*sl, nc=nbPossible ) iV <- (1:length(i1))[nbI>0] ## A vector containing the indexes of ## the (inter event) intervals from ## which at least one noise sweep can be ## cut. iIdx <- 1 ## an index running over the inter event intervals from ## which noise events can be cut. nInI <- nbI[iV[iIdx]] ## the number of noise sweeps that can be cut ## from the "non empty" inter event interval ## iV[iIdx]. nIdx <- 1 ## An index running over the noise sweeps. noisePositions <- integer(nbPossible) while (nIdx <= nbPossible) { uInI <- 1 ## An index running over the noise sweeps that will be ## cut from a given "non empty" inter event interval. iPos <- positions[iV[iIdx]] + round(safetyFactor*sl) noisePositions[nIdx] <- iPos while (uInI <= nInI & nIdx <= nbPossible ) { ii <- (-before:after) + iPos ns <- as.vector(data[ii,]) noiseMatrix[,nIdx] <- ns nIdx <- nIdx + 1 iPos <- iPos + sl uInI <- uInI + 1 } ## End of while loop on uInI iIdx <- iIdx + 1 nInI <- nbI[iV[iIdx]] } ## End of while loop on nIdx the.call <- match.call() attr(noiseMatrix,"positions") <- noisePositions attr(noiseMatrix,"delta") <- NULL attr(noiseMatrix,"data") <- the.call[["data"]] attr(noiseMatrix,"before") <- before attr(noiseMatrix,"after") <- after attr(noiseMatrix,"numberOfSites") <- ifelse(is.matrix(data),dim(data)[2],1) attr(noiseMatrix,"call") <- match.call() class(noiseMatrix) <- "events" noiseMatrix }
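Continuing the toy example (toy and pos_true are invented objects), noise sweeps are cut from the stretches between the "spike" positions:

noise_toy <- mkNoise(pos_true, data=toy, before=14, after=30, safetyFactor=2, size=10)
dim(noise_toy)  ## same number of rows as the events, here 10 noise sweeps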
12.1.9 Definition of an explore method for pca results
explore.prcomp <- function(x, pc=1, ##<< an integer: the pc index to add to the mean. factor=2, ##<< a numeric, the scaling factor; that is, the plot shows mean +/- factor * pc. m.col="black", ##<< a character string or an integer, the color used for mean. u.col="red", ##<< a character string or an integer, the color used for mean + factor * pc. l.col="blue", ##<< a character string or an integer, the color used for mean - factor * pc. xlab="Index", ##<< a character string with the abscissa label. ylab="Amplitude", ##<< a character string with the ordinate label. main, ##<< a character string with the title. If 'missing' (default) one is automatically generated. ... ##<< additional arguments passed to 'plot'. ) { if (missing(main)) { w <- x$sdev[pc]^2/sum(x$sdev^2) main <- paste("PC ",pc," (",round(100*w,digits=1),"%)",sep="") } u <- x$center + factor * x$rotation[,pc] l <- x$center - factor * x$rotation[,pc] ylim=range(c(l,u)) plot(x$center,type="l",xlab=xlab,ylab=ylab,col=m.col,main=main,ylim=ylim,...) lines(u,col=u.col,...) lines(l,col=l.col,...) }
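A usage sketch, still on the invented toy events (in the actual analysis the PCA is computed on the clean events): prcomp expects events as rows, hence the transposition, and the explore generic defined in Sec. 12.1.2 dispatches to the method above.

pca_toy <- prcomp(t(unclass(evts_toy)))
explore(pca_toy, pc=1, factor=2)  ## mean waveform +/- 2 x first principal component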
12.1.10 Definition of get_jitter
We define a function that estimates the jitter given:
- evts: a single event, or a matrix of events where individual events form the columns.
- center: the 'central' (median) event on which the alignment will be performed.
- centerD: the first derivative of the central event.
- centerDD: the second derivative of the central event.
The function returns a vector of estimated jitters giving the number of sampling points by which the central event should be shifted in order to best match each individual event. This sampling jitter estimation is performed by a two-stage procedure:
- Linear regression is first used to get an initial jitter estimate based on a first-order Taylor expansion.
- A Newton-Raphson step then refines this estimate (used as a starting point) based on a second-order Taylor expansion.
get_jitter <- function(evts, center, centerD, centerDD){ centerD_norm2 <- sum(centerD^2) centerDD_norm2 <- sum(centerDD^2) centerD_dot_centerDD <- sum(centerD*centerDD) if (is.null(dim(evts))) evts <- matrix(evts, nc=1) evts <- evts - center h_dot_centerD <- centerD %*% evts delta0 <- h_dot_centerD/centerD_norm2 h_dot_centerDD <- centerDD %*% evts first <- -2*h_dot_centerD + 2*delta0*(centerD_norm2 - h_dot_centerDD) + 3*delta0^2*centerD_dot_centerDD + delta0^3*centerDD_norm2 second <- 2*(centerD_norm2 - h_dot_centerDD) + 6*delta0*centerD_dot_centerDD + 3*delta0^2*centerDD_norm2 as.vector(delta0 - first/second) }
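To spell out the two steps (this derivation simply restates what get_jitter computes): write an observed event e as the center c shifted by an unknown jitter delta, expanded to second order, and set h = e - c. The jitter is chosen to minimize

$$f(\delta) = \left\| h - \delta\, c' - \frac{\delta^2}{2}\, c'' \right\|^2 .$$

Dropping the second-order term turns the minimization into a simple linear regression whose solution is the first estimate

$$\delta_0 = \frac{\langle h, c' \rangle}{\| c' \|^2},$$

and a single Newton-Raphson step on the full f, started at delta_0, gives the refined estimate delta_1 = delta_0 - f'(delta_0)/f''(delta_0), with

$$f'(\delta) = -2\langle h, c' \rangle + 2\delta\left(\| c' \|^2 - \langle h, c'' \rangle\right) + 3\delta^2 \langle c', c'' \rangle + \delta^3 \| c'' \|^2,$$

$$f''(\delta) = 2\left(\| c' \|^2 - \langle h, c'' \rangle\right) + 6\delta \langle c', c'' \rangle + 3\delta^2 \| c'' \|^2 ;$$

these are exactly the quantities first and second computed in the code above.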
12.1.11 Definition of mk_aligned_events
We define a function taking a vector of spike times, which should all come from the same cluster and correspond to reasonably "clean" events, and three arguments corresponding to the last three arguments of mkEvents; the function returns an events object. The positions attribute of the returned object gives the nearest sampling point to the actual peak. The delta attribute gives the offset between the previous spike position and the "actual" peak position (the actual position is attribute positions minus attribute delta):
mk_aligned_events <- function(positions, data, before=14, after=30){ dataD = apply(data,2,function(x) c(0,diff(x,2)/2,0)) dataDD = apply(dataD,2,function(x) c(0,diff(x,2)/2,0)) evts = mkEvents(positions, data, before, after) evtsD = mkEvents(positions, dataD, before, after) evtsDD = mkEvents(positions, dataDD, before, after) evts_median = apply(evts,1,median) evtsD_median = apply(evtsD,1,median) evtsDD_median = apply(evtsDD,1,median) evts_jitter = get_jitter(evts,evts_median,evtsD_median,evtsDD_median) ## positions = positions-[round(x.item(0)) for x in np.nditer(evts_jitter)] positions = positions-round(evts_jitter) evts = mkEvents(positions, data, before, after) evtsD = mkEvents(positions, dataD, before, after) evtsDD = mkEvents(positions, dataDD, before, after) evts_median = apply(evts,1,median) evtsD_median = apply(evtsD,1,median) evtsDD_median = apply(evtsDD,1,median) evts_jitter = get_jitter(evts,evts_median,evtsD_median,evtsDD_median) res = unclass(evts) - evtsD_median %o% evts_jitter - evtsDD_median %o% evts_jitter^2/2 attributes(res) = attributes(evts) attr(res,"positions") <- positions attr(res,"call") <- match.call() attr(res,"delta") <- evts_jitter res }
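A quick sanity check on the invented toy data from the previous sub-sections (with noiseless integer positions the estimated jitters should be close to zero; on real data they are typically fractions of a sampling period):

evts_al <- mk_aligned_events(pos_true, toy, before=14, after=30)
round(attr(evts_al, "delta"), 3)       ## residual (sub-sample) jitter of each event
attr(evts_al, "positions") - pos_true  ## integer part of the correction, here 0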
12.1.12 Definition of mk_center_list
We define function mk_center_list that, given:
- positions, a vector of spike times that should all come from the same cluster and correspond to reasonably "clean" events,
- data, a data matrix,
- before, the number of sampling points to keep before the peak,
- after, the number of sampling points to keep after the peak,
returns a named list with the following elements and their content:
- center: the estimate of the center (obtained from the median),
- centerD: the estimate of the center's derivative (obtained from the median of events cut on the derivative of data),
- centerDD: the estimate of the center's second derivative (obtained from the median of events cut on the second derivative of data),
- centerD_norm2: the squared norm of the center's derivative,
- centerDD_norm2: the squared norm of the center's second derivative,
- centerD_dot_centerDD: the scalar product of the center's first and second derivatives,
- center_idx: an array of indices going from -before to after.
The peeling procedure requires, for each cluster, estimates of its center and of its first two derivatives. Clusters' centers must moreover be built such that they can be used for subtraction; this implies that we should make them long enough, on both sides of the peak, to see them go back to baseline. Formal parameters before and after should therefore be set to larger values than the ones used for clustering.
mk_center_list = function(positions, data, before=49, after=80) { dataD = apply(data,2,function(x) c(0,diff(x,2)/2,0)) dataDD = apply(dataD,2,function(x) c(0,diff(x,2)/2,0)) evts = mkEvents(positions, data, before, after) evtsD = mkEvents(positions, dataD, before, after) evtsDD = mkEvents(positions, dataDD, before, after) evts_median = apply(evts,1,median) evtsD_median = apply(evtsD,1,median) evtsDD_median = apply(evtsDD,1,median) list("center" = evts_median, "centerD" = evtsD_median, "centerDD" = evtsDD_median, "centerD_norm2" = sum(evtsD_median^2), "centerDD_norm2" = sum(evtsDD_median^2), "centerD_dot_centerDD" = sum(evtsD_median*evtsDD_median), "center_idx" = -before:after) }
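Continuing the toy example (in the actual analysis, positions would hold the clean events of one cluster and data the normalized recording): a centers list with a single entry, using the longer cuts recommended above.

centers_toy <- list("Cluster 1" = mk_center_list(pos_true, toy, before=49, after=80))
names(centers_toy[["Cluster 1"]])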
12.1.13 Definition of classify_and_align_evt
The formal parameters of classify_and_align_evt are:
- a set of events' positions,
- a data matrix containing the raw (but normalized) data,
- a named list of centers,
- arguments before and after corresponding to the arguments with those names in mk_center_list.
It returns a list with the following elements:
- the name of the closest center in terms of Euclidean distance, or "?" if none of the clusters' waveforms does better than a uniformly null one,
- the new position of the event (the previous position corrected by the integer part of the estimated jitter),
- the remaining jitter.
classify_and_align_evt <- function(evt_pos, data, centers, before=14, after=30 ){ cluster_names = names(centers) n_sites = dim(data)[2] centersM = sapply(cluster_names, function(cn) centers[[cn]][["center"]][rep(-before <= centers[[cn]][["center_idx"]] & centers[[cn]][["center_idx"]] <= after, n_sites)]) evt = cutSglEvt(evt_pos,data=data,before=before, after=after) delta = -(centersM - evt) cluster_idx = which.min(apply(delta^2,2,sum)) good_cluster_name = cluster_names[cluster_idx] good_cluster_idx = rep(-before <= centers[[good_cluster_name]][["center_idx"]] & centers[[good_cluster_name]][["center_idx"]] <= after, n_sites) centerD = centers[[good_cluster_name]][["centerD"]][good_cluster_idx] centerD_norm2 = sum(centerD^2) centerDD = centers[[good_cluster_name]][["centerDD"]][good_cluster_idx] centerDD_norm2 = sum(centerDD^2) centerD_dot_centerDD = sum(centerD*centerDD) h = delta[,cluster_idx] h_order0_norm2 = sum(h^2) h_dot_centerD = sum(h*centerD) jitter0 = h_dot_centerD/centerD_norm2 h_order1_norm2 = sum((h-jitter0*centerD)^2) if (h_order0_norm2 > h_order1_norm2) { h_dot_centerDD = sum(h*centerDD) first = -2*h_dot_centerD + 2*jitter0*(centerD_norm2 - h_dot_centerDD) + 3*jitter0^2*centerD_dot_centerDD + jitter0^3*centerDD_norm2 second = 2*(centerD_norm2 - h_dot_centerDD) + 6*jitter0*centerD_dot_centerDD + 3*jitter0^2*centerDD_norm2 jitter1 = jitter0 - first/second h_order2_norm2 = sum((h-jitter1*centerD-jitter1^2/2*centerDD)^2) if (h_order1_norm2 <= h_order2_norm2) { jitter1 = jitter0 } } else { jitter1 = 0 } if (abs(round(jitter1)) > 0) { evt_pos = evt_pos - round(jitter1) evt = cutSglEvt(evt_pos,data=data,before=before, after=after) h = evt - centers[[good_cluster_name]][["center"]][good_cluster_idx] h_order0_norm2 = sum(h^2) h_dot_centerD = sum(h*centerD) jitter0 = h_dot_centerD/centerD_norm2 h_order1_norm2 = sum((h-jitter0*centerD)^2) if (h_order0_norm2 > h_order1_norm2) { h_dot_centerDD = sum(h*centerDD) first = -2*h_dot_centerD + 2*jitter0*(centerD_norm2 - h_dot_centerDD) + 3*jitter0^2*centerD_dot_centerDD + jitter0^3*centerDD_norm2 second = 2*(centerD_norm2 - h_dot_centerDD) + 6*jitter0*centerD_dot_centerDD + 3*jitter0^2*centerDD_norm2 jitter1 = jitter0 - first/second h_order2_norm2 = sum((h-jitter1*centerD-jitter1^2/2*centerDD)^2) if (h_order1_norm2 <= h_order2_norm2) { jitter1 = jitter0 } } else { jitter1 = 0 } } if (sum(evt^2) > sum((h-jitter1*centerD-jitter1^2/2*centerDD)^2)) return(list(cluster_names[cluster_idx], evt_pos, jitter1)) else return(list('?',evt_pos, jitter1)) }
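A toy illustration with the invented objects above: an event detected one sampling point after the true peak is classified against centers_toy (the values of before and after here match the ones used when cutting the events).

classify_and_align_evt(pos_true[2]+1, data=toy, centers=centers_toy,
                       before=14, after=30)
## returns the cluster name, the corrected position and the remaining jitter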
12.1.14 Definition of predict_data
The formal parameters of predict_data are:
- class_pos_jitter_list: a list of lists. Each sub-list corresponds to a single detected event and has three components: the cluster of origin (or "?" if none was found), the position and the jitter.
- centers_list: a list of lists. Each sub-list is the result of a call to mk_center_list.
- nb_channels: an integer, the number of recording channels.
- data_length: an integer, the length of the data.
The function returns a matrix with data_length rows and nb_channels columns containing the predicted observations based on class_pos_jitter_list and centers_list.
predict_data <- function(class_pos_jitter_list, centers_list, nb_channels=4, data_length=300000) { res = matrix(0,nc=nb_channels, nr=data_length) for (class_pos_jitter in class_pos_jitter_list) { cluster_name = class_pos_jitter[[1]] if (cluster_name != '?') { center = centers_list[[cluster_name]][["center"]] centerD = centers_list[[cluster_name]][["centerD"]] centerDD = centers_list[[cluster_name]][["centerDD"]] jitter = class_pos_jitter[[3]] pred = center + jitter*centerD + jitter^2/2*centerDD pred = matrix(pred,nc=nb_channels) idx = centers_list[[cluster_name]][["center_idx"]] + class_pos_jitter[[2]] within = 0 < idx & idx <= data_length kw = idx[within] res[kw,] = res[kw,] + pred[within,] } } res }
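Continuing the toy example, this is how the pieces fit together for one peeling round: every detected event is classified, an ideal trace is predicted from the fitted centers and jitters, and that prediction is subtracted from the data before the next detection round.

cpj_toy <- lapply(pos_true, classify_and_align_evt,
                  data=toy, centers=centers_toy, before=14, after=30)
pred_toy <- predict_data(cpj_toy, centers_toy, nb_channels=2, data_length=nrow(toy))
toy_peeled <- toy - pred_toy  ## residual handed to the next detection round
range(toy_peeled)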