Analysis of d11221.002 Data Set
We first make sure that R's messages are in English:
Sys.setlocale(category="LC_MESSAGES",locale="C")
[1] "C"
1 The data
1.1 Location and origin
The data set d11221.002.dat is contained in folder d11221, which is freely available (after registration) from the Collaborative Research in Computational Neuroscience (CRCNS) web site, in folder hc-1. I am assuming here that the data set has been downloaded and compressed with gzip.
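For reference, the compression itself can be done from within R, for instance with the R.utils package (one option among many; a shell "gzip d11221.002.dat" does the same):
# compress the downloaded file; R.utils is an assumption here, any
# gzip tool works just as well
library(R.utils)
gzip("d11221.002.dat",destname="d11221.002.dat.gz",remove=FALSE)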
1.2 Information about the data
This data set was contributed by the György Buzsáki lab, Rutgers University, and was used in Henze et al, 2000. The data set is made of recordings on 6 channels:
- channels 1, 2, 3 and 4 are the four recording sites of a tetrode;
- channel 5 or 6 is an intracellular recording from one of the neurons recorded by the tetrode.
The sampling rate was 20 kHz and the data were amplified 1000 times. This information is reported in the d11221.002.xml file associated with the data (we applied a shift of one here since in the xml file channel numbers start at 0).
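For completeness, these numbers can be read programmatically from the xml file; here is a minimal sketch assuming the XML package is installed and that the file follows the usual Neuroscope layout (the element names acquisitionSystem/nChannels and acquisitionSystem/samplingRate are an assumption on our part):
# a sketch of reading the metadata from the xml file; the element
# names follow the Neuroscope convention and are an assumption here
library(XML)
doc <- xmlParse("d11221.002.xml")
as.integer(xmlValue(getNodeSet(doc,"//acquisitionSystem/nChannels")[[1]]))
as.integer(xmlValue(getNodeSet(doc,"//acquisitionSystem/samplingRate")[[1]]))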
1.3 Loading the data
The data are simply loaded, once we know where to find them and as long as we do not forget that they have been compressed with gzip. We will assume here that the data are located in R's working directory:
dN <- "d11221.002.dat.gz"
nb <- 6*10000000                 # upper bound: 6 channels times 10^7 samples
mC <- gzfile(dN,open="rb")
d1 <- matrix(readBin(mC,what="int",size=2,n=nb),nr=6)
close(mC)
We can check that we have loaded the whole set: if so, the number of columns of our matrix d1 should be smaller than the 10^7 samples per channel we asked for:
dim(d1)
[1] 6 7500000
So everything was loaded. In terms of recording duration we get (in minutes):
dim(d1)[2]/2e4/60
[1] 6.25
We can quickly check that we did not commit any major mistake while loading the data by looking at ten seconds of the intracellular trace in the middle of the set (in practice we tried both channels 5 and 6 to see that channel 5 is the intracellular one):
plot(window(ts(d1[5,],start=0,freq=2e4),120,130),ylab="",xlab="Time (s)",main="")
Figure 1: Ten seconds of recording of channel 5 (the intracellular channel) of data set d11221.002. The snapshot is from the central part of the recording.
We do the same for the extracellular channels, except that for clarity we plot only one second of data since they are not high-pass filtered:
plot(window(ts(t(d1[1:4,]),start=0,freq=2e4),120,121),ylab="",xlab="Time (s)",main="")
Figure 2: One second of recording of channels 1 to 4 (the tetrode channels) of data set d11221.002. The snapshot is from the central part of the recording.
1.4 Five numbers data summary
We get our classical five numbers plus mean data summary:
summary(t(d1[1:5,]),digits=1)
 | V1 | V2 | V3 | V4 | V5 |
---|---|---|---|---|---|
Min. | -8288 | -7937 | -7890 | -7960 | -14458 |
1st Qu. | -599 | -409 | -429 | -411 | -11488 |
Median | -212 | -24 | -40 | -32 | -10530 |
Mean | -199 | -10 | -26 | -21 | -10580 |
3rd Qu. | 191 | 379 | 368 | 359 | -9778 |
Max. | 7212 | 7286 | 7473 | 7334 | 4523 |
We see that among the four extracellular channels making up the tetrode (columns labeled V1 to V4 in the above table), the last three have similar summaries while the first one has lower values – strange. The summary of the intracellular channel (column V5) follows a different pattern.
1.5 Data selection for generative model estimation
We are going to work with the first minute of acquired data to estimate our generative model:
hD <- window(ts(t(d1[1:4,]),start=0,freq=2e4),0,60)
rm(d1)
2 The software
We start by loading file sorting.R, containing the sorting specific functions, from the web. These functions will soon be organized as a "proper" R package. The URL of the file is https://raw.github.com/christophe-pouzat/Neuronal-spike-sorting/master/code/sorting.R, so the loading is done by downloading the source file first from its URL to a file named, guess what, sorting.R, using function download.file with its optional argument method set to "wget" since we are downloading from an https URL:
download.file(url="https://raw.github.com/christophe-pouzat/Neuronal-spike-sorting/master/code/sorting.R",
              destfile="sorting.R",
              method="wget")
0
The functions defined in the file can then be made accessible from the R workspace – assuming that R has been started from the directory where the previous download took place – with the following R command:
source("sorting.R")
3 Data pre-processing
3.1 Detailed approach
Since the extracellular data were not high-pass filtered we will try our "usual approach" and run the analysis on their derivatives (McGill et al, 1985), since taking derivatives high-pass filters the data and reduces spike duration, which is good when one deals with overlaps.
hDd <- apply(hD,2,function(x) c(0,diff(x,2)/2,0))  # central difference: (x[i+1]-x[i-1])/2
hDd <- ts(hDd,start=0,freq=2e4)
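Why this acts as a high-pass filter can be made explicit: the central difference y[i] = (x[i+1]-x[i-1])/2 has gain |H(f)| = |sin(2*pi*f/fs)|, essentially zero at low frequencies, maximal at fs/4 = 5 kHz. The short sketch below just plots this gain curve (nothing in it depends on the data):
# gain of the central difference as a function of frequency:
# |H(f)| = |sin(2*pi*f/fs)|
fs <- 2e4
f <- seq(0,fs/2,length.out=501)
plot(f,abs(sin(2*pi*f/fs)),type="l",
     xlab="Frequency (Hz)",ylab="Gain")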
We get a summary of our derivatives:
summary(hDd,digits=2)
 | Series 1 | Series 2 | Series 3 | Series 4 |
---|---|---|---|---|
Min. | -5.9e+02 | -6.0e+02 | -7.3e+02 | -8.1e+02 |
1st Qu. | -2.0e+01 | -2.3e+01 | -2.1e+01 | -2.0e+01 |
Median | 0.0e+00 | 0.0e+00 | 0.0e+00 | 0.0e+00 |
Mean | 5.0e-04 | 2.0e-04 | 4.0e-04 | 6.0e-04 |
3rd Qu. | 2.0e+01 | 2.3e+01 | 2.1e+01 | 2.0e+01 |
Max. | 6.4e+02 | 5.9e+02 | 5.9e+02 | 7.5e+02 |
And we can plot the first 10 seconds:
plot(window(hDd,0,10),ylab="",xlab="Time (s)",main="")
Figure 3: First 10 sec of the first derivative of recording of channels 1 to 4 (the tetrode channels) of data set d11221.002.
Here, exploring the data interactively is a must:
explore(hDd)
This exploration shows strange LFP-like waves around 6.5, 16, 20.5, 28.5, 31.5, 40, 44.5 and 57 s.
3.1.1 An attempt to suppress LFPs
Here is an attempt to get rid of them, focusing on the data in the time interval [6,7] s.
hDdw <- window(hDd,6,7)
3.1.2 Box filter
We can try box filters with several window lengths to match the low frequency pattern:
hDdwf200 <- filter(hDdw,rep(1,201)/201)
hDdwf100 <- filter(hDdw,rep(1,101)/101)
hDdwf50 <- filter(hDdw,rep(1,51)/51)
We can plot the "original" data from the third recording site together with the different filtered versions:
plot(window(hDdw[,3],6.55,6.6),ylab="",xlab="Time (s)",main="",col="grey70")
lines(window(hDdwf200[,3],6.55,6.6),col=1,lwd=2,lty=3)
lines(window(hDdwf100[,3],6.55,6.6),col=4,lwd=2,lty=2)
lines(window(hDdwf50[,3],6.55,6.6),col=2,lwd=2,lty=1)
Figure 4: 50 ms of the derivative data on site 3 (grey) together with box filtered versions of it obtained with a box length of 10 ms (201 sampling points, dotted black), 5 ms (101 sampling points, dashed blue) and 2.5 ms (51 sampling points, red).
3.1.3 Gaussian filter
As an alternative we can consider a Gaussian filter (which has better spectral properties):
hDdwf200g <- filter(hDdw,dnorm(-100:100,0,100/3))
hDdwf100g <- filter(hDdw,dnorm(-50:50,0,50/3))
hDdwf50g <- filter(hDdw,dnorm(-25:25,0,25/3))
We can plot the "original" data from the third recording site together with the different filtered versions:
plot(window(hDdw[,3],6.55,6.6),ylab="",xlab="Time (s)",main="",col="grey70")
lines(window(hDdwf200g[,3],6.55,6.6),col=1,lwd=2,lty=3)
lines(window(hDdwf100g[,3],6.55,6.6),col=4,lwd=2,lty=2)
lines(window(hDdwf50g[,3],6.55,6.6),col=2,lwd=2,lty=1)
Figure 5: 50 ms of the derivative data on site 3 (grey) together with Gaussian filtered versions of it obtained with an SD of 100/60 ms (dotted black), 50/60 ms (dashed blue) and 25/60 ms (red).
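To substantiate the "better spectral properties" claim we can compare the amplitude spectra of the two 51 point kernels used above (an illustrative sketch; the zero-padding length is an arbitrary choice):
# amplitude spectra of the 2.5 ms box and Gaussian kernels
# (zero-padded FFT); the box filter exhibits large side lobes
box <- rep(1,51)/51
gau <- dnorm(-25:25,0,25/3)
gau <- gau/sum(gau)
n <- 2048
Hbox <- abs(fft(c(box,rep(0,n-51))))[1:(n/2+1)]
Hgau <- abs(fft(c(gau,rep(0,n-51))))[1:(n/2+1)]
f <- (0:(n/2))/n*2e4   # frequency axis in Hz at 20 kHz sampling
plot(f,20*log10(Hbox+1e-12),type="l",col=2,ylim=c(-80,0),
     xlab="Frequency (Hz)",ylab="Gain (dB)")
lines(f,20*log10(Hgau+1e-12),col=4)
legend("topright",c("box (2.5 ms)","Gaussian (SD 25/60 ms)"),col=c(2,4),lty=1)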
3.1.4 Conclusion
Based on these figures we decide to filter the (derivative) data with a Gaussian filter of SD 25/60 ms before subtracting this filtered trace from the (derivative) data:
hDdF <- filter(hDd, dnorm(-25:25,0,25/3))
hDdF[is.na(hDdF)] <- 0
hDdB <- hDd - hDdF
Exploring the transformed data with:
explore(hDdB)
shows that there is still some weird stuff around 16, 28.5, 31.5, 44 and 57 s, but it looks much better than it did initially. Essentially the data are well behaved except during short periods around the indicated times, as illustrated in the next figure:
plot(window(hDdB,57.25,57.35),ylab="",xlab="Time (s)",main="")
Figure 6: The last weird period of the first minute. A Gaussian filtered version (SD: 25/60 ms) has been subtracted from the data.
3.2 A comparison with more traditional filtering approaches
Instead of using the derivative and then subtracting a sliding average of the data, we could use something more traditional like what is done by Henze et al, 2000: "The continuously recorded wideband signals were digitally high-pass filtered (Hamming window-based finite impulse response filter, cutoff 800 Hz, filter order 50)." To do that we start by loading the signal package in our workspace:
library(signal)
Loading required package: MASS

Attaching package: ‘signal’

The following objects are masked from ‘package:stats’:

    filter, poly
We then make a high-pass Hamming window-based FIR filter with a cutoff at 800 Hz and an order of 50 with:
FIRhw <- fir1(50,0.8/10,"high")
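As a sanity check we can inspect the filter's frequency response with the freqz function of the signal package (a quick sketch; the exact plotting behaviour may depend on the package version):
# frequency response of the FIR filter; Fs gives the axis in Hz
freqz(FIRhw,Fs=2e4)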
We prepare a filtered version of our raw data:
hDfir <- ts(apply(hD,2,function(x) stats::filter(x,FIRhw)),start=0,freq=2e4)
hDfir[is.na(hDfir)] <- 0
It is then very easy to compare the different versions on a given site with explore:
explore(cbind(hDd[,1],hDdB[,1],hDfir[,1]))
We will make do with a zoom of the last figure on the first recording site:
plot(window(cbind(deriv=hDd[,1],
                  derivB=hDdB[,1],
                  FIRhw=hDfir[,1]),
            57.26,57.32),
     ylab="",xlab="Time (s)",main="")
Figure 7: The last weird period of the first minute on the first recording site. Top trace: the derivative of the raw data; second trace: a Gaussian-filtered version (with a 25/60 ms SD) has been subtracted from the derivative; third trace: the raw data have been high-pass filtered with a Hamming window-based FIR filter with a cutoff at 800 Hz and an order of 50.
The last two traces exhibit similar features as far as spikes are concerned. The middle trace has more high-frequency noise.
This comparison suggests that our initial approach (subtracting a Gaussian-filtered version from the derivative trace) works similarly to the more traditional one. Before proceeding further, we detach the signal package:
detach(package:signal)
3.3 A note on the "LFPs"
The "LFPs" isolated by Gaussian-filtering the derivative trace are essantially identical on the four recording sites as shown on the next figure:
plot(window(hDdF,57.25,57.35),plot.type="single",
     col=c("black","grey80","orange","blue"),
     lty=c(2,2,1,1),lwd=c(2,2,1,1),ylab="")
Figure 8: Comparison of the "LFPs" during the last weird period of the first minute. The "LFPs" obtained by Gaussian-filtering (SD: 25/60 ms) the derivative traces are shown: thick dashed black, site 1; thick dashed grey, site 2; thin orange, site 3; thin blue, site 4.
We see 5 oscillations in 40 ms, that is 125 Hz, corresponding to ripples.
Since the four time series are so close, it is more informative to plot each one minus the mean of the other three (subtracting the overall mean and scaling by 4/3 achieves exactly that, since x - mean(others) = 4/3 * (x - mean(all))):
hDdFw <- window(hDdF,57.25,57.35)
hDdFwM <- ts(apply(hDdFw,1,mean),start=start(hDdFw),freq=frequency(hDdFw))
hDdFwR <- (hDdFw-hDdFwM)*4/3   # = each site minus the mean of the other three
colnames(hDdFwR) <- paste("Site",1:4,"resid.")
plot(hDdFwR,main="")
Figure 9: Comparison of the "LFPs" during the last weird period of the first minute. The "residual LFPs", obtained by Gaussian-filtering (SD: 25/60 ms) the derivative traces and then subtracting from each LFP the mean value of the three others, are shown.
3.4 Overall view of the LFP
The LFPs on the four recording sites are so similar that it makes sense to average them:
hDdFm <- apply(hDdF,1,sum)/4
hDdFm <- ts(hDdFm,start=0,freq=2e4)
Since after Gaussian-filtering we essentially get a low-passed version of the data, we can sub-sample them by a factor of 10 before displaying the whole LFP:
hDdFms <- hDdFm[seq(1,length(hDdFm),10)]
hDdFms <- ts(hDdFms,start=0,freq=2e3)
plot(hDdFms,xlab="Time (s)",ylab="")
Figure 10: Mean LFP time course. Our previously mentioned "nasty" periods (which are in fact ripples) can be clearly seen.
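Before trusting the sub-sampled trace we can also check that the full-rate mean LFP has essentially no power above the new Nyquist frequency of 1 kHz (a minimal sketch; the smoothing spans are an arbitrary choice):
# smoothed periodogram of the full-rate mean LFP; the power should
# be concentrated far below 1 kHz, making the decimation safe
spectrum(hDdFm,spans=c(25,25),main="")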
Plotting the auto-correlation function, computed globally, confirms our frequency estimate:
acf(hDdFms,lag.max=2e2)
Figure 11: Auto-correlation of the mean LFP. We see clearly 5 periods in 40 ms.
3.5 Detecting ripples
Since our ripples seem to be roughly 5 periods long with a period duration of 8 ms, we can filter the mean LFP trace with a few (3) periods of a cosine function of period 8 ms, take the square and filter the whole thing with a Gaussian filter with an SD of 8 ms:
hDdFmsF <- filter(hDdFms,cos((-48:48)/16*2*pi))    # cosine with an 8 ms period (16 points at 2 kHz)
hDdFmsF[is.na(hDdFmsF)] <- 0
hDdFmsFF <- filter(hDdFmsF^2,dnorm(-48:48,0,16))   # Gaussian smoothing with an 8 ms SD (16 points)
hDdFmsFF[is.na(hDdFmsFF)] <- 0
The whole filtered trace now looks like:
plot(hDdFmsFF,xlab="Time (s)",ylab="")
Figure 12: Mean LFP time course filtered with 3, 8 ms long periods of a cosine function, squared and filtered with a Gaussian filter of 8 ms SD. The ripples pop out very clearly.
Ripples can now be detected as points exceeding a properly set threshold on the above trace. After a few attempts based on plots generated with:
plot(hDdFmsFF,ylim=c(0,2e5))
we decided to set the threshold at 1e5:
inRipple <- hDdFmsFF > 1e5
We have to be careful here since we have obtained this inRipple logical vector from a sub-sampled version of the trace, but we are going to use this information later on the "original" data, so we have to get the ripple periods right. To this end we get the real times of the beginning and end of each ripple:
ii <- seq(along=inRipple)
start.ripple <- ii[c(diff(inRipple),0)==1]
end.ripple <- ii[c(diff(inRipple),0)==-1]
if (min(end.ripple) < min(start.ripple)) start.ripple <- c(1,start.ripple)
if (max(start.ripple) > max(end.ripple)) end.ripple <- c(end.ripple,length(inRipple))
ripple.bounds <- rbind(start.ripple,end.ripple)/frequency(hDdFmsFF)   # back to seconds
This gives us a fraction of sample points within a ripple of 1.5% (computed with round(sum(apply(ripple.bounds,2,diff))/end(hDdFmsFF)[1]*100,digits=1)).
4 Data renormalization and spike detection
4.1 Using the low-pass subtracted first derivative hDdB
We proceed almost as usual and set the median absolute deviation (MAD) of each recording site to one, but we start by computing the within-ripples and out-of-ripples MAD:
idxRipple <- unlist(lapply(1:dim(ripple.bounds)[2],
                           function(i) (ripple.bounds[1,i]*2e4):(ripple.bounds[2,i]*2e4)))
hDdB.mad <- apply(hDdB,2,mad)
hDdB.mad.in <- apply(hDdB[idxRipple,],2,mad)
hDdB.mad.out <- apply(hDdB[-idxRipple,],2,mad)
all.mad <- rbind(hDdB.mad,hDdB.mad.in,hDdB.mad.out)
colnames(all.mad) <- paste("site", 1:4)
rownames(all.mad) <- c("global","within ripples","out of ripples")
round(all.mad)
 | site 1 | site 2 | site 3 | site 4 |
---|---|---|---|---|
global | 28 | 33 | 30 | 28 |
within ripples | 39 | 43 | 41 | 40 |
out of ripples | 28 | 33 | 30 | 28 |
We now normalize the whole trace using the global MAD, which is essentially the out-of-ripples MAD, but we keep in mind that the noise level is much larger during ripples:
hDdB <- t(t(hDdB)/hDdB.mad)
hDdB <- ts(hDdB,start=0,freq=2e4)
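A quick sanity check: after this division the MAD of each column should be exactly 1:
apply(hDdB,2,mad)   # should return 1 on every site (up to rounding)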
We are going to detect valleys (as opposed to peaks) on a box-filtered version of the data:
hDdBf <- filter(hDdB,rep(1,3)/3)               # light box filter (3 points)
hDdBf.mad <- apply(hDdBf,2,mad,na.rm=TRUE)
hDdBf <- t(t(hDdBf)/hDdBf.mad)
thrs <- rep(-5,4)                              # threshold: -5 MAD on each site
above.thrs <- t(t(hDdBf) > thrs)
hDdBfr <- hDdBf
hDdBfr[above.thrs] <- 0                        # keep only the sub-threshold (valley) parts
remove(hDdBf)
(sp.dB <- peaks(apply(-hDdBfr,1,sum),15))
eventsPos object with indexes of 1162 events.
  Mean inter event interval: 1032.54 sampling points, corresponding SD: 1556.81 sampling points
  Smallest and largest inter event intervals: 9 and 13946 sampling points.
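The function peaks comes from sorting.R; for readers who want the idea without the file, its behaviour can be sketched as follows (simple.peaks is a hypothetical, simplified stand-in, not the actual implementation):
# simplified stand-in for sorting.R's peaks(): local maxima that are
# at least minimalDist sampling points apart, keeping the largest first
simple.peaks <- function(x,minimalDist=15) {
    dx <- diff(x)
    cand <- which(dx[-length(dx)] > 0 & dx[-1] <= 0) + 1  # local maxima
    cand <- cand[order(x[cand],decreasing=TRUE)]
    kept <- integer(0)
    for (i in cand) if (all(abs(i-kept) > minimalDist)) kept <- c(kept,i)
    sort(kept)
}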
Among the 1162 (length(sp.dB)) detected spikes, 204 (length(sp.dB[sp.dB %in% idxRipple])) are within the ripples.
An interactive exploration of the detection with:
explore(sp.dB,hDdB,col=c("black","grey50"))
shows that the detection is quite good, but there is clearly an extra bunch of events showing up during the bursty periods.
4.2 Using the high-passed version hDfir
We proceed like in the previous section and set the MAD of each recording site to one, but we start by computing the within-ripples and out-of-ripples MAD:
hDfir.mad <- apply(hDfir,2,mad)
hDfir.mad.in <- apply(hDfir[idxRipple,],2,mad)
hDfir.mad.out <- apply(hDfir[-idxRipple,],2,mad)
all.fir.mad <- rbind(hDfir.mad,hDfir.mad.in,hDfir.mad.out)
colnames(all.fir.mad) <- paste("site", 1:4)
rownames(all.fir.mad) <- c("global","within ripples","out of ripples")
round(all.fir.mad)
 | site 1 | site 2 | site 3 | site 4 |
---|---|---|---|---|
global | 42 | 51 | 46 | 43 |
within ripples | 66 | 72 | 70 | 66 |
out of ripples | 42 | 51 | 46 | 43 |
We now normalize the whole trace using the global MAD, which is essentially the out-of-ripples MAD, but we keep in mind that the noise level is much larger during ripples:
hDfir <- t(t(hDfir)/hDfir.mad)
hDfir <- ts(hDfir,start=0,freq=2e4)
We are going to detect valleys (as opposed to peaks). Unlike in the previous section, we do not use a box filter here since the data have already been high-pass filtered at 800 Hz:
thrs <- rep(-5,4)
above.thrs <- t(t(hDfir) > thrs)
hDfirr <- hDfir
hDfirr[above.thrs] <- 0
(sp.fir <- peaks(apply(-hDfirr,1,sum),15))
eventsPos object with indexes of 1141 events.
  Mean inter event interval: 1051.75 sampling points, corresponding SD: 1579.84 sampling points
  Smallest and largest inter event intervals: 8 and 12281 sampling points.
These global features are very similar to the ones obtained in the previous section.
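We can quantify this similarity by counting how many sp.fir events have an sp.dB event nearby (a rough sketch: the 5 sampling point tolerance is arbitrary, and the as.numeric coercion of the eventsPos objects is an assumption based on their vector-like use elsewhere in this document):
# fraction of FIR-detected events with a derivative-detected event
# within 5 sampling points (0.25 ms)
spd <- as.numeric(sp.dB)
spf <- as.numeric(sp.fir)
mean(sapply(spf,function(i) any(abs(spd-i) <= 5)))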
Among the 1141 (length(sp.fir)) detected spikes, 196 (length(sp.fir[sp.fir %in% idxRipple])) are within the ripples.
5 Cuts
5.1 Using the low-pass subtracted first derivative hDdB
5.1.1 Events
In order to get the cut length "right" we start with long cuts and check how long it takes for the MAD to get back to noise level:
evts.dB <- mkEvents(sp.dB,hDdB,49,50)
evts.dB.med <- median(evts.dB)
evts.dB.mad <- apply(evts.dB,1,mad)
plot(evts.dB.med,type="n",ylab="Amplitude")
abline(v=seq(0,400,10),col="grey")
abline(h=c(0,1),col="grey")
lines(evts.dB.med,lwd=2)
lines(evts.dB.mad,col=2,lwd=2)
Figure 13: Robust estimates of the central event (black) and of the sample's dispersion around the central event (red) obtained with "long" (100 sampling points) cuts. We see clearly that the dispersion is almost at noise level 20 points before the peak and 30 points after the peak (on site 1).
So we decide to make our cuts with 19 points before and 30 points after our reference times (the valleys):
evts.dB <- mkEvents(sp.dB,hDdB,19,30)
It is also interesting to compare within-ripples with out-of-ripples events:
evts.dB.in <- mkEvents(sp.dB[sp.dB %in% idxRipple],hDdB,19,30)
evts.dB.in.med <- median(evts.dB.in)
evts.dB.in.mad <- apply(evts.dB.in,1,mad)
evts.dB.out <- mkEvents(sp.dB[!sp.dB %in% idxRipple],hDdB,19,30)
evts.dB.out.med <- median(evts.dB.out)
evts.dB.out.mad <- apply(evts.dB.out,1,mad)
A comparison of the central / median event is instructive here:
out.upr <- evts.dB.out.med+2*evts.dB.out.mad/sqrt(dim(evts.dB.out)[2])
out.lwr <- evts.dB.out.med-2*evts.dB.out.mad/sqrt(dim(evts.dB.out)[2])
ylim <- range(c(out.upr,out.lwr,evts.dB.in.med))
plot(evts.dB.in.med,type="l",col=1,lwd=2,ylim=ylim)
lines(out.upr, col=2,lwd=1)
lines(out.lwr, col=2,lwd=1)
Figure 14: Robust estimates of the central event within ripples (black) and pointwise 95% confidence interval for the robust estimate of the central event out of ripples (red).
5.1.2 First jitter cancellation
We realign each event on the overall central (median) event:
evts.dBo2 <- alignWithProcrustes(sp.dB,hDdB,19,30,maxIt=5,plot=TRUE)
Template difference: 1.696, tolerance: 1
_______________________
Template difference: 0.752, tolerance: 1
_______________________
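The idea behind this alignment can be sketched with a first-order Taylor expansion: an event jittered by delta looks like the template plus delta times the template's derivative, so delta has a closed-form least-squares estimate. The function below is a simplified illustration of that idea, not the actual alignWithProcrustes code:
# first-order jitter estimate: evt ~ template + delta * template'
# (simplified illustration, not sorting.R's implementation)
estimate.jitter <- function(evt,template) {
    dtemplate <- c(0,diff(template,2)/2,0)   # central-difference derivative
    sum((evt-template)*dtemplate)/sum(dtemplate^2)
}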
A plot of the first 500 aligned events is obtained with:
evts.dBo2[,1:500]
Figure 15: The overall median aligned events' sample (first 500 events) obtained from the low-pass subtracted first derivative.
5.1.3 Noise
We cut noise "events" in between "proper events":
noise.dB <- mkNoise(sp.dB,hDdB,19,30,safetyFactor=3,2000)
5.1.4 Clean events
Since we have a few superpositions apparent on our events' plot, we are going to get rid of the most obvious ones:
goodEvtsFct <- function(samp,thr=4,deriv.thr=0.1) {
    nbS <- attr(samp,"numberOfSites")
    cl <- dim(samp)[1]/nbS
    samp <- unclass(samp)
    samp.med <- apply(samp,1,median)
    samp.med.mat <- matrix(samp.med,nc=nbS)
    ## normalized absolute derivative of the median event on each site
    samp.med.D.mat <- apply(samp.med.mat,2,function(x) c(0,diff(x,2)/2,0))
    samp.med.D.mat <- apply(samp.med.D.mat,2,function(x) abs(x)/max(abs(x)))
    ## find the region where the median event is "active"
    firstAbove <- apply(samp.med.D.mat,2,function(x) min((1:cl)[x >= deriv.thr]))
    lastBellow <- apply(samp.med.D.mat,2,function(x) max((1:cl)[x >= deriv.thr]))
    firstAbove <- min(firstAbove)
    lastBellow <- max(lastBellow)
    lookAt <- rep(TRUE,cl)
    lookAt[firstAbove:lastBellow] <- FALSE   # inspect only the "flat" part
    lookAt <- rep(lookAt,nbS)
    samp <- samp-samp.med
    samp.r <- apply(samp,2,function(x) {x[!lookAt] <- 0; x})
    ## an event is "good" if its residual stays below thr outside the active region
    apply(samp.r,2,function(x) all(abs(x)<thr))
}
We can check how the number of "good" (i.e., classified as not superposed) events changes with the threshold:
sapply(3:10,function(i) sum(goodEvtsFct(evts.dBo2,i)))
[1] 383 804 941 989 1018 1030 1050 1063
We start with a threshold of 6:
good.dB <- goodEvtsFct(evts.dBo2,6)
We can look at the good events:
evts.dBo2[,good.dB]
Figure 16: The 989 (sum(good.dB)) good events.
Now the not so good events:
evts.dBo2[,!good.dB]
Figure 17: The 173 (sum(!good.dB)) not so good events.
5.2 Using the high-passed version hDfir
5.2.1 Events
In order to get the cut length "right" we start with long cuts and check how long it takes for the MAD to get back to noise level:
evts.fir <- mkEvents(sp.fir,hDfir,49,50)
evts.fir.med <- median(evts.fir)
evts.fir.mad <- apply(evts.fir,1,mad)
plot(evts.fir.med,type="n",ylab="Amplitude")
abline(v=seq(0,400,10),col="grey")
abline(h=c(0,1),col="grey")
lines(evts.fir.med,lwd=2)
lines(evts.fir.mad,col=2,lwd=2)
Figure 18: Robust estimates of the central event (black) and of the sample's dispersion around the central event (red) obtained with "long" (100 sampling points) cuts. We see clearly that the dispersion is almost at noise level 30 points before the peak and 40 points after the peak (on site 1).
So we decide to make our cuts with 29 points before and 40 points after our reference times (the valleys):
evts.fir <- mkEvents(sp.fir,hDfir,29,40)
5.2.2 First jitter cancellation
We realign each event on the overall central (median) event:
evts.firo2 <- alignWithProcrustes(sp.fir,hDfir,29,40,maxIt=5,plot=TRUE)
Template difference: 1.87, tolerance: 1
_______________________
Template difference: 2.188, tolerance: 1
_______________________
Template difference: 3.242, tolerance: 1
_______________________
Template difference: 2.933, tolerance: 1
_______________________
A plot of the first 500 aligned events is obtained with:
evts.firo2[,1:500]
Figure 19: The overall median aligned events' sample (first 500 events) obtained from the high-passed data.
5.2.3 Noise
We cut noise "events" in between "proper events":
noise.fir <- mkNoise(sp.fir,hDfir,29,40,safetyFactor=3,2000)
5.2.4 Clean events
Since we have a few superpositions apparent on our events' plot, we are going to get rid of the most obvious ones. We can check how the number of "good" (i.e., classified as not superposed) events changes with the threshold:
sapply(3:10,function(i) sum(goodEvtsFct(evts.firo2,i)))
[1] 355 722 873 926 967 994 1011 1030
We start with a threshold of 7 giving roughly the same number of good events as the threshold of 6 in the previous section:
good.fir <- goodEvtsFct(evts.firo2,7)
We can look at the good events:
evts.firo2[,good.fir]
Figure 20: The 967 (sum(good.fir)) good events.
Now the not so good events:
evts.firo2[,!good.fir]
Figure 21: The 174 (sum(!good.fir)) not so good events.
6 Dimension reduction
6.1 Using the low-pass subtracted first derivative hDdB
6.1.1 Principal component analysis
We compute the PCA decomposition using the clean guys in evts.dBo2:
evts.dB.pc <- prcomp(t(evts.dBo2[,good.dB]))
We then explore the results with function / method explore:
layout(matrix(1:4,nr=2))
explore(evts.dB.pc,1,5)
explore(evts.dB.pc,2,5)
explore(evts.dB.pc,3,5)
explore(evts.dB.pc,4,5)
Figure 22: PCA of evts.dBo2 exploration (PC 1 to 4). Each of the 4 graphs shows the mean waveform (black), the mean waveform + 5 x PC (red) and the mean waveform - 5 x PC (blue) for each of the first 4 PCs. The fraction of the total variance "explained" by the component appears between parentheses in the title of each graph.
The plot suggests that the first 3 to 4 PCs should be enough since the fourth looks "wiggly", as if it were mainly noise.
Another way to get an upper bound on the number of PCs to keep is to compare the sample variance to that of the noise plus a few of the first PCs (keeping in mind here that the noise is not homogeneous due to the ripples):
round(sapply(1:10,
             function(i) sum(diag(cov(t(noise.dB)))) + sum(evts.dB.pc$sdev[1:i]^2)) -
      sum(evts.dB.pc$sdev^2))
[1] -103 -74 -60 -48 -37 -27 -18 -9 0 8
This suggests 9 PCs as an upper bound on the number of relevant PCs. We look next at static projections:
panel.dens <- function(x,...) {
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5))
    d <- density(x, adjust=0.5)
    x <- d$x
    y <- d$y
    y <- y/max(y)
    lines(x, y, col="grey50", ...)
}
pairs(evts.dB.pc$x[,1:6],pch=".",gap=0,diag.panel=panel.dens)
Figure 23: Scatter plot matrix of the projections of the "good" elements of evts.dBo2 onto the planes defined by the first 6 PCs. The diagonal shows a smooth (Gaussian kernel based) density estimate of the projection of the sample on the corresponding PC.
This plot suggests that the first 3 PCs should be enough and that 4 or perhaps 5 clusters are required (with 2 large ones). To confirm this analysis we export the data (first 6 PCs) in csv format and visualize them with GGobi:
write.csv(evts.dB.pc$x[,1:6],file="evts-dB.csv")
The dynamical exploration confirms the previous diagnostic: 3 PCs should be enough and 4 to 5 units should be considered.
6.2 Using the high-passed version hDfir
We compute the PCA decomposition using the clean guys in evts.firo2:
evts.fir.pc <- prcomp(t(evts.firo2[,good.fir]))
We then explore the results with function / method explore:
layout(matrix(1:4,nr=2))
explore(evts.fir.pc,1,5)
explore(evts.fir.pc,2,5)
explore(evts.fir.pc,3,5)
explore(evts.fir.pc,4,5)
Figure 24: PCA of evts.firo2 exploration (PC 1 to 4). Each of the 4 graphs shows the mean waveform (black), the mean waveform + 5 x PC (red) and the mean waveform - 5 x PC (blue) for each of the first 4 PCs. The fraction of the total variance "explained" by the component appears between parentheses in the title of each graph.
The plot suggests that the first 3 to 4 PCs should be enough since the fourth looks "wiggly", as if it were mainly noise. Another way to get an upper bound on the number of PCs to keep is to compare the sample variance to that of the noise plus a few of the first PCs (keeping in mind here that the noise is not homogeneous due to the ripples):
round(sapply(1:10,
             function(i) sum(diag(cov(t(noise.fir)))) + sum(evts.fir.pc$sdev[1:i]^2)) -
      sum(evts.fir.pc$sdev^2))
[1] -180 -125 -101 -78 -57 -38 -20 -5 8 22
This suggests 9 PCs as an upper bound on the number of relevant PCs. We look next at static projections:
pairs(evts.fir.pc$x[,1:6],pch=".",gap=0,diag.panel=panel.dens)
Figure 25: Scatter plot matrix of the projections of the "good" elements of evts.firo2 onto the planes defined by the first 6 PCs. The diagonal shows a smooth (Gaussian kernel based) density estimate of the projection of the sample on the corresponding PC.
This plot suggests that the first 3 PCs should be enough and that 4 or perhaps 5 clusters are required (with 2 large ones). To confirm this analysis we export the data (first 6 PCs) in csv format and visualize them with GGobi:
write.csv(evts.fir.pc$x[,1:6],file="evts-fir.csv")
The dynamical exploration confirms the previous diagnostic: 3 PCs should be enough and 4 to 5 units should be considered.
7 Clustering
7.1 Using the low-pass subtracted first derivative hDdB
Inspection of the projections as well as dynamic visualization suggests that kmeans should do fine here. After trying a model with 5 units we switch to one with 6:
set.seed(20061001,kind="Mersenne-Twister")
nb.clust.dB <- 6
km.dB.res <- kmeans(evts.dB.pc$x[,1:4],
                    centers=nb.clust.dB,
                    iter.max=100,
                    nstart=100)
c.dB.res <- km.dB.res$cluster
cluster.dB.med <- sapply(1:nb.clust.dB,
                         function(cIdx) median(evts.dBo2[,good.dB][,c.dB.res==cIdx]))
## order the clusters by decreasing "size" of their median event
sizeC.dB <- sapply(1:nb.clust.dB,
                   function(cIdx) sum(abs(cluster.dB.med[,cIdx])))
newOrder.dB <- sort.int(sizeC.dB,decreasing=TRUE,index.return=TRUE)$ix
cluster.dB.mad <- sapply(1:nb.clust.dB,
                         function(cIdx) {
                             ce <- t(evts.dBo2[,good.dB])
                             ce <- ce[c.dB.res==cIdx,]
                             apply(ce,2,mad)
                         })
cluster.dB.med <- cluster.dB.med[,newOrder.dB]
cluster.dB.mad <- cluster.dB.mad[,newOrder.dB]
## relabel the clusters so that cluster 1 is the largest
c.dB.res.b <- sapply(1:nb.clust.dB,
                     function(idx) (1:nb.clust.dB)[newOrder.dB==idx])[c.dB.res]
The sorted events look like:
layout(matrix(1:nb.clust.dB,nr=nb.clust.dB))
par(mar=c(1,1,1,1))
invisible(sapply(1:nb.clust.dB,
                 function(cIdx) plot(evts.dBo2[,good.dB][,c.dB.res.b==cIdx],y.bar=5)))
Figure 26: The 6 clusters. Cluster 1 at the top, cluster 6 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
Clusters 3 and 4 have some extra weird events attributed to them. That apart, things look reasonable. We check things with GGobi after generating a csv file with:
write.csv(cbind(evts.dB.pc$x[,1:6],c.dB.res.b),file="evts-dB-sorted.csv")
We can align them on their cluster center:
ujL.dB <- lapply(1:length(unique(c.dB.res.b)),
                 function(cIdx) alignWithProcrustes(sp.dB[good.dB][c.dB.res.b==cIdx],hDdB,19,30))
Template difference: 1.168, tolerance: 1
_______________________
Template difference: 0.327, tolerance: 1
_______________________
Template difference: 2.029, tolerance: 1
_______________________
Template difference: 1.106, tolerance: 1
_______________________
Template difference: 0.438, tolerance: 1
_______________________
Template difference: 1.402, tolerance: 1
_______________________
Template difference: 1.151, tolerance: 1
_______________________
Template difference: 0.828, tolerance: 1
_______________________
Template difference: 1.516, tolerance: 1
_______________________
Template difference: 0.662, tolerance: 1
_______________________
Template difference: 1.564, tolerance: 1
_______________________
Template difference: 0.822, tolerance: 1
_______________________
Template difference: 1.039, tolerance: 1
_______________________
Template difference: 0.499, tolerance: 1
_______________________
The aligned events look like:
layout(matrix(1:nb.clust.dB,nr=nb.clust.dB))
par(mar=c(1,1,1,1))
invisible(sapply(1:nb.clust.dB,
                 function(cIdx) plot(ujL.dB[[cIdx]],y.bar=5)))
Figure 27: The 6 clusters after alignment on the clusters' centers. Cluster 1 at the top, cluster 6 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
We get our summary plot:
library(ggplot2)
template.med <- sapply(1:nb.clust.dB,function(i) median(ujL.dB[[i]]))
template.mad <- sapply(1:nb.clust.dB,function(i) apply(ujL.dB[[i]],1,mad))
sl <- 50          # cut length in sampling points (19+30+1)
freq <- 20        # sampling rate in kHz, to get the time axis in ms
nb.sites <- 4
tl <- sl*nb.sites
templateDF <- data.frame(x=rep(rep(rep((1:sl)/freq,nb.sites),nb.clust.dB),2),
                         y=c(as.vector(template.med),as.vector(template.mad)),
                         channel=as.factor(rep(rep(rep(1:nb.sites,each=sl),nb.clust.dB),2)),
                         template=as.factor(rep(rep(1:nb.clust.dB,each=tl),2)),
                         what=c(rep("mean",tl*nb.clust.dB),rep("SD",tl*nb.clust.dB)))
print(qplot(x,y,data=templateDF,
            facets=channel ~ template,
            geom="line",colour=what,
            xlab="Time (ms)",
            ylab="Amplitude",
            size=I(0.5)) +
      scale_x_continuous(breaks=0:7))
Figure 28: Summary plot with the 6 templates corresponding to the robust estimate of the mean of each cluster. A robust estimate of the clusters' SD is also shown. All graphs are on the same scale to facilitate comparison. Columns correspond to clusters and rows to recording sites.
7.2 Using the high-passed version hDfir
Based on the previous section we use kmeans clustering with 6 centers:
set.seed(20061001,kind="Mersenne-Twister")
nb.clust.fir <- 6
km.fir.res <- kmeans(evts.fir.pc$x[,1:4],
                     centers=nb.clust.fir,
                     iter.max=100,
                     nstart=100)
c.fir.res <- km.fir.res$cluster
cluster.fir.med <- sapply(1:nb.clust.fir,
                          function(cIdx) median(evts.firo2[,good.fir][,c.fir.res==cIdx]))
sizeC.fir <- sapply(1:nb.clust.fir,
                    function(cIdx) sum(abs(cluster.fir.med[,cIdx])))
newOrder.fir <- sort.int(sizeC.fir,decreasing=TRUE,index.return=TRUE)$ix
cluster.fir.mad <- sapply(1:nb.clust.fir,
                          function(cIdx) {
                              ce <- t(evts.firo2[,good.fir])
                              ce <- ce[c.fir.res==cIdx,]
                              apply(ce,2,mad)
                          })
cluster.fir.med <- cluster.fir.med[,newOrder.fir]
cluster.fir.mad <- cluster.fir.mad[,newOrder.fir]
c.fir.res.b <- sapply(1:nb.clust.fir,
                      function(idx) (1:nb.clust.fir)[newOrder.fir==idx])[c.fir.res]
The sorted events look like:
layout(matrix(1:nb.clust.fir,nr=nb.clust.fir))
par(mar=c(1,1,1,1))
invisible(sapply(1:nb.clust.fir,
                 function(cIdx) plot(evts.firo2[,good.fir][,c.fir.res.b==cIdx],y.bar=5)))
Figure 29: The 6 clusters. Cluster 1 at the top, cluster 6 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
Clusters 2 and 3 have some extra weird events attributed to them. That apart, things look reasonable. We check things with GGobi after generating a csv file with:
write.csv(cbind(evts.fir.pc$x[,1:6],c.fir.res.b),file="evts-fir-sorted.csv")
We can align them on their cluster center:
ujL.fir <- lapply(1:length(unique(c.fir.res.b)),
                  function(cIdx) alignWithProcrustes(sp.fir[good.fir][c.fir.res.b==cIdx],hDfir,19,30))
Template difference: 0.869, tolerance: 1
_______________________
Template difference: 0.941, tolerance: 1
_______________________
Template difference: 0.865, tolerance: 1
_______________________
Template difference: 1.052, tolerance: 1
_______________________
Template difference: 0.848, tolerance: 1
_______________________
Template difference: 1.159, tolerance: 1
_______________________
Template difference: 0.652, tolerance: 1
_______________________
Template difference: 0.902, tolerance: 1
_______________________
The aligned events look like:
layout(matrix(1:nb.clust.fir,nr=nb.clust.fir))
par(mar=c(1,1,1,1))
invisible(sapply(1:nb.clust.fir,
                 function(cIdx) plot(ujL.fir[[cIdx]],y.bar=5)))
Figure 30: The 6 clusters after alignment on the clusters' centers. Cluster 1 at the top, cluster 6 at the bottom. Scale bar: 5 global MAD units. Red, cluster specific central / median event. Blue, cluster specific MAD.
We get our summary plot:
library(ggplot2)
template.med <- sapply(1:nb.clust.fir,function(i) median(ujL.fir[[i]]))
template.mad <- sapply(1:nb.clust.fir,function(i) apply(ujL.fir[[i]],1,mad))
sl <- 50
freq <- 20
nb.sites <- 4
tl <- sl*nb.sites
templateDF <- data.frame(x=rep(rep(rep((1:sl)/freq,nb.sites),nb.clust.fir),2),
                         y=c(as.vector(template.med),as.vector(template.mad)),
                         channel=as.factor(rep(rep(rep(1:nb.sites,each=sl),nb.clust.fir),2)),
                         template=as.factor(rep(rep(1:nb.clust.fir,each=tl),2)),
                         what=c(rep("mean",tl*nb.clust.fir),rep("SD",tl*nb.clust.fir)))
print(qplot(x,y,data=templateDF,
            facets=channel ~ template,
            geom="line",colour=what,
            xlab="Time (ms)",
            ylab="Amplitude",
            size=I(0.5)) +
      scale_x_continuous(breaks=0:7))
Figure 31: Summary plot with the 6 templates corresponding to the robust estimate of the mean of each cluster. A robust estimate of the clusters' SD is also shown. All graphs are on the same scale to facilitate comparison. Columns correspond to clusters and rows to recording sites.