Kernel Density Estimators

Suppose that we want to estimate the value of $p(\mathbf{x})$, where $\mathbf{x} \sim p$ and the PDF $p$ is totally unknown, so we want to use a non-parametric estimation method. We suppose that $\mathbf{x}$ lies in a Euclidean space of dimension $D$.

Let $\mathcal{R}$ be a region containing the point $\mathbf{x}$; the probability of falling into the region is defined by:

$$P = \int_{\mathcal{R}} p(\mathbf{x}) \, d\mathbf{x}$$

Let $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ be $N$ i.i.d. data points drawn from $p(\mathbf{x})$; then the probability that $K$ of the $N$ data points fall into region $\mathcal{R}$ follows a binomial distribution:

$$\mathrm{Bin}(K \mid N, P) = \frac{N!}{K!(N-K)!} P^K (1 - P)^{N-K}$$

From the properties of the binomial distribution, we have that:

  1. the mean fraction of points falling inside the region is $\mathbb{E}[K/N] = P$;
  2. the variance $\mathrm{var}[K/N] = \frac{P(1-P)}{N}$ becomes smaller with $N$.

We have to suppose (A) that the region $\mathcal{R}$ is large enough that the number of points falling inside it is sufficient to get a sharply peaked binomial distribution.

In that case, for large $N$ we have that $K \simeq NP$, and also:

$$P \simeq \frac{K}{N}$$
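As a quick sanity check of the sharp peaking, here is a tiny simulation (the values of $N$ and $P$ are made up):

import random

N, P = 10_000, 0.3
# For large N the binomial is sharply peaked, so K/N concentrates around P.
for _ in range(3):
    K = sum(random.random() < P for _ in range(N))
    print(K / N)  # each run prints a value close to 0.3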

Now we suppose (B) that the region $\mathcal{R}$ is sufficiently small such that the probability density $p(\mathbf{x})$ is roughly constant for $\mathbf{x} \in \mathcal{R}$. In this case, we have that:

$$P \simeq p(\mathbf{x}) V$$

where $V$ is the volume of the region $\mathcal{R}$.

Observe that the two assumptions (A) and (B) are contradictory, which is a limitation of the KDE methods. By assuming both, we can combine the two results to derive:

$$p(\mathbf{x}) \simeq \frac{K}{NV}$$

The KDE methods are based on this result. They usually fix $V$ and then get $K$ from the $N$ observations available:

from typing import List, Protocol

class Region(Protocol):
    """Any region exposing a volume and a membership test."""
    volume: float
    def __contains__(self, point) -> bool: ...

def kde(points: List, region: Region) -> float:
    """
    Generic KDE estimator: p(x) ~= K / (N * V).
    """
    K = 0
    N = len(points)
    for point in points:
        if point in region:
            K += 1
    return K / (N * region.volume)

Parzen estimator

In the function above, we suppose that we can evaluate whether a point lies inside the region, but this depends on the shape of the region we use. The Parzen estimator uses a hypercube centered at the point $\mathbf{x}$ where we want to evaluate the PDF $p(\mathbf{x})$.

We now introduce a kernel function $k(\mathbf{u})$, also called a Parzen window, defined as follows:

$$k(\mathbf{u}) = \begin{cases} 1 & \text{if } |u_i| \le \frac{1}{2} \text{ for all } i = 1, \dots, D \\ 0 & \text{otherwise} \end{cases}$$

To know if a point $\mathbf{x}_n$ lies inside the hypercube of side $h$ centered on $\mathbf{x}$, we need to scale the point coordinates using this formula:

$$\mathbf{u}_n = \frac{\mathbf{x} - \mathbf{x}_n}{h}$$

In this way, we can compute $K$ by:

$$K = \sum_{n=1}^{N} k\!\left(\frac{\mathbf{x} - \mathbf{x}_n}{h}\right)$$

and since the volume of a hypercube of $D$ dimensions and of edge size $h$ is $V = h^D$, we can replace $K$ and $V$ in the equation $p(\mathbf{x}) \simeq \frac{K}{NV}$ and get:

$$p(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^D} \, k\!\left(\frac{\mathbf{x} - \mathbf{x}_n}{h}\right)$$

class ParzenWindow(Region):
    """Hypercube of side h centered on the point `origin`."""

    def __init__(self, h: float, origin):
        self.h = h
        self.origin = origin
        self.volume = h ** len(origin)

    def __contains__(self, point) -> bool:
        # Rescale the offset from the center by h, then test it
        # against the unit hypercube.
        u = [(p - o) / self.h for p, o in zip(point, self.origin)]
        return self._k(u)

    def _k(self, u) -> bool:
        # Parzen window: 1 inside the unit hypercube (|u_i| <= 1/2), 0 outside.
        return all(abs(u_i) <= 0.5 for u_i in u)
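A quick usage sketch with made-up 1-D data (for a standard normal, the true density at the origin is $1/\sqrt{2\pi} \approx 0.399$):

import random

random.seed(0)
# Hypothetical data: 1000 samples from a standard normal distribution.
sample = [(random.gauss(0.0, 1.0),) for _ in range(1000)]

# Estimate the density at the origin with a window of side h = 0.5.
window = ParzenWindow(h=0.5, origin=(0.0,))
print(kde(sample, window))  # roughly 0.4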

Gaussian Kernel

The Parzen estimator has discontinuities (at the boundaries of the cubes). A smoother kernel function is the Gaussian, which gives the estimator:

$$p(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi h^2)^{D/2}} \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}_n \rVert^2}{2h^2} \right)$$

where $h$ now represents the standard deviation of the Gaussian components.

In general, we can use any kernel function $k(\mathbf{u})$ that satisfies:

$$k(\mathbf{u}) \ge 0, \qquad \int k(\mathbf{u}) \, d\mathbf{u} = 1$$
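For illustration, a minimal sketch of the Gaussian estimator in plain Python (the name gaussian_kde and its signature are our own choices):

import math
from typing import List, Sequence

def gaussian_kde(x: Sequence[float], data: List[Sequence[float]], h: float) -> float:
    """Evaluate the Gaussian-kernel density estimate at the point x."""
    D = len(x)
    # Normalisation constant of a D-dimensional isotropic Gaussian.
    norm = (2 * math.pi * h ** 2) ** (D / 2)
    total = 0.0
    for x_n in data:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_n))
        total += math.exp(-sq_dist / (2 * h ** 2))
    return total / (len(data) * norm)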

Nearest-neighbour methods

A density distribution could have locations with finer structure and locations with coarser structure. KDE methods use the same smoothing parameter $h$ everywhere, and this may not be suitable for every location of the distribution.

Nearest-neighbour methods address this problem: recalling the general result $p(\mathbf{x}) \simeq K/(NV)$ of non-parametric density estimation, we now fix $K$ and use the data to find an appropriate value for $V$.

We consider a small sphere centered on the point $\mathbf{x}$ at which we wish to estimate the density $p(\mathbf{x})$, and we allow the radius of the sphere to grow until it contains precisely $K$ data points; $V$ will then be the volume of that sphere, as in the sketch below.
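A minimal sketch under the same conventions as above (the name knn_density and the closed-form ball volume are our additions):

import math
from typing import List, Sequence

def knn_density(x: Sequence[float], data: List[Sequence[float]], K: int) -> float:
    """k-NN density estimate: fix K and grow the sphere until it holds K points."""
    distances = sorted(math.dist(x, x_n) for x_n in data)
    r = distances[K - 1]  # radius of the smallest sphere containing K points
    D = len(x)
    # Volume of a D-dimensional ball of radius r.
    V = math.pi ** (D / 2) / math.gamma(D / 2 + 1) * r ** D
    return K / (len(data) * V)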

The model produced by kNN is not a true density model, since its integral over all space diverges.

kNN can also be used as a classifier. An interesting property of kNN with $K = 1$ is that, in the limit $N \to \infty$, its error rate is never more than twice the minimum achievable error rate of an optimal (Bayes) classifier.
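A minimal classifier sketch in the same style (all names are our own choices; ties are resolved by Counter.most_common):

import math
from collections import Counter
from typing import List, Sequence

def knn_classify(x: Sequence[float], data: List[Sequence[float]],
                 labels: List, K: int = 1):
    """Predict the label of x by majority vote among its K nearest neighbours."""
    nearest = sorted(range(len(data)), key=lambda n: math.dist(x, data[n]))[:K]
    return Counter(labels[n] for n in nearest).most_common(1)[0][0]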