Nonlinear Filters and Characterization of the Discrete Pulse Transform of Images

The Discrete Pulse Transform (DPT) for images and videos has been developed over the past few years and provides a theoretically sound setting for a nonlinear decomposition of an image or video. In [1] the theoretical basis of the DPT was presented. In this paper we now present a sound characterization of this useful nonlinear hierarchical decomposition by referring to its ability as a separator, the consistency of the decomposition, as well as the smoothing ability of the decomposition. Keywords-LULU; Discrete Pulse Transform; DPT; Nonlinear Decomposition; Feature Detection


I. INTRODUCTION
The Discrete Pulse Transform (DPT) is a nonlinear hierarchical decomposition obtained by the successive application of the LULU operators L n and U n where n increases from 1 to N , where N is the number of data points in the signal.For a concise overview of the onedimensional LULU operators see [2] as well as further collaboration with Laurie and Wild.The LULU operators were extended in detail to multidimensional arrays in [1], which provides a framework for the obvious areas of image processing in two dimensions, as well as video processing in three dimensions.We provide a short overview of these operators and the DPT here for completeness.

A. LULU Operators
The concept of morphological connectivity was introduced by J. Serra and G. Matheron in the 1980's.They recognised the need for the concept of an axiomatic connectivity and thus the axiomatic approach to connectivity was introduced.In 1988 Serra and Matheron, [3], introduced the concept of a connectivity class, for use in Mathematical Morphology.
Definition 1: C is a connectivity class or a connection on the power set P(E) if the following axioms hold: (i) ∅ ∈ C (ii) {x} ∈ C for each x ∈ E (iii) For each family {C i } in C such that C i = ∅, we have C i ∈ C. A set C ∈ C is called connected.
We define the operators L n and U n on A(Z d ), where A(Z d ) is the vector lattice of all real functions defined on Z d with respect to the usual point-wise defined addition, scalar multiplication and partial order.
Definition 2: Let f ∈ A(Z d ) and n ∈ N. Then where It is important to notice that here the collection of nneighbourhoods, N n (x), the LULU operators act on, can take on any shape as the only restriction is on their size.This is the important advantage of the LULU operators, which are only concerned about size, and morphological filters, which operate in conjunction with a specified structuring element with a specified size.This allows for open investigation of the image structures instead of searching for structures of a specific shape.

B. The Discrete Pulse Transform
The operators L n and U n smooth the input signal by removing peaks (the application of L n ) and pits (the application of U n ).These peaks and pits are defined mathematically in Definition 3 and 4.
Definition 3: The results proved in [1] and [4] provide the following summary of the effect of the operators L n and U n on a function f ∈ A(Z d ): (1) The have local maximum (minimum) sets of size n or less.The DPT provides a representation of an image (when d = 2) and higher dimensional arrays at all the scale levels.We obtain a decomposition of a function f ∈ A(Z d ), with finite support.Let N = card(supp(f )).We derive the DPT of f ∈ A(Z d ) by applying iteratively the operators L n , U n with n increasing from 1 to N as follows where the components of (1) are obtained through The set V is the support of the pulse φ, that is supp(φ) = V .We can then reconstruct the original signal as

A. Linear versus Nonlinear
As discussed in [4], the nonlinearity of the LULU smoothers makes theoretical development more complicated than for linear operators.However, taking on the additional complexity is justified since in two dimensions an image is basically the transformation of data by a human eye or measuring instrument.This transformation is significantly complicated to be considered nonlinear [5].Thus taking this stance the analysis of images via nonlinear operators is more logical than that of linear.Linear operators are also notorious for blurring edges.
Linear processing techniques are however a natural starting point for analysis due to the simplicity of their application and theoretical backbone available.Examples of linear filters are the Fourier transform, Hadamard transform, the discrete cosine transform, and wavelets.They also provide sufficient results in most applications, but there are problems in which a nonlinear process would prove more viable and efficient.Pitas and Venetsanopoulos [6] provide examples of such cases, such as signal dependent noise filtering e.g.photoelectron noise of photosensing devices; multiplicative noise appearing as speckle noise in ultrasonic imaging and laser imaging; and nonlinear image degradations e.g. when transmission occurs through nonlinear channels.Advantages of nonlinear filters are 1) the ability to handle various noise types, 2) edge preservation, 3) fine detail preservation, 4) unbiasedness (directional and illumination based) or invariance, and 5) computational complexity [6].
Nonlinear filtering techniques can be broadly classified accordingly in the following areas: order statistic filters, homomorphic filters, polynomial filters, mathematical morphology, neural networks, and nonlinear image restoration [6].The LULU operators fit nicely into the areas of mathematical morphology, due to their similarity to area operators therein, as well as order statistics, two areas which have been integrated quite efficiently in literature [6].Examples of order statistics, discussed in detail in [6], are the median, rank-order filters, maxmin filters, L p -mean filters, and α-trimmed mean filters.The LULU operators are examples of max-min filters but with the disadvantages listed in [6] improved upon.
The basic filters of mathematical morphology are the erosion and dilation, and subsequently the morphological opening and closing, to which the LULU filters are again closely related.

B. Separators
A common requirement for a filter P , linear or nonlinear, is its idempotence, that is P • P = P .For linear operators the idempotence of P implies the idempotence of the complementary operator I − P , where I denotes the identity operator.For nonlinear filters this implication generally does not hold, so the idempotence of I − P , also called co-idempotence, [7], can be considered as an essential measure of consistency.
For every a ∈ Z d the operator is called a shift operator.We now define a separator which mimics the actions required of an operator P .The first three properties in Definition 6 define a smoother.More detail on smoothers can be found in [4].
Definition 6: An operator P : (Scale invariance) (iv) P • P = P (Idempotence) (v) (I − P ) • (I − P ) = I − P (Co-idempotence) Figure 1 illustrates the action of a separator P .It shows how a separator will separate the signal into noise and the true signal without the need for recursive smoothing, that is, it does the separation on the first filter application so that there is no 'signal' left in the 'noise' nor any 'noise' left in the 'signal'.The median smoother is an example of a filter which requires recursive application.The LULU operators L n and U n and all their compositions are separators thus providing a strong separating capability of a signal.

C. Nonlinear Decompositions
The structure of a hierarchical decomposition is as follows in general.The operator F 1 is applied to the input image f to obtain a decomposition of f into f 1 , the smoother image, and D 1 , the noise component removed.This process is repeated with F 2 , F 3 ,...,F N until there is nothing left to remove except the constant image D N .The decomposition then has the form f = β 1 D 1 + β 2 D 2 + ... + β N D N , for some β i , i = 1, ..., N .Such a Fig. 1.The action of a separator P hierarchical decomposition has been investigated intensively in literature, see [8], [9], [10] for some nonlinear cases.However, in no literature have we found a unified theoretical backbone to connect such nonlinear hierarchical decompositions and provide methods of comparison nor methods of testing the capability of the structure of the decomposition.In Tadmor et al [8], for example, a decomposition f = k j=1 u j + v k is obtained, where v k is the noise component and the u j 's the decomposition components, by functional minimization.Tadmor et al discuss convergence of the minimizer, localization and adaptability, but nothing to indicate the strength of the decomposition save numerical visual examples.Similarly Wong et al [10] do not provide a theoretical indication of the strength of their decomposition obtained as a probabilistic scale-space derived from the nonlinear diffusion equation in [11].In [9], the authors even state that comparisons with their proposed nonlinear scale-space and other nonlinear hierarchical decompositions 'are to be made with care'.

D. Consistent Decompositions
As stated in [6] and above in Section II-C, the main limitations of nonlinear decompositions is the lack of a theory with which to compare the ability of various decompositions.The Highlight theorem first conjectured in 2007 [12], and later proved in 2010 [13], provides this much needed backbone.The quality of a nonlinear hierarchical decomposition, such as the Discrete Pulse Transform given in (1), can be characterized through the concept of consistent decomposition (also called strong consistency, [12]).For a linear decomposition of two signals f and g, namely D(f ) = i D i (f ) and D(g) = j D j (g), the equality always holds.This is clearly a desirable property but for a nonlinear decomposition this will in general not be satisfied.However, a weaker form of linearity is provided by the Highlight Theorem.It shows that the multidimensional Discrete Pulse Transform in (1) is strongly consistent, in the sense that the above holds for α, β > 0. The Highlight Theorem [13] states the following: Given a basis of pulses identified by the DPT for a signal f , form a new function g as any linear combination of the pulses of f with heights the same sign as before.Then the DPT of g will identify the same basis of pulses, and recover the new heights.
Figure 2 provides an illustration of the Highlight theorem.The name of theorem indicates its usefulness.Besides the 'weak' linearity it presents for a nonlinear decomposition, which we shall define as highlightlinearity, it allows for the highlighting or emphasizing of specific pulses deemed to be important, without destroying the structure of the decomposition.We state the theorem more precisely in the following form.
The proof of Theorem 7 by Laurie for any dimension can be found in [13].Therein the theory is described by considering the signal as a graph where the data points are the graph vertices and with the edges between the vertices based on the geometry of the of the signal (i.e.connectivity).Laurie also provides an implementation algorithm for the DPT which is O(m) where m is the number of edges in the graph.
The DPT decomposition is also total variation preserving.We assume C on Z d is a graph connectivity, for example for images the pulses of the DPT are based on 4-or 8-connectivity, the individual pulses can be viewed as a graph G = (V n , E m ) with the data points as the n vertices V n and the neighbour relation between data points as the m edges E m .The connectivity of such a graph G can be defined via a relation r ⊂ Z d × Z d , where p ∈ Z d is connected (by an edge) to q ∈ Z d iff (p, q) ∈ r.The relation r reflects what we consider neighbours of a point in the given context.For example, 4-connectivity and 8-connectivity.Let r be a relation on Z d .We call a set C ⊆ Z d connected, with respect to the graph connectivity defined by r, if for any two points p, q ∈ C there exists a set of points {p 1 , p 2 , ..., p k } ⊆ C such that each point is neighbour to the next one, p is neighbour to p 1 and p k is neighbour to q.Here we assume that r is reflexive, symmetric and shift invariant, (p, p + e k ) ∈ r for all k = 1, 2, ..., d and p ∈ Z d , where The Total Variation of f ∈ A(Z d ) is given by Although the importance of total variation preservation for separators cannot be doubted, it is even more so for hierarchical decompositions like the Discrete Pulse Transform, due to the fact that they involve iterative applications of separators.Since the operators L n , U n , n = 1, 2, ..., N , and all their compositions, are total variation preserving, it is easy to obtain the following result, which shows that, irrespective of the length of the vector or the number of terms in the sum , no additional total variation, or noise, is created via the decomposition, namely The proof of this result can be found in [1].We should remark that representing a function as a sum of pulses can be done in many different ways.However, in general, such decompositions increase the total variation, that is, we might have strict inequality in (2) instead of equality.Based on this result we can construct the total variation distribution of images.More precisely, this is the distribution of the total variation of an image among the different layers of the DPT.That is, essentially the plot of T V (D n (f )) vs. n.In Figure 3 we present the total variation distribution of an image, where one can observe how the total variation is distributed over the pulse size.
In the graph a log scale is used on the vertical axis and the pulse size values are grouped to form a histogram.The different character of images naturally manifests through different forms of total variation distributions.
Using the total variation distribution as a guide for the content of the image we can determine, firstly, that the image is relatively 'denoised' when the total variation graph stabalizes, that is, the very little information is removed after this scale by the DPT.In Figure 4

E. Measuring the Smoothing Ability of the LULU Operators
The ability of an operator to effectively remove noise and smooth the signal is usually measured by its output variance or the rate of success in the noise removal [6].Other measures used to assess the performance are the mean square error (MSE) and signal-noise-ratio (SNR) [6].In this section we shall present a method in which to measure the quality of the smoothed image as the question of the smoothing ability of the DPT arises.The aim of a smoother is of course to remove the noise element present.The noise can be due to a number of factors, for example, acquisition, processing, compression, storage, transmission and reproduction of the image [14].The easiest method of evaluation is purely subjective -namely, human visual investigation.In order for evaluation to be objective, quantitative methods need to be used instead.
Quantitative methods can be divided into three categories [14].First, full-reference, where the complete reference (undistorted) image is known with certainty, secondly, no-reference, where this reference image is not known at all, and third, reduced-reference, where only part of the original reference image is known, for example, a set of extracted features.We measure the similarity of the smoothed images P n (f ) to the original unsmoothed image f with the structural similarity index [14], , for two corresponding sets of pixels, x and y, in each image, where µ i , i = x, y is the mean of the pixel values in i, σ 2 i , i = x, y is the variance of the pixel values in i, σ xy the covariance between x and y, c j = (k j L) 2 , j = 1, 2 constants to stabilize the division by the weak denominator, L = 255 for greyscale images and  1 constants (we used k j = 0.05).This measure is a full-reference measure which provides a useful framework since we are comparing a smoothed version of the original distorted image with the original distorted image.The most widely used such measures are the mean-square-error (MSE) and the peak signal-noise-ratio (PSNR), but these measures do not compare well with the perceived visual quality of the human visual system.This measure is introduced in order to penalize errors based on their visibility, that is, to simulate the HVS as much as possible.This measure is applied to 8 × 8 windows in the image for each pixel and a final mean structural similarity index is calculated as the average of these SSIM values, called the MSSIM.An MSSIM value closer to 1 indicates stronger similarity.Wang et al provide MATLAB code for the implementation of the MSSIM as a free download, which was made use of.
Figure 5 provides MSSIM values for various images as the LULU smoothing progresses through the DPT from scale n = 1 up to N .Notice how, based on the content of the images, the reduction in the MSSIM values as the DPT progresses varies from image to image.The graphs provide a mechanism to determine where visual structure is in the image, that is, when the HVS would pick out structures of significance.Notice how the ocean image in Figure 5 contains very little structure and the MSSIM plot decreases gradually through the application of the DPT without any 'occurrences'.The cat image however presents a number of phenomenons in its MSSIM plot which indicate structure.Figure 6 indicates what is present at these scales.Scales 1 to 4030 represent the detail and the remaining scales the large relatively unimportant scales.The eye, face and forehead are represented at scales 4234 to 4235, 4325 to 4335 and 14565 to 14575 respectively.

III. CONCLUSION
We have presented on overview of the LULU operators and the resulting Discrete Pulse Transform for multidimensional arrays.As a new hierarchical decomposition the status of the DPT within the image processing community needs to be justified, thus we provide a characterization of the theoretical backbone of this nonlinear decomposition.This also provides a method of comparison for other nonlinear decompositions, which does not currently exist.The opportunity for further measures

Theorem 7 :
Figure2provides an illustration of the Highlight theorem.The name of theorem indicates its usefulness.Besides the 'weak' linearity it presents for a nonlinear decomposition, which we shall define as highlightlinearity, it allows for the highlighting or emphasizing of specific pulses deemed to be important, without destroying the structure of the decomposition.We state the theorem more precisely in the following form.Theorem 7: For a DPT decomposition of f , let g = N n=1 γ(n) s=1 α ns φ ns where the constants α ns are positive.Then the DPT decomposition of g is obtained as DP T (g) = N n=1

Fig. 3 .
Fig. 3. Full total variation distribution and log histogram total variation distribution of Cat Image various scales of the DPT are picked out based on the total variation graph.Scales 1 to 17859 indicate the smoothed image, that is, when the total variation removal stabalizes.This smoothed image appears very similar to the original (with an MSSIM value of 0.9736 (see section II-E)) but has been smoothed by removing the remaining scale levels, 17860 to 58571, which contain the large undetailed pulses.Scales 1 to 880 provide the texture or detail of the image, scales 4385 to 4395 represent the large eye of the cat and scale 11420 the eyes, nose and mouth of the cat.

Fig. 4 .
Fig. 4. Chelsea image using only scales 1 to 17859, 17860 to 58571, and 1 to 880 in the first row, and 4385 to 4395 and 11420 in the second row, respectively k j

Fig. 6 .
Fig. 6.Cat image at scales 1 to 4030, 4031 to 58571, 4234 to 4235, 4325 to 4335, and 14565 to 14575 respectively application of L n (U n ) removes local maximum (minimum) sets of size smaller or equal to n.(2)No new local minimum (maximum) sets are created where there were none, however the action of L n (U n ) may enlarge existing local minimum (maximum) sets or join two or more local minimum (maximum) sets of f into one local minimum (maximum) set of L n