Thursday, September 26, 2013

Using neural networks to identify patterns

In the previous entry, the basis for classifying our test images was their distance from the training images on the color versus shape plot. What I will be showing you now is the use of neural networks to actually train the computer to identify the characteristics that I need from the set of images. The term "training" was used loosely in the previous entry, as I simply measured two parameters on the two sets of images and used them as reference points.

The artificial neural network toolbox of Scilab accepts different parameters for the training: number of iterations, tolerance, and learning rate.

Using the same set of data obtained by characterizing the "shape" and "color" of the petals of the blue plumbago (Plumbago auriculata) and galphimia vine (Galphimia gracilis), the following output was given by the ann_FF_run() function:

first sample: 0.5002511    
second sample: 0.5023937     
third sample: 0.5002258  

parameters:

  1. learning rate: 2.5
  2. tolerance: 1
  3. iterations: 700
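
For reference, the calls look roughly like this in Scilab. This is a minimal sketch assuming the feed-forward routines of the ANN toolbox (ann_FF_init, ann_FF_Std_online, ann_FF_run); the exact argument order should be checked against the toolbox help, and train_features / test_features are hypothetical variables holding the shape and color measurements.

// Sketch only: argument order follows my reading of the ANN toolbox help.
N  = [2, 2, 1];                      // 2 inputs (shape, color), 2 hidden neurons, 1 output
x  = train_features;                 // 2 x 20 matrix: one column per training petal
t  = [zeros(1,10), ones(1,10)];      // targets: 0 = blue plumbago, 1 = galphimia vine
lp = [2.5, 1];                       // learning rate and tolerance listed above
W  = ann_FF_init(N);                 // random initial weights
W  = ann_FF_Std_online(x, t, N, W, lp, 700);  // train for 700 iterations
out = ann_FF_run(test_features, N, W)         // outputs for the three test petals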


If you recall, the first and third samples are of the blue plumbago and the second one is from a galphimia vine. The results indicate that, with the given parameters, the computer was able to determine which plant each sample comes from. Although the outputs for the first and second samples differ only by about 0.002, we can say that the output is reliable, as the first and third samples differ by only around 0.000025. The between-class difference is almost 100 times larger than the within-class one.

I tried to vary the number of iterations that the ANN toolbox would use, and found this result:
Figure 1. Plot of the output versus the number of iterations for the training process. The blue triangle and star correspond to the first and third samples, respectively, and the yellow circle corresponds to the second sample.

You'll see that at some points the outputs are higher for the blue samples than for the yellow sample, while at other points the opposite is observed. To apply this to a completely unknown sample, we must first compare it against test sets to have a reference.

For this activity, I will give myself a grade of 10/10.

References:
1. A15 - Neural Networks. M. Soriano. 2013.

Thursday, September 19, 2013

Pattern Recognition

How do we classify plants? We usually look at the shapes of their leaves or observe the color of their fruits. As I recall from my previous Biology classes, leaves vary from one plant to another, be it in the base, apex, or outline. Also, we expect the color of the fruit to match the color of the flower. Fruits with dark red or violet to dark violet colors are rich in anthocyanin, which increases the amount of antioxidants in our body. So the next time you buy apples, choose only those with a dark red color. Those that are red with yellow patches tend to be older and have lost their anthocyanin.

Anyway, I'm a physicist so I might be wrong :D What I'll be discussing in this blog is recognizing patterns from images. I captured photos of two types of flowers in UP Diliman:

Figure 1.  Blue plumbago (Plumbago auriculata)

Figure 2.  Galphimia vine (Galphimia gracilis)

I didn't know what these flowers were. I had to post them on my Facebook account and wait for someone to name them. Luckily, my friend Frances was familiar with these flowers! *phew* It saved me a lot of time looking for their names.

My goal is to have a measure of these petals such that when I have one of them, I'll be able to tell which flower it came from. But instead of picking the flowers and tearing the petals apart, I just took photos of them at different angles and under different lighting.
Figure 3. Training set for the blue plumbago


Figure 4. Training set for the galphimia vine

No, I really can't pull out the petals, even for the sake of the activity. I used GIMP to cut the petals out of the flower photos to get 10 training images for each flower. It's difficult to use image segmentation based on color, as it would locate the entire flower in the image. Brute force was needed to build the training sets.

I characterized the petals according to:

  1. SHAPE - well, not actually shape, as I found no means to measure the shape or determine the type of its base or apex. I measured the ratio of the area of the blob (which is simply the number of nonzero pixels) to the area of the region enclosing the blob (from the AnalyzeBlobs function of Scilab, I got the coordinates of the bounding box). Figure 5 below illustrates the regions of interest for both types of flowers. As you can see, the blue plumbago occupies more of the bounding box than the galphimia vine because it is rounder. The epithet auriculata pertains to the ear-like feature of the petal [1]. The petal of the galphimia vine has a pointed base, allowing more black pixels to occupy the bounding box.
  2. COLOR - from the normalized chromaticity coordinates (NCC), I measured the distance of each data point from the origin and summed them all up. I know this is a weird method, but then again I can't think of any other way to get a measure of the color. (A short Scilab sketch of both measurements appears after Figure 6.)

Figure 5. Estimate of the shape of the two types of petals

Figure 6. Quantifying the values from the normalized
chromaticity coordinates
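
Here is a rough Scilab sketch of both measurements, written with basic matrix operations only; the filename is hypothetical, imread() is SIVP's reader (ReadImage() from IPD would work the same way), and the image is assumed to be a single cut-out petal on a black background.

petal = double(imread("petal_01.png"));          // one cut-out petal, black background
gray  = petal(:,:,1) + petal(:,:,2) + petal(:,:,3);
idx   = find(gray > 0);                          // linear indices of petal pixels
[r, c] = find(gray > 0);                         // row and column indices of petal pixels

// SHAPE: blob area over bounding-box area
blobArea = length(idx);
boxArea  = (max(r) - min(r) + 1) * (max(c) - min(c) + 1);
shape    = blobArea / boxArea;

// COLOR: sum of the distances of the NCC points from the origin
I  = gray; I(I == 0) = 1;                        // avoid dividing by zero on the background
rc = petal(:,:,1) ./ I;                          // normalized chromaticity coordinates
gc = petal(:,:,2) ./ I;
color = sum(sqrt(rc(idx).^2 + gc(idx).^2));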

From these classifiers, I plotted the training sets:
Figure 7. Locating the training sets on the color versus shape plot. Blue dots indicate the training images
for the blue plumbago flowers, yellow dots indicate those of the galphimia vine flowers.

Now that I have a clustered plot (the training sets are grouped fairly close to each other), I will now try to locate these three test petals:
Figure 8. Test images. The symbols below indicate the legend for the next plot.
Using the same measures for the shape and color:

Figure 9. Locating the test images.

Based on Figure 9, by visual inspection, we can say that the two blue plumbago test petals are closer to the majority of the blue dots. Similarly, the galphimia vine test petal is closer to the yellow dots. Using simple distance measurements:

Table 1. Distance of the test images from the training sets

You'll see that both the first and third test images (both blue plumbago petals) have a smaller distance from the blue cluster than from the yellow one, while the second image (a galphimia vine petal) has a smaller distance from the yellow cluster than from the blue. This shows that, using the two classification schemes, I was able to determine which flower our test images come from.
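
Table 1 came from a simple distance comparison like the one sketched below (my own variable names: each row of blue_train and yellow_train is one training image's [shape, color] pair, test_petal is a 1x2 [shape, color] vector for one test image, and the distance here is the mean Euclidean distance to a cluster, which may differ slightly from the exact measure I used).

function d = mean_distance(test, train)
    // mean Euclidean distance from the test point to a training cluster
    diffs = train - repmat(test, size(train, 1), 1);
    d = mean(sqrt(sum(diffs.^2, "c")));
endfunction

d_blue   = mean_distance(test_petal, blue_train);
d_yellow = mean_distance(test_petal, yellow_train);
if d_blue < d_yellow then
    disp("closer to the blue plumbago cluster");
else
    disp("closer to the galphimia vine cluster");
end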

For this activity, I will give myself a grade of 12/10.



EDITED:
I tried another classifier for the shape:
Figure 10. Ratio of the longest segment to the area of the petal

Then ended up with this:
Figure 11. Plot of "color" versus "shape" using the measurement from Figure 10.

The results are quite similar:


References:
[1] Harrison, Lorraine (2012). RHS Latin for gardeners. United Kingdom: Mitchell Beazley. 
[2] A14 Pattern recognition. M. Soriano. 2013.

Thursday, September 5, 2013

Image compression using Principal Component Analysis

The file size of images can be reduced by performing image compression. When we capture images with our cameras, they are usually compressed immediately as JPEG files. Most digital single-lens reflex cameras (DSLRs for short) allow the user to retain all the information received by the camera sensor (either CMOS or CCD) by providing a raw image. A high-resolution JPEG from my DSLR has a typical size of 4 to 6 MB, while the corresponding raw file takes up 20 to 40 MB. The discrete cosine transform (which I have mentioned way back) is performed on the image to achieve the smaller file size.

Image compression can also be performed using the concept of principal components. An image can be decomposed into a set of principal components. To visualize how this works, think of a point in three-dimensional space: its position can be described by its location along the x-, y-, and z-axes. More generally, in a multi-dimensional space, the location can be described with respect to a set of mutually orthogonal vectors. However, some of these vectors contribute only a small portion to the location of a point in that space. If we neglect them, we can still obtain a rough estimate of the location.

The same goes for image compression. This image:

Figure 1: Image that will be used for compression. (Photo credit: Norman Mascariñas)
Figure 2. Gray scale image of the red component of Figure 1
After using the pca() function of Scilab on the grayscale image above, a set of images like this is produced (obtained from facpr of the function pca):
Figure 3: Principal components
The original image (800x500 pixels) was split into 10x10 sub-images, which is why we have principal components that are 10x10 pixels each. As mentioned earlier, these components (the eigenvectors of the image) become the bases for the reconstruction of each sub-image. You will see in Figure 3 that the first component (top left) has a coarser structure than the other ones, with the last component (bottom right) having the smallest or finest pixels. Choosing the number of principal components depends on the contribution of each of them (which can be obtained from lambda of the function pca). This can be shown as:

Figure 4. Cumulative contribution of the components

You'll see that there is a steep increase in the contribution from the first four components. This means that adding one component at a time gives a noticeable change in the overall reconstruction of the image. Beyond the 40th component, the contribution barely changes, which tells us that the reconstruction will look essentially the same no matter how many more components we select.

To reconstruct the image, I used the first M components and got the corresponding values in comprinc for each sub-image. The function pca() computes these individually per sub-image, so we need to retrieve the "intensity" of each component for every sub-image.
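
For completeness, here is a sketch of the decomposition and reconstruction in Scilab. The filename and variable names are mine, and the exact output layout of pca() (in particular how lambda stores the contributions) should be checked against its help page.

img = double(imread("norman_red.png"));          // grayscale red channel, as in Figure 2
k = 10;                                          // sub-image size
[nr, nc] = size(img);
X = [];                                          // each row = one flattened 10x10 sub-image
for i = 1:k:nr
    for j = 1:k:nc
        blk = img(i:i+k-1, j:j+k-1);
        X = [X; blk(:)'];
    end
end

[lambda, facpr, comprinc] = pca(X);              // facpr: eigenvectors, comprinc: coefficients
// The cumulative contribution in Figure 4 is essentially the cumulative sum of
// the lambda entries divided by their total.

M = 20;                                          // number of components kept
Xr = comprinc(:, 1:M) * facpr(:, 1:M)';          // approximate each flattened sub-image
// pca() works on mean-centered data, so the column means of X may need to be
// added back here for an exact reconstruction.

recon = zeros(nr, nc);
n = 1;
for i = 1:k:nr
    for j = 1:k:nc
        recon(i:i+k-1, j:j+k-1) = matrix(Xr(n, :), k, k);
        n = n + 1;
    end
end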


Figure 5. Comparison of image reconstruction, with the number beside each indicating the number of
components used for reconstruction

As expected, the first few reconstructions are more "pixelated", meaning they look like they are made of chunks of pixels. Zooming in on the first reconstruction:
Figure 6. Reconstruction using only the first component

It really is a bad reconstruction, and looks like the usual images compressed using DCT at low compression rate. Increasing the number of components, as seen in Figure 5, produces a sharper and smoother image.

Figure 7. Reconstruction with 20 components


Figure 8. Reconstruction with 90 components

Comparing Figures 7 and 8, we see that there are no visible differences between the two. With this, we can simply select the number of components that captures roughly 99% of the image; both Figures 7 and 8 have cumulative contributions greater than 99%.

For this activity, I will give myself a grade of 10/10.

Reference:
A13-Image compression. M. Soriano. 2013

Wednesday, September 4, 2013

Reading notes from a score sheet, and playing them!

Figure 1. Can you guess what music is this? Clue: food!



I won't be naming this score sheet just yet. I got it from www.flutetunes.com. For now, I'll discuss how to process this image so that the computer will be able to read each note, and hopefully the corresponding duration of each tone. I selected this piece because, first, it is a Filipino song (another clue!) and, second, it is in C major, which reduces the complexity of processing the score sheet. The problem with pieces in other keys is that natural signs are often added to certain notes to cancel a flat or sharp. Also, I chose this music because there are no rests, which come in many different appearances depending on their duration. To further simplify the activity, I removed the clef and the time signature:
Figure 2. Same score sheet but with fewer symbols.


First, I try to locate the position of the staves:
Figure 3. Plot of the line scan at the 531st column. This column was chosen
because there is no intersection with other symbols along the scan.
Figure 3 looks the same as the staves because the plot was made to connect successive points. Since the locations look like Dirac deltas separated by around 20 pixels, using only dots or circles to denote the peak locations of the cross section would make the plot look flat:

Figure 4. Plot of the line scan at the 531st column. Similar to the previous figure, but
scaling was performed to make it look better.

Now that we have the locations of each line for all the staves, we can use them to approximate the positions of the notes. But we'll leave that aside for now.
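
To make that step concrete, here is a minimal Scilab sketch of the column scan (the filename is hypothetical and the image is assumed to be grayscale; column 531 is the one used above, while the 0.5 cut is an arbitrary choice for illustration).

score = double(imread("score_sheet.png"));       // grayscale score sheet, white background
scan  = 1 - score(:, 531) / max(score);          // invert so the staff lines become peaks
lines = find(scan > 0.5);                        // rows crossed by a staff line
disp(lines');                                    // clusters of rows roughly 20 pixels apart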

To simulate the music from the score sheet, we need to determine not only the note but also the duration of the tone. In the score sheet above, there are half notes, quarter notes, and eighth notes. There are also dotted notes, with the dot indicating a duration 1.5 times that of the original note.

I was able to differentiate the half notes and quarter notes using edge detection and template matching. I selected the round part of a quarter note and obtained its negative, then performed edge detection.
Figure 5. Left: original "note"; center: negative of the previous;
right: image for template matching.
The image on the right of Figure 5 was added to a black image with the same size as the score sheet. I'm not going to display it here because the score sheet has dimensions of 807x1362 pixels (rows x columns), while the image containing the trace of the note is only 76x17 pixels, making the final template a black image with a small white smudge at the center.

I also obtained the negative of the score sheet:
Figure 6. Negative of the image in Figure 2.
Using template matching via cross-correlation and thresholding, as performed in one of my previous entries, I was able to get the following image:
Figure 7. Result of template matching and thresholding.
Comparing Figures 6 and 7, you'll see that the larger blobs correspond to half notes while the smaller ones correspond to the quarter and eighth notes.

Hmmm. For now I'll be neglecting the difference between the quarter note and the eighth note, as well as the dot. If time permits, I'll work on these things.


After much deliberation on how to trace the notes that belong to the same staff, I've become successful! Yeah!
Figure 8. Location of the centroid of the notes. The lines connecting
the dots indicate that they are on the same staff.

Impressive, isn't it? For me it is. After all, I've spent at least 4 days thinking about this. By locating the 5 lines of each staff, I scanned through the image from the first row down to the row between the first E4 and the second F5, then to the row between the second E4 and the third F5, and so on. In Figure 9 below, the red dots indicate where the row-by-row scanning for each staff is limited.
Figure 9. Separation of the staves.

Oh, and by the way, the points in Figure 8 representing the centroids of the notes were determined using a combination of the functions SearchBlobs() and AnalyzeBlobs(). This also allowed me to determine which note comes first in each staff.

At the moment, I'm already able to tell which line or space a note falls on. To know the duration of the note, I used a threshold value different from the one in Figure 7:
Figure 10. Different thresholding
You'll see, by comparing with Figure 2, that the large chunks of blobs correspond to the half notes while the smaller, diffused speckles correspond to quarter notes and eighth notes. I scanned around the centroids, which I computed previously, then counted how many nonzero pixels there are.
Figure 11. Area measurement of each blob.
Good thing there are extreme values of the number of pixels for the half notes and quarter notes. I assigned the blobs with more than 300 pixels to be half notes, while the rest are considered quarter notes. Sorry guys, I'm taking a risk here by having some notes take a longer time than what was indicated in the score sheet. But then again, if we have the half note take a duration of 0.5 second, what huge difference would it make to have all the quarter notes and eighth notes last for 0.25 second? I also neglected the presence of dots beside the notes. Hopefully we would still hear good music. We'll see later :D or hear.
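
In code form, the duration assignment looks roughly like this (a sketch: cx and cy are my names for the centroid coordinates found earlier, bw is the thresholded image of Figure 10, and the 21x21 window is an arbitrary choice for illustration; only the 300-pixel cut comes from the discussion above).

half = [];                                       // 1 if the note at a given centroid is a half note
for n = 1:size(cx, "*")
    r1 = round(cy(n)) - 10; r2 = round(cy(n)) + 10;
    c1 = round(cx(n)) - 10; c2 = round(cx(n)) + 10;
    area = length(find(bw(r1:r2, c1:c2)));       // nonzero pixels around the centroid
    half(n) = bool2s(area > 300);                // more than 300 pixels: half note
end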

So here it is, the plot of the locations of the centroids on the image, with the red stars representing the half notes and the blue dots representing the quarter and eighth notes:
Figure 12. Location of the notes on the image. Red stars: half notes; blue dots: quarter and eighth notes.

After three days, Scilab was able to finish performing the song. YES, THREE DAYS. I don't know why it took Scilab that long. Maybe I should have deleted the other variables to free up the computer's memory. Because of that, I was not able to do my other blog activities. Oh well, have fun listening to the song!


For this activity, I would give myself a grade of 12/10 because I used different techniques:

  1. Edge detection - a weird but novel approach, I think; this picks up only the notes.
  2. Template matching - to locate the notes based on the reconstructed edge of the note.
  3. Blob analysis - to find the centroids of the notes.
  4. Morphological operations - to determine the duration of each note; analysis of the blob sizes allowed me to determine which are half notes and which are not.
  5. This one is actually not a technique, but it was fun doing the activity by analyzing the entire image, rather than first dissecting it into different staves. What's cool about this is that the program analyzes the score sheet from the first measure through the final measure, all found in the same image.
And I'm also proud of my work since I did this for two agonizing weeks. Plus the song is very...nationalistic? hahaha. :)


References:
1. A12 - Playing notes by image processing. M. Soriano. 2013.




Tuesday, August 20, 2013

Application of Binary Operations

In this activity, we try to measure the areas of blobs in an image and locate which of them are "abnormal", based on the mean and standard deviation of the areas. This method is important in biomedical imaging, particularly in identifying cancer cells.

Our sample image, Figure 1, contains numerous circular shapes of roughly uniform area. By visual inspection, we identify the abnormal cells to be the ones enclosed in red rectangles in Figure 2.
Figure 1. Collection of "cells" with randomly placed "cancer cells"

Figure 2. Identified "cancer cells"

Before we can locate these cancer cells, we first need to "calibrate" our detection system using a set of normal cells. We were given the following image as a basis for measuring the mean area of the cells:

Figure 3. Collection of "normal cells" for calibration

This image was split into smaller images, and each of them was converted to black and white.
Figure 4. Conversion of image segment to black and white.

Reducing the image to black and white produces random white spots, which are part of the background but lie above the threshold value set using the function im2bw(). Since it would be difficult to determine an exact threshold value that eliminates this noise, we need a way to remove these spots. Morphological operations allow this by keeping only the shapes that satisfy a chosen structuring element. The function OpenImage(im, se) of Scilab performs an erosion of the image im using the structuring element se, followed by a dilation with the same structuring element. The function CloseImage(), also a morphological operation, is similar to OpenImage(), but performs the dilation before the erosion.

Using the function CreateStructureElement(), I created a circular structuring element with a dimension of 11x11 pixels. With OpenImage(), the spots were removed:
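
In Scilab, the cleanup of one sub-image looks roughly like this (a sketch: the "circle" shape name and the radius of 5 are my guess at how an 11x11 circular element is specified, and the 0.8 threshold is only illustrative; check the IPD help for the exact argument forms).

se    = CreateStructureElement("circle", 5);     // roughly an 11x11 circular element
bw    = im2bw(subimage, 0.8);                    // binarize one sub-image
clean = OpenImage(bw, se);                       // erosion then dilation removes the specks
blobs = SearchBlobs(clean);                      // label the connected regions (Figure 5, right)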

Figure 5. Noise reduction using OpenImage()

You will also notice that the cells in the image at the right of Figure 5 have different gray levels. The cells were labeled using the function SearchBlobs(), which identifies connected regions in the image. The clump of cells at the right-center has only one gray level, meaning the function identifies it as a single cell. This will cause problems later when measuring the mean area of the cells.

Performing these operations on all the subimages, I now have the following: 

Figure 6. Noise reduction for all the subimages

Figure 7 below shows that we can isolate each blob. From these segments, we can measure the area of each cell, which is simply the number of pixels with a value of 1.

Figure 7. Isolation of different blobs

Using histplot(), I obtained a histogram of the measurements of the area for all the blobs.
Figure 8. Histogram of the area measurements

There are extreme values of the area with low frequencies, all located beyond the 1000-pixel mark. These measurements come from cells that overlap each other and form one large blob. We neglect these values since they are clearly not representative of the cell size. I therefore selected only the area values below 900 pixels and above 300 pixels; the latter criterion was imposed because there are cells cut off at the boundaries of the image, which make up the distribution at the left end of the histogram. Using these values, the mean cell size was found to be 507.23 pixels with a standard deviation of 90.44 pixels.
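
The area measurement and filtering can be sketched as follows (variable names are mine; blobs is the labeled image from SearchBlobs(), and the 300- and 900-pixel cuts are the ones described above).

areas = [];
for b = 1:max(blobs)
    areas(b) = length(find(blobs == b));         // area = pixel count of blob b
end
histplot(20, areas);                             // histogram as in Figure 8 (20 bins, illustrative)
good = areas(areas > 300 & areas < 900);         // drop clipped cells and overlapping clumps
m = mean(good)                                   // about 507 pixels
s = stdev(good)                                  // about 90 pixels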

With these measurements, I was able to locate the cancer cells from the given image:
Figure 9. Identified cancer cells

There was an additional trick, however, aside from using the mean and standard deviation of the areas. After locating the blobs with areas outside the acceptable range, I performed another OpenImage() step, but with a much larger structuring element: a circle with a radius of 15 pixels. Without this step, the program would retain the blobs that represent overlapping normal cells, which also have areas larger than the acceptable value. However, as mentioned previously, these are not the cells that we want to locate. With this step, we are left with only the cancer cells.
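
Flagging the abnormal blobs and cleaning them up can then be sketched like this (the mean-plus-one-standard-deviation cut is only an assumption to make the sketch concrete; the radius-15 element is the one mentioned above).

big    = find(areas > m + s);                    // suspiciously large blobs (cut is an assumption)
iso    = zeros(blobs);                           // image that will hold only those blobs
for b = big
    iso(blobs == b) = 1;
end
big_se = CreateStructureElement("circle", 15);
cancer = OpenImage(iso > 0, big_se);             // drops clumps of overlapping normal cells, keeps the larger cancer cells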

For this activity, I'll be giving myself a grade of 10/10 for accomplishing the activity properly.

Reference
A11 - Application of Binary Operations: Blob Analysis activity manual. M Soriano. 2013.





Sunday, August 11, 2013

Enhancement in the Frequency Domain

In this activity, our goal is to eliminate unwanted information in an image by manipulating it in the frequency domain. But first, we investigate what the frequency domain looks like for different types of images.

Two dots (1x1 pixel each) were placed equidistant from the center of a 128x128 pixel image. These dots represent a pair of Dirac deltas, which is the Fourier transform of a sine function. The spacing between the two Dirac deltas is twice the frequency of the sine function. Using fft2() on the images containing the two Dirac deltas (upper part of Figure 1) produces a matrix containing the inverse Fourier transform. That is why the images at the lower part of the same figure show sine functions of increasing frequency (smaller gap between successive bands) as the spacing between the Dirac deltas is increased.

Figure 1. Pair of Dirac deltas and corresponding inverse Fourier transforms.
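
As a concrete example, one such delta pair can be generated and transformed with a few lines of Scilab (f, the pixel offset from the center, is the knob that was varied between the panels of Figure 1; imshow() here is SIVP's display routine and mat2gray() rescales the matrix for display).

n = 128; f = 4;                                  // image size and delta offset from the center
d = zeros(n, n);
d(n/2 + 1, n/2 + 1 - f) = 1;                     // two symmetric 1x1 dots
d(n/2 + 1, n/2 + 1 + f) = 1;
s = real(fft2(d));                               // sinusoid along the axis joining the dots
imshow(mat2gray(s));                             // larger f gives more closely spaced bands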

In the fo








Sunday, July 21, 2013

Fourier transforms

In this exercise, we worked with Fourier transforms and convolution of images using the built-in fast Fourier transform functions of Scilab. Fourier transforming a signal gives us a representation of the distribution of the frequencies present in that signal. A signal can be decomposed into a set of sines and cosines, with amplitudes and frequencies given by the discrete Fourier transform.

An image is basically a signal, since it is a collection of information stored in an array of pixels. In photography, for example, information (color, brightness, etc.) is transferred from the subject to the pixels of the camera. Depending on the properties of a portion of the subject, the corresponding region of the camera's sensor stores a corresponding signal. After processing these signals, we are able to view the subject in a photograph, which would hopefully be as colorful (or vibrant or whatever) as the subject.

Anyway, so there. Scilab has the functions fft() and fft2() to perform the discrete Fourier transform, for one-dimensional and two-dimensional signals respectively. In our case, since we're dealing with images, which are 2D, we will be using fft2(). Additionally, these two functions can also perform the inverse Fourier transform, which is quite handy.


Figure 1. Transformation of two images to frequency space and back to the original.

Side note: It took me so long to finish this exercise because I couldn't figure out the problem in my Scilab code. Actually, there were no errors. It's just that I didn't use the function mat2gray(), which converts a matrix (containing the information from the Fourier-transformed image) to a grayscale image. Meryl told me to add this function to create the desired image of the Fourier transform and the other images. Without her suggestion, I wouldn't have been able to do the entire exercise. Seriously hahaha.


Figure 2. Conversion of the matrix to an image using mat2gray()


Going back to Figure 1, after performing the Fourier transform, we still need to use the function fftshift(). As seen in the second column of Figure 1, the white regions, which contain the information from the Fourier transform, are distributed among the four corners of the image. Using fftshift() moves these white regions to the center of the image. As seen in the Fourier transform of the circular aperture, the result is an Airy pattern. Also, the inverse Fourier transform of the Fourier transform of the image containing the letter "A" is inverted with respect to the original image.
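
Putting the pieces together, each row of Figure 1 boils down to something like the following sketch (the filename is hypothetical, and mat2gray() is the rescaling step mentioned in the side note above).

a    = double(imread("circle.png"));             // circular aperture (or the letter "A")
FTa  = fft2(a);                                  // bright regions end up at the corners
imshow(mat2gray(abs(fftshift(FTa))));            // fftshift() moves them to the center
back = abs(fft2(FTa));                           // transforming twice returns an inverted copy
imshow(mat2gray(back));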

Simulation of an imaging device

Due to the finite sizes of the optical components of an imaging device, a photograph can differ considerably from the subject. The quality of a photograph differs from the subject as a result of the latter's convolution with the transfer function of the imaging device. A lens with a large diameter will produce sharper images than one with a small diameter, as more rays from the subject are allowed to pass through, contributing a wider range of information.

If we have a convolution of two functions, we can obtain the result by multiplying their Fourier transforms (here, the Fourier transform of the subject and the transfer function of the imaging device) and then performing the inverse Fourier transform on the product.

In an optical system, a lens automatically performs a Fourier transform of the image [1]. We can think of the aperture as already being at the Fourier plane, so we simply multiply it with the Fourier transform of the image.
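
A sketch of the simulation behind Figure 3 (filenames are hypothetical; the aperture image is a centered white circle on a black background, the same size as the object image):

obj  = double(imread("vip.png"));                // image of the word "VIP"
ap   = double(imread("circle_aperture.png"));    // circular aperture of a chosen diameter
FTo  = fft2(obj);
prod = FTo .* fftshift(ap);                      // the aperture already sits in the Fourier plane
img  = abs(fft2(prod));                          // back to image space (the result comes out inverted)
imshow(mat2gray(img));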


Figure 3. Simulation of an imaging device of different aperture diameters.

As expected, those with smaller diameters produced blurred images compared to those with larger diameters.

Template matching

We can also perform template matching using correlation. Similar to the previous part, we work with the Fourier transforms of two images. In this part, we use an image containing the text:
THE RAIN IN SPAIN STAYS MAINLY IN THE PLAIN
and a template image, of the same size as the first image, containing only the letter "A", which is what we are looking for. Using correlation, we can determine the locations of the letter "A" (from the template) in the first image. For both images, the font size and style are the same.

However, instead of multiplying the Fourier transforms right away, we first take the complex conjugate (using the function conj()) of the Fourier transform of the template image.

Figure 4. Text image and template image

Using the described method, I got this image:

Figure 5. Correlated image

By correlating the two images, peaks (indicated by the bright spots) occur at the locations where "A" is found in the text image. As you can see, the letter "A" is smeared over every letter of the text image, causing the blur in the correlated image. At the regions where the template lands exactly on the same letter, bright spots form, indicating a strong correlation.

Performing thresholding on Figure 5, I was able to locate the bright spots:

Figure 6. Thresholding of the correlated image

This was produced using the function SegmentByThreshold(), with the threshold set to 0.8. These bright spots are the locations of the letter A in the text image. 
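
The whole pipeline, from the two images in Figure 4 to the thresholded map in Figure 6, can be sketched as follows (filenames are hypothetical; note that in Scilab, ifft() applied to a matrix performs the 2-D inverse transform, and the peak positions may come out shifted by the letter's position inside the template frame).

txt  = double(imread("spain_text.png"));         // "THE RAIN IN SPAIN ..."
tmp  = double(imread("letter_a.png"));           // the lone letter "A", same image size
CORR = conj(fft2(tmp)) .* fft2(txt);             // correlation theorem
corr = mat2gray(abs(ifft(CORR)));                // bright peaks where the template matches
hits = SegmentByThreshold(corr, 0.8);            // keep only the strong peaks (Figure 6)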

Edge Detection

Edge detection can also be performed by convolving an image with a 3x3 matrix of a specific pattern. The matrices that were created consist of elements that sum to zero. In the following image, we see each matrix, followed by a 128x128 pixel image with the matrix at its center, and the resulting image after convolution with the original image (as seen in Figure 3).

Figure 7. Convolution of the Fourier transform of the original image and a 3x3 matrix.

The first matrix was able to detect the horizontal edges of the image containing the word "VIP", while the second matrix detected the vertical edges. In addition, the diagonal lines were also picked up by the second matrix, since these lines are composed of short vertical segments. The final matrix detected both the vertical and horizontal edges of the original image.
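
For a typical horizontal-edge kernel (the exact matrices used above may differ), the procedure can be sketched as follows; the other two patterns follow the same recipe with different 3x3 matrices. The filename is hypothetical, the image is assumed to be 128x128, and ifft() on a matrix again gives the 2-D inverse transform.

k = [-1 -1 -1; 2 2 2; -1 -1 -1];                 // elements sum to zero
pad = zeros(128, 128);
pad(64:66, 64:66) = k;                           // kernel placed near the center
obj   = double(imread("vip.png"));               // 128x128 image of the word "VIP"
edges = abs(ifft(fft2(pad) .* fft2(obj)));       // convolution via multiplication of the FTs
imshow(mat2gray(edges));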

For this activity, I would give myself a grade of 10 since I was able to perform the activity correctly.

Reference
Activity 7- Fourier transform model of image transformation. M. Soriano. 2013.