Wednesday, September 4, 2013

Reading notes from a score sheet, and playing them!

Figure 1. Can you guess what music this is? Clue: food!



I won't name this score sheet just yet. I got it from www.flutetunes.com. For now, I'll discuss how to process the image so that the computer can read each note and, hopefully, the duration of its tone. I selected this piece for two reasons: first, it is a Filipino song (another clue!), and second, it is in C major, which reduces the complexity of processing the score sheet. The problem with pieces in other keys is that a natural symbol is often added to certain notes to cancel a flat or sharp. I also chose this music because it has no rests, which vary a lot in appearance depending on their duration. To simplify the activity further, I removed the clef and the time signature:
Figure 2. Same score sheet but with fewer symbols.


First, I try to locate the position of the staves:
Figure 3. Plot of the line scan at the 531st column. This column was chosen
because the scan does not intersect any other symbols there.
Figure 3 looks just like the staves because the plot connects successive points. The line locations look like Dirac deltas separated by around 20 pixels, so marking the peaks of the cross section with only dots or circles would make the plot look flat:

Figure 4. Plot of the line scan at the 531st column. Similar to the previous plot, but
rescaled to make it look better.

Now that we have the location of each line for all the staves, we can use them to approximate the positions of the notes. But we'll leave that aside for now.
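The line scan described above can be sketched in a few lines. This is only an illustration in plain Python, not the Scilab code behind the figures: the function names, the darkness threshold of 128, and the toy image are my assumptions. It walks down one column, records the center row of every run of dark pixels, and groups the rows in fives, one group per staff.

```python
def staff_line_rows(image, column, dark_threshold=128):
    """Center row of every run of dark pixels along one image column."""
    rows, run_start = [], None
    for r, pixels in enumerate(image):
        is_dark = pixels[column] < dark_threshold
        if is_dark and run_start is None:
            run_start = r                          # a dark run begins
        elif not is_dark and run_start is not None:
            rows.append((run_start + r - 1) / 2)   # run ended on the previous row
            run_start = None
    if run_start is not None:                      # run touching the bottom edge
        rows.append((run_start + len(image) - 1) / 2)
    return rows


def group_into_staves(line_rows, lines_per_staff=5):
    """Split the detected rows into consecutive groups of five lines."""
    return [line_rows[i:i + lines_per_staff]
            for i in range(0, len(line_rows), lines_per_staff)]


# Toy image (hypothetical): white page (255) with two staves of one-pixel black lines.
toy_line_rows = [5, 10, 15, 20, 25, 40, 45, 50, 55, 60]
toy_image = [[0 if r in toy_line_rows else 255] * 8 for r in range(70)]
staves = group_into_staves(staff_line_rows(toy_image, column=3))
print(staves)
```

The scan only works on a column that crosses no notes or stems, which is why a clean column such as the 531st has to be picked first.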

To simulate the music from the score sheet, we need to determine not only the note but also the duration of the tone. In the score sheet above, there are half notes, quarter notes, and eighth notes. There are also dotted notes, where the dot makes the note last 1.5 times its original duration.
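As a quick worked example of these durations, here is a small Python sketch. The base values are an assumption: a half note is taken to last 0.5 second, and the other note types are scaled from it. The names are hypothetical.

```python
# Assumed base durations in seconds, scaled from a 0.5 s half note.
BASE_DURATION = {"half": 0.5, "quarter": 0.25, "eighth": 0.125}


def note_duration(kind, dotted=False):
    """Duration in seconds; a dot stretches the note to 1.5 times its length."""
    seconds = BASE_DURATION[kind]
    return seconds * 1.5 if dotted else seconds


print(note_duration("quarter"))               # plain quarter note
print(note_duration("quarter", dotted=True))  # dotted quarter note
```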

I was able to differentiate the half notes and quarter notes using edge detection and template matching. I selected the round part of a quarter note and obtained its negative, then performed edge detection.
Figure 5. Left: original "note"; center: negative of the previous;
right: image for template matching.
The image on the right of Figure 5 was added to a black image of the same size as the score sheet image. I'm not going to display it here because the score sheet measures 807x1362 pixels (rows x columns) while the image containing the trace of the note measures 76x17 pixels, so the final template image is just a black image with a small white smudge at the center.

I also obtained the negative of the score sheet:
Figure 6. Negative of the image in Figure 2.
Using template matching by cross-correlation followed by thresholding, as in one of my previous entries, I got the following image:
Figure 7. Result of template matching and thresholding.
Comparing Figures 6 and 7, you'll see that the larger blobs correspond to half notes while the smaller ones correspond to the quarter and eighth notes.
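To make the matching step concrete, here is a minimal Python sketch of cross-correlation followed by thresholding. It slides the template over the image directly, which is my substitution for readability; padding the template to the full image size, as described above, suggests the correlation was done differently in the actual activity. The function names, the 0.9 threshold fraction, and the toy arrays are all assumptions.

```python
def cross_correlate(image, template):
    """Sum of image*template products at every position where the template fits."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    return [[sum(image[r + i][c + j] * template[i][j]
                 for i in range(th) for j in range(tw))
             for c in range(iw - tw + 1)]
            for r in range(ih - th + 1)]


def threshold(scores, fraction):
    """Keep positions whose score is at least `fraction` of the highest score."""
    peak = max(max(row) for row in scores)
    return [[1 if value >= fraction * peak else 0 for value in row]
            for row in scores]


# Toy data (hypothetical): a 3x3 ring stands in for the edge of a note head.
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
toy_image = [[0] * 8 for _ in range(6)]
for i in range(3):
    for j in range(3):
        toy_image[1 + i][2 + j] = ring[i][j]    # paste the ring at row 1, column 2

matches = threshold(cross_correlate(toy_image, ring), fraction=0.9)
hits = [(r, c) for r, row in enumerate(matches) for c, v in enumerate(row) if v]
print(hits)
```

Lowering the fraction lets weaker matches through, which is how a different threshold can produce blobs of different sizes.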

Hmm. For now I'll be neglecting the difference between the quarter note and the eighth note, as well as the dot. If time permits, I'll work on these.


After much deliberation on how to trace the notes that belong to the same staff, I finally succeeded! Yeah!
Figure 8. Location of the centroid of the notes. The lines connecting
the dots indicate that they are on the same staff.

Impressive, isn't it? For me it is. After all, I spent at least 4 days thinking about this. Using the locations of the 5 lines of each staff, I scanned through the image from the first row to a row between the E4 line of the first staff and the F5 line of the second staff, then from there to a row between the E4 line of the second staff and the F5 line of the third staff, and so on. In Figure 9 below, the red dots mark the rows that bound the scan for each staff.
Figure 9. Separation of the staves.
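The separation can be sketched as follows. This is a Python illustration under my own assumptions: each boundary is placed exactly midway between the bottom line of one staff and the top line of the next (any row in that gap would do), and the names and sample coordinates are hypothetical. Each centroid is assigned to a staff by its row, then the notes of each staff are sorted left to right.

```python
from bisect import bisect_right


def staff_boundaries(staves):
    """Row midway between the bottom line of a staff and the top line of the next."""
    return [(upper[-1] + lower[0]) / 2 for upper, lower in zip(staves, staves[1:])]


def order_notes(centroids, boundaries):
    """Group (row, column) centroids by staff, then sort each group by column."""
    per_staff = [[] for _ in range(len(boundaries) + 1)]
    for row, col in centroids:
        per_staff[bisect_right(boundaries, row)].append((row, col))
    return [sorted(notes, key=lambda rc: rc[1]) for notes in per_staff]


# Hypothetical staff line rows (top line first) and unordered note centroids.
staves = [[100, 120, 140, 160, 180], [300, 320, 340, 360, 380]]
centroids = [(310, 50), (150, 400), (130, 90), (370, 20)]

boundaries = staff_boundaries(staves)
ordered = order_notes(centroids, boundaries)
print(boundaries)
print(ordered)
```

Reading the staves in order and the notes within each staff from left to right gives the playing order for the whole sheet.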

Oh, and by the way, the points in Figure 8 representing the centroids of the notes were determined using a combination of the functions SearchBlobs() and AnalyzeBlobs(). This also allowed me to determine which note comes first on each staff.
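For readers without those functions, the idea of finding blobs and their centroids can be sketched in plain Python. This is a stand-in and not a reimplementation of SearchBlobs() or AnalyzeBlobs(): the function name, the use of 8-connectivity, and the returned dictionary are my assumptions.

```python
from collections import deque


def find_blobs(binary):
    """Label 8-connected groups of non-zero pixels; return centroid and area of each."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:                               # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            area = len(pixels)
            centroid = (sum(p[0] for p in pixels) / area,
                        sum(p[1] for p in pixels) / area)
            blobs.append({"centroid": centroid, "area": area})
    return blobs


# Toy binary image (hypothetical): one 2x2 blob and one single-pixel blob.
toy = [[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 0, 0, 0, 0],
       [0, 1, 1, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]]
blobs = find_blobs(toy)
print(blobs)
```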

At this point, I can already tell which line or space each note falls on. To find the duration of each note, I used a threshold value different from the one in Figure 7:
Figure 10. Different thresholding
You'll see, by comparing with Figure 2, that the large blobs correspond to the half notes while the smaller, diffuse speckles correspond to quarter notes and eighth notes. I scanned around the centroids that I computed previously, then counted the non-zero pixels there.
Figure 11. Area measurement of each blob.
Good thing the pixel counts for the half notes and quarter notes sit at opposite extremes. I assigned the blobs with more than 300 pixels to be half notes, while the rest are considered quarter notes. Sorry guys, I'm taking a risk here by letting some notes last longer than the score sheet indicates. But then again, if the half note takes 0.5 second, what huge difference would it make to have all the quarter notes and eighth notes last for 0.25 second? I also neglected the dots beside the notes. Hopefully we'll still hear good music. We'll see later :D or hear.
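The counting and the 300-pixel rule can be sketched like this in Python. The 300-pixel cutoff and the 0.5 and 0.25 second durations come from the text above; the window size, the function names, and the toy image are assumptions for illustration.

```python
HALF_NOTE_MIN_AREA = 300                      # cutoff from the text, in pixels
DURATION_SECONDS = {"half": 0.5, "quarter": 0.25}


def count_nonzero_around(binary, centroid, half_height, half_width):
    """Count non-zero pixels in a window centered on a (row, column) centroid."""
    row, col = round(centroid[0]), round(centroid[1])
    top, bottom = max(0, row - half_height), min(len(binary) - 1, row + half_height)
    left, right = max(0, col - half_width), min(len(binary[0]) - 1, col + half_width)
    return sum(1 for r in range(top, bottom + 1)
               for c in range(left, right + 1) if binary[r][c])


def classify(area):
    """More than 300 pixels means a half note; everything else plays as a quarter."""
    return "half" if area > HALF_NOTE_MIN_AREA else "quarter"


# Toy image (hypothetical): a filled 20x20 block (400 px) and a small 5x5 one (25 px).
toy = [[0] * 80 for _ in range(40)]
for r in range(10, 30):
    for c in range(10, 30):
        toy[r][c] = 1
for r in range(18, 23):
    for c in range(58, 63):
        toy[r][c] = 1

for centroid in [(19.5, 19.5), (20.0, 60.0)]:
    area = count_nonzero_around(toy, centroid, half_height=12, half_width=12)
    kind = classify(area)
    print(centroid, area, kind, DURATION_SECONDS[kind])
```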

So here it is, the plot of the locations of the centroids on the image, with the red stars representing the half notes and the blue dots representing the quarter notes and eighth notes:
Figure 12. Location of the notes on the image. Red stars: half notes; blue dots: quarter notes and eighth notes
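Turning a centroid row into a note name can be sketched as below. This Python snippet assumes a treble clef, so the five lines are F5, D5, B4, G4, and E4 from top to bottom, and every half line spacing is one step up or down the scale. The function name and the sample rows are hypothetical.

```python
LETTERS = "CDEFGAB"
E4_INDEX = 4 * 7 + LETTERS.index("E")     # steps counted from C0, seven per octave


def pitch_from_row(row, staff_lines):
    """Name the note at a given row; staff_lines are the 5 line rows, top first."""
    spacing = (staff_lines[-1] - staff_lines[0]) / 4    # distance between lines
    steps_above_e4 = round((staff_lines[-1] - row) / (spacing / 2))
    index = E4_INDEX + steps_above_e4
    return LETTERS[index % 7] + str(index // 7)


# Hypothetical staff with lines 20 pixels apart, as in the line scan.
staff = [100, 120, 140, 160, 180]
for row in (180, 150, 100, 200):
    print(row, pitch_from_row(row, staff))
```

With no sharps or flats to track in C major, the letter and octave are all that is needed to pick a frequency.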

After three days, Scilab finally finished performing the song. YES, THREE DAYS. I don't know why it took Scilab that long. Maybe I should have deleted the other variables to free up the computer's memory. Because of that, I was not able to do my other blog activities. Oh well, have fun listening to the song!
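For anyone who wants to try the playback step outside Scilab, here is a minimal Python sketch that turns a list of (note, duration) pairs into sine tones and writes them as WAV data. It is not the code used for this post: the 8000 Hz sample rate, the amplitude, the function names, and the placeholder melody are all assumptions, and the frequencies come from standard equal temperament with A4 at 440 Hz.

```python
import io
import math
import struct
import wave

SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_frequency(name):
    """Equal-tempered frequency in Hz for names like 'C4' or 'A4' (A4 = 440 Hz)."""
    midi = 12 * (int(name[1:]) + 1) + SEMITONE[name[0]]
    return 440.0 * 2 ** ((midi - 69) / 12)


def tone_samples(frequency, seconds, rate=8000, amplitude=0.5):
    """Sine wave samples in the range [-amplitude, amplitude]."""
    return [amplitude * math.sin(2 * math.pi * frequency * n / rate)
            for n in range(int(seconds * rate))]


def write_wav(target, melody, rate=8000):
    """Write (note name, seconds) pairs as 16-bit mono WAV to a path or file object."""
    samples = []
    for name, seconds in melody:
        samples.extend(tone_samples(note_frequency(name), seconds, rate))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples),
                                    *(int(32767 * s) for s in samples)))


# Placeholder melody (not the song in the score sheet), rendered in memory.
example_melody = [("C4", 0.25), ("E4", 0.25), ("G4", 0.5)]
buffer = io.BytesIO()
write_wav(buffer, example_melody)
print(len(buffer.getvalue()), "bytes of WAV data")
```

Passing a file name such as `"song.wav"` (hypothetical) instead of the buffer saves the result to disk.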


For this activity, I would give myself a grade of 12/10 because I used different techniques:

  1. edge detection - which is actually a weird but novel approach, I think. This is to pick up only the notes
  2. template matching - locate the notes based on the reconstructed edge of the note
  3. blob analysis - finding the centroids of the notes
  4. morphological operations - determine the duration of the note. Analyzing the blob sizes allowed me to determine which notes are half notes and which are not
  5. this one is actually not a technique, but it was fun doing the activity by analyzing the entire image, and not by first dissecting it into separate staves. What's cool about this is that the program analyzes the score sheet from the first measure through the final measure, all found on the same image.
And I'm also proud of my work since I did this for two agonizing weeks. Plus the song is very...nationalistic? hahaha. :)


References:
1. A12 - Playing notes by image processing. M. Soriano. 2013.



