The code used in this project is well commented, and descriptions on how these functions work is best found by reading the comments. A list of m-files can be found at the bottom of this page. Let's start with the simplest functions, shall we?
semitonePitchTable()
This table is a list of half-step frequencies to be compared to an audio signal. CmajorPitchTable() and CminorPitchTable() are written in the same way and are thus omitted from this page.
%%http://en.wikipedia.org/wiki/Scientific_pitch_notation#Table_of_note_frequencies
function pitches = semitonePitchTable()
%%Returns all hearable halfsteps from C0 to E10
pitches = [16.352 17.324 18.354 19.445 20.602 21.827 23.125 24.500 25.957 27.500 29.135 30.868 32.703 34.648 36.708 38.891 41.203 43.654 46.249 48.999 51.913 55.000 58.270 61.735 65.406 69.296 73.416 77.782 82.407 87.307 92.499 97.999 103.830 110.000 116.540 123.470 130.800 138.590 146.830 155.560 164.810 174.610 185.000 196.000 207.650 220.000 233.080 246.940 261.630 277.180 293.660 311.130 329.630 349.230 369.990 392.000 415.300 440.000 466.160 493.880 523.250 554.370 587.330 622.250 659.260 698.460 739.990 783.990 830.610 880.000 932.330 987.770 1046.500 1108.700 1174.700 1244.500 1318.500 1396.900 1480.000 1568.000 1661.200 1760.000 1864.700 1975.500 2093.000 2217.500 2349.300 2489.000 2637.000 2793.800 2960.000 3136.000 3322.400 3520.000 3729.300 3951.100 4186.000 4434.900 4698.600 4978.000 5274.000 5587.700 5919.900 6271.900 6644.900 7040.000 7458.600 7902.100 8372.000 8869.800 9397.300 9956.100 10548.100 11175.300 11839.800 12543.900 13289.800 14080.000 14917.200 15804.300 16744.000 17739.700 18794.500 19912.100 21096.200 22350.600];
end
readWave()
This function is just for personal convenience.
function [d,r] = readWav(name)
%Reads, plays and creates the magnitude spectrum of an input wav file.
%Input must be a string, ex. 'music.wav'
[d,r] = wavread(name);
soundsc(d,r);
displayDFT(d, r);
end
displayDFT()
Another function for personal convenience.
function [F Y Z]=displayDFT(wave,fs)
%%Displays the positive half of the DFT magnitude spectrum in dB, but
%%can return the entire DFT (Z). Requires input wave signal and its
%%sampling frequency fs.
%%NFFT is a power of 2 to maximize efficiency of FFT algorithm
NFFT = 2^nextpow2(length(wave));
Z = fft(wave,NFFT);
%Scales amplitude axis
W = Z/length(wave);
Y = abs(W(1:floor(NFFT/2+1)));
Y = 20*log10(Y/min(Y(:)));
%Scales frequency axis
F = fs/2*linspace(0,1,floor(NFFT/2+1));
plot(F,Y);
title('Single-Sided Amplitude Spectrum of Y(t)');
xlabel('Frequency (Hz)');
ylabel('|Y(f)|');
end
displaySpectrogram()
Yet another function for personal convenience. displaydBSpectrogram() is written the same way (except the magnitude is in decibels) so it is omitted from this page. Note that uimagesc() is used instead of imagesc() to allow for irregular axes. uimagesc.m is found at the bottom of this page and requires uimage.m to work.
function Y = displaySpectrogram(S,F,T)
%%Quick function to display the spectrogram of a given STFT
%%uimagesc is used so that the axes can have non-linear spacing
X = abs(S);
uimagesc(T,F,S);
colorbar;
axis xy
xlabel('Time (s)')
ylabel('Freq (Hz)')
end
addNoise()
Because the real audio signal wasn't noisy enough.
function noisywave = addNoise(wave, c)
%%Adds additive noise onto input wave, with noise-to-signal ratio
%%proportional to C.
%%Reshapes input signal into a row of values, in case it is in column form
if (length(wave(1,:)) == 1)
wave = wave';
end
%%Generates pseudorandom values centered at 1, with range 1
r = rand(1, length(wave)) - 1;
%%Adds noise to wave
noisywave = wave + r*c;
end
hps()
Shouldn't be used on generateWave() output as it has no harmonics.
%%Computes the Harmonic Product Spectrum of a magnitude spectrum Y.
%%Works by downsampling the input sequence by 2, zero-padding it to its
%%original length, and then dotting it with the original sequence. This should
%%theoretically accentuate the peak at the fundamental frequency.
%%Including a downsampling by 3 can be done by uncommenting the commented
%%code. Absolute amplitude values of the HPS will be meaningless, but
%%the HPS is only used to obtain frequency information anyway.
function S = hps(Y)
down2 = Y(1:2:length(Y));
padding = zeros((length(Y)-length(down2)), 1);
down2 = [down2; padding];
%%down3 = Y(1:3:length(Y));
%%padding = zeros((length(Y)-length(down3)), 1);
%%down3 = [down3; padding];
S = Y.*down2;
%%S = Y.*down2.*down3;
for i=1:length(S)
S(i) = S(i)^(1/2);
end
end
quickSpectrogram()
function [S F T Y] = quickSpectrogram(wave,winsize,nfft,hpsonoff)
%%Quick spectrogram function with HPS ability and certain preset arguments
[S,F,T]=spectrogram(wave, hanning(winsize), winsize/2, nfft, 44100);
bins = length(S(1,:));
if (hpsonoff ~= 0)
Y = abs(S);
for i = 1:bins;
Y(:,i) = hps(Y(:,i));
end
displaySpectrogram(Y, F, T);
else
Y = displaydBSpectrogram(S, F, T);
end
end
varWinSpectrogram()
A nicer spectrogram than the other one.
%%This function creates a spectrogram with variable window sizing based off
%%frequency information obtained from an initial low-resolution
%%spectrogram, in order to increase the precision of pitch-detection.
%%hpsonoff is a boolean that controls whether or not the spectrogram is a
%%Harmonic Product Spectrum. isNoisy is a boolean that specifies if the user
%%wants the signal to have its DC component removed.
function [S_tot F T_tot Y] = varWinSpectrogram(wave, fs, hpsonoff, isNoisy)
%%For a simple synthetic sinusoid, finding the peak of each bin is
%%sufficient enough to find the approximate pitch during that time-window.
%%However, if there is a large DC component, the window sizing algorithm
%%will be ineffectual, so the signal is put through a simple high-pass filter with
%%a low cutoff frequency.
if(isNoisy ~= 0)
cutoff = 100;
d = fdesign.highpass('N,F3dB',2,cutoff/(fs/2));
hd = design(d,'butter');
wave = filter(hd, wave);
end
%The average lowest frequency the human ear can detect is about 20Hz. Corresponding
%period is 1/20 = 50ms. The length of the Hanning window in the initial
%spectrogram will be the equivalent of 50ms in sample length,
%and then taken to the nearest power of 2 for efficiency's sake.
maxwin=2^nextpow2(ceil(0.05*fs));
%%Initial low-resolution spectrogram. Windows are non-overlapping to make
%%later calculations substantially easier.
[S F T]=spectrogram(wave, hanning(maxwin), 0, maxwin, fs);
Y = displaydBSpectrogram(S,F,T);
bins = length(Y(1,:));
S_cell = cell(bins);
T_cell = cell(bins);
S_tot = [];
X_tot = [];
%%Pre-allocates memory for arrays
maxpeaks = zeros(1, bins);
indices = zeros(1, bins);
approxpitch = zeros(1, bins);
approxperiod = zeros(1, bins);
winsize = zeros(1, bins);
%%Calculates the approximates pitches of each time-window in the STFT
for i = 1:bins
%%If specified, convert Y into a HPS
if (hpsonoff ~= 0)
Y(:,i) = hps(Y(:,i));
end
maxpeaks(i) = max(Y(:,i));
indices(i) = find(Y(:,i) == maxpeaks(i),1,'first');
approxpitch(i) = F(indices(i));
approxperiod(i) = 1/approxpitch(i);
end
%%scaling is a constant factor found from experimentation
scaling = 20;
winsize = floor(approxperiod * fs * scaling);
for i = 1:bins
if(winsize(i) > maxwin) %Window can not be larger than the length of the signal
winsize(i) = maxwin;
end
if(winsize(i) <= 256) %Minimum window size is 256
winsize(i) = 256;
end
end
for i = 1:bins
%%Sample space for each sub-STFT. The last bin may be slightly longer than
%%the rest to make sure the whole signal is transformed.
start = ((i-1)*maxwin+1);
over = (maxwin*i);
if(i==bins)
over=length(wave);
end
%%nfft must remain constant at maxwin for the arrays to
%%concatenate properly
[S F T] = spectrogram(wave(start:over), hanning(winsize(i)), [], maxwin, fs);
%%S_cell hold all of the spectrograms
S_cell{i} = S;
%For some reason the time indices were wrong (some were even negative), so I
%manually calculated them
T_cell{i} = linspace(start/fs,over/fs,length(T)+1);
T_cell{i} = T_cell{i}(2:length(T_cell{i})); %%Prevents repeats
end
S_tot = S_cell{1};
T_tot = T_cell{1};
%%This for-loop is separate from the last in order to properly initialize
%%the size of arrays S_tot and T_tot, which is necessary for
%%concatenation
for i = 2:bins
S_tot = [S_tot, S_cell{i}];
T_tot = [T_tot, T_cell{i}];
end
figure;
newbins = length(S(1,:));
%%If specified, convert Y into a HPS
if (hpsonoff ~= 0)
Y = abs(S_tot);
for i = 1:newbins;
Y(:,i) = hps(Y(:,i));
end
displaySpectrogram(Y, F, T);
else
Y = displaydBSpectrogram(S_tot, F, T_tot);
end
end
generateWave()
A synthetic waveform for testing purposes. Margins are just so that the listener can recognize a distinct beginning and ending pitch.
%This function creates and plays a glissando (frequency sweep) starting from
%fmin to fmax over the duration of sweeptime, with margintime specifying
%how much time the signal stays at fmin and fmax before and after the
%glissando (for the convenience of the listener.) The glissando is exponential so that it can be
%percieved as linear. fs is the sampling rate. All input times are in seconds, and all input frequencies are in Hz. The
%function also plots the DFT of the wave.
%If margins are nonzero, there will be audible clicking since the wave
%function is not continuous. Also outputs time indexing x.
function [wave x] = generateWave(fmin, fmax, fs, sweeptime, margintime)
%%This code was used to find the desired angular frequency function,
%%which is the derivative of the argument of the sinusoid. The
%%expression in the sinusoid must then be the integral of the angular
%%frequency function.
%%
%%syms a b c x;
%%int(a*(b/a)^(x/c),x) %%This specifies an exponential function that starts at a and increases to b over time duration c
%%ans = (a*c*(b/a)^(x/c))/log(b/a)
totaltime = 2*margintime + sweeptime; %Total duration of the wave, in seconds
totalsamples = totaltime*fs; %Total duration of the wave, in samples
marginsamples = margintime*fs; %Duration of the margins, in samples
sweepsamples = sweeptime*fs; %Duration of the sweep, in samples
%%Generates sample indexing
x = linspace(0,totaltime,totalsamples);
xmargin = linspace(0,margintime,marginsamples);
xsweep = linspace(0,sweeptime,sweepsamples);
%%This code exists because linspace will return a vector of length 1 even
%%if all its arguments are 0
if(sweeptime == 0)
xsweep = [];
end
if(margintime == 0)
xmargin = [];
end
%%Pre-allocates memory for wave
wave = zeros(1,totalsamples);
%%Generates margin waveform
if(margintime ~= 0)
for i = 1:length(xmargin)
wave(i) = sin(fmin*xmargin(i)*2*pi); %%Starting margin with frequency fmin
wave(i+length(xmargin)+length(xsweep)) = sin(fmax*xmargin(i)*2*pi); %%Ending margin with frequency fmax
end
end
%%Generates sweep waveform
if(sweeptime ~= 0)
for i = 1:length(xsweep)
%%Plugs in the integral of angular frequency function (x 2pi) using the user specified values into
%%sin function
wave(i+length(xmargin)) = sin((fmin*sweeptime*(fmax/fmin)^(xsweep(i)/sweeptime)/log(fmax/fmin))*2*pi);
end
end
[F Y T] = displayDFT(wave, fs);
%%Convenient axes scaling
ymin = 0;
ymax = max(Y);
xmax = fmax+200;
xmin = fmin-200;
if (xmin-200)<0
xmin=0;
end
axis([xmin xmax ymin ymax]);
%%Plays waveform
soundsc(wave, fs);
end
compareToPitches()
Once the signal's pitches are found, they are compared to the specified pitch table using this function.
%%This function takes an input frequency and outputs its nearest pitch in a
%%given table of pitches
function exactpitch = compareToPitches(pitch, pitchtable)
%%In case pitchtable is not sorted in ascending order
sortedpitchtable = sort(pitchtable);
%%Pre-allocates memory for array
midpitchtable = zeros(1,length(sortedpitchtable)-1);
%%Boolean that indicates if the closest pitch to the input frequency has been found
found = 0;
%%Generates a table of "average" pitches from pitchtable to make comparison easier.
%%log2(0.5) is used instead of 0.5 in order to account for logarithmic
%%sound perception.
for i = 1:(length(sortedpitchtable)-1)
midpitchtable(i) = (sortedpitchtable(i+1) - sortedpitchtable(i))*log2(0.5) + sortedpitchtable(i);
end
%%If input frequency is lower than the lowest pitch in the table, output
%%the lowest pitch in the table and set found to true.
if (pitch <= midpitchtable(1))
exactpitch = sortedpitchtable(1);
found = 1;
end
%%If exactpitch has not been found yet, and if input frequency is higher than the highest pitch in the table, output
%%the highest pitch in the table and set found to true.
if (found==0)
if (pitch > midpitchtable(length(midpitchtable)))
exactpitch = sortedpitchtable(length(sortedpitchtable));
found = 1;
end
end
%%If exactpitch has not been found yet, compare it to all values in the
%%midpitchtable and output the corresponding pitch in pitchtable.
i = 1;
while(found==0)
if(pitch > midpitchtable(i) && pitch <= midpitchtable(i+1))
exactpitch = sortedpitchtable(i+1);
found=1;
end
i = i+1;
end
end
pitchCorrector()
Very primitive. I wish I had more time to work on this!
%%Primitive pitch correction function. Requires input STFT, its frequency
%%space, and a table of pitches to compare to. Y can be the HPS of S or any
%%other peak-accentuated sequence.
function S_corrected = pitchCorrector(S, F, Y, pitchtable)
bins = length(S(1,:));
S_len = length(S(:,1));
%%Pre-allocates memory for arrays
maxpeaks = zeros(1,bins);
indices = zeros(1,bins);
pitches = zeros(1,bins);
correctedpitches = zeros(1,bins);
ratio = zeros(1,bins);
S_corrected = zeros(S_len,bins);
%%Finds the maximum amplitudes and their corresponding frequencies in the
%%Fourier spectrum. correctedpitches stores the closest exact pitch to
%%each of these maximum pitches. ratio stores the factor by which each
%%ST-spectrum will have to be shifted in order to obtain the correct
%%pitch.
for k = 1:bins
maxpeaks(k) = max(Y(:,k));
indices(k) = find(Y(:,k) == maxpeaks(k),k,'first');
pitches(k) = F(indices(k));
correctedpitches(k) = compareToPitches(pitches(k), pitchtable);
ratio(k) = correctedpitches(k)/pitches(k);
end
%%This for-loop is separate from the previous for debugging purposes.
%%Utilizes the fact that the frequencies in F are spaced linearly, starting at 0, so that
%%the ratio between frequencies is equivalent to the ratios between sample
%%indices. If ratio > 1, S_corrected will end with a number of zeros. If
%%ratio < 1, S_corrected will contain many repeated values. Either way,
%%high-frequency information will be lost.
for k = 1:bins
for j = 1:S_len
y = round(j/ratio(k));
%%Prevents negative indices
if y < 1
y = 1;
end
%%Non-constant frequency scaling
if y <= S_len
S_corrected(j,k) = S((y),k);
end
end
end
end
M Files