Speech command detection in an audio file

I need to rework the "Speech Command Recognition Using Deep Learning" example so that it reads audio from a WAV file and returns the time intervals in which a command appears, but I don't know how to change the real-time microphone analysis in this example into static file analysis.

Thank you for your help.

%% Detect Commands Using Streaming Audio from Microphone
% Test your newly trained command detection network on streaming audio from
% your microphone. If you have not trained a network, then type
% |load('commandNet.mat')| at the command line to load a pretrained network
% and the parameters required to classify live, streaming audio. Try
% saying one of the commands, for example, _yes_, _no_, or _stop_.
% Then, try saying one of the unknown words such as _Marvin_, _Sheila_, _bed_,
% _house_, _cat_, _bird_, or any number from zero to nine.
%%
% Specify the audio sampling rate and classification rate in Hz and create
% an audio device reader that can read audio from your microphone.
fs = 16e3;
classificationRate = 20;
audioIn = audioDeviceReader('SampleRate',fs, ...
'SamplesPerFrame',floor(fs/classificationRate));
%%
% Specify parameters for the streaming spectrogram computations and
% initialize a buffer for the audio. Extract the classification labels of
% the network. Initialize buffers of half a second for the labels and
% classification probabilities of the streaming audio. Use these buffers to
% compare the classification results over a longer period of time and by
% that build 'agreement' over when a command is detected.
% Spectrogram parameters as defined earlier in the example
% (25 ms frames, 10 ms hop, 40 bands, small log offset).
frameDuration = 0.025;
hopDuration = 0.010;
numBands = 40;
epsil = 1e-6;
frameLength = frameDuration*fs;
hopLength = hopDuration*fs;
waveBuffer = zeros([fs,1]);
labels = trainedNet.Layers(end).Classes;
YBuffer(1:classificationRate/2) = categorical("background");
probBuffer = zeros([numel(labels),classificationRate/2]);
%%
% Create a figure and detect commands as long as the created figure exists.
% To stop the live detection, simply close the figure.
h = figure('Units','normalized','Position',[0.2 0.1 0.6 0.8]);
while ishandle(h)

% Extract audio samples from the audio device and add the samples to
% the buffer.
x = audioIn();
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;

% Compute the spectrogram of the latest audio samples.
spec = auditorySpectrogram(waveBuffer,fs, ...
'WindowLength',frameLength, ...
'OverlapLength',frameLength-hopLength, ...
'NumBands',numBands, ...
'Range',[50,7000], ...
'WindowType','Hann', ...
'WarpType','Bark', ...
'SumExponent',2);
spec = log10(spec + epsil);

% Classify the current spectrogram, save the label to the label buffer,
% and save the predicted probabilities to the probability buffer.
[YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
YBuffer(1:end-1)= YBuffer(2:end);
YBuffer(end) = YPredicted;
probBuffer(:,1:end-1) = probBuffer(:,2:end);
probBuffer(:,end) = probs';

% Plot the current waveform and spectrogram.
subplot(2,1,1);
plot(waveBuffer)
axis tight
ylim([-0.2,0.2])

subplot(2,1,2)
pcolor(spec)
% specMin and specMax are the minimum and maximum of the training
% spectrograms, computed earlier in the example.
caxis([specMin+2 specMax])
shading flat

% Now do the actual command detection by performing a very simple
% thresholding operation. Declare a detection and display it in the
% figure title if all of the following hold:
% 1) The most common label is not |background|.
% 2) At least |countThreshold| of the latest frame labels agree.
% 3) The maximum predicted probability of the predicted label is at
% least |probThreshold|. Otherwise, do not declare a detection.
[YMode,count] = mode(YBuffer);
countThreshold = ceil(classificationRate*0.2);
maxProb = max(probBuffer(labels == YMode,:));
probThreshold = 0.5;
subplot(2,1,1);

if YMode == "background" || count < countThreshold || maxProb < probThreshold
    title(" ")
else
    title(string(YMode),'FontSize',20)
end

drawnow

end

Here is a sketch of how you can achieve this:

1) Read the contents of the audio file

[audioIn,fs] = audioread('filename.wav');

2) Split the signal into 1-second chunks, with overlap between consecutive chunks. The higher the overlap, the finer your time resolution (that is, how precisely you can locate where the keyword occurred)

y = buffer(audioIn, fs, round(3*fs/4));
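One caveat, assuming the overlap value above: by default, `buffer` zero-pads the start of the first column with the overlap length, which shifts every chunk relative to the audio. If you want column k of `y` to start exactly at sample (k-1)*(fs - overlap) + 1, pass the `'nodelay'` option:

```matlab
overlap = round(3*fs/4);                     % 750 ms overlap between chunks
y = buffer(audioIn, fs, overlap, 'nodelay'); % column k starts at time (k-1)*0.25 s
```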

3) Convert each chunk (each column of y) to a mel spectrogram, and classify it using the network. Based on the network results, you will get an estimate of where each word occurs in your WAV file. Each chunk start advances by 1 second minus the overlap, so in this example the time resolution is ~250 ms.
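Putting the three steps together, here is a minimal offline sketch using the same spectrogram settings as the streaming code above. It assumes `trainedNet` and the `auditorySpectrogram` helper from the example are on the path, and `'filename.wav'` is a placeholder:

```matlab
% Offline command detection sketch (assumes trainedNet and the
% auditorySpectrogram helper from the example are available).
[audioIn,fs] = audioread('filename.wav');

frameDuration = 0.025;           % spectrogram settings, as in the example
hopDuration   = 0.010;
numBands      = 40;
epsil         = 1e-6;
frameLength   = round(frameDuration*fs);
hopLength     = round(hopDuration*fs);

chunkHop = round(fs/4);                        % chunk starts 250 ms apart
y = buffer(audioIn,fs,fs-chunkHop,'nodelay');  % 1 s chunks, 750 ms overlap

numChunks = size(y,2);
detected  = strings(numChunks,1);
tStart    = (0:numChunks-1).'*chunkHop/fs;     % chunk start times in seconds

for k = 1:numChunks
    % Same feature extraction as the streaming loop above.
    spec = auditorySpectrogram(y(:,k),fs, ...
        'WindowLength',frameLength, ...
        'OverlapLength',frameLength-hopLength, ...
        'NumBands',numBands, ...
        'Range',[50,7000], ...
        'WindowType','Hann', ...
        'WarpType','Bark', ...
        'SumExponent',2);
    spec = log10(spec + epsil);
    detected(k) = string(classify(trainedNet,spec,'ExecutionEnvironment','cpu'));
end

% List the (overlapping) 1 s intervals in which a command was detected.
for k = find(detected ~= "background").'
    fprintf('%-10s %6.2f s - %6.2f s\n',char(detected(k)),tStart(k),tStart(k)+1);
end
```

To sharpen the estimates, you can re-apply the streaming example's agreement logic offline: require that several consecutive chunks predict the same label before declaring a detection, and merge overlapping hits into a single interval.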


Technical Source