The Behringer UCA-222 USB audio adapter arrived. For just a couple of dollars more, and despite its larger footprint, it records audio that is a world better, and it works with a standard 1/8 inch Tip Ring Sleeve (TRS) to RCA cable too!
Note: As an Amazon Associate I earn from qualifying purchases.
This is how the new recordings sound:
A quick test with the Google Speech-to-Text API using the “phone_call” model gave this result:
Text: 2621 2601 respond to the Safeway 3333 Arapahoe Road for a med check non emergent with 1:00
Switching to the “video” model gave this result:
Text: 26:21 2601 respond to the Safeway 3333 Arapaho Road for a med check non-emergent with lock it off
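For reference, here is a minimal sketch of switching between the two models with the `google-cloud-speech` Python client. It assumes 16 kHz mono LINEAR16 audio, US English, and the v1 `SpeechClient`. The helper names `build_recognition_kwargs` and `transcribe` are hypothetical and do not come from the decoder's code.

```python
def build_recognition_kwargs(model: str, sample_rate_hz: int = 16000) -> dict:
    """Keyword arguments for speech.RecognitionConfig.

    Only the model name changes between the two tests above; the sample
    rate and language are assumed settings.
    """
    return {
        "sample_rate_hertz": sample_rate_hz,
        "language_code": "en-US",
        "model": model,        # "phone_call" or "video"
        "use_enhanced": True,  # ask for the enhanced variant where one exists
    }


def transcribe(wav_bytes: bytes, model: str, sample_rate_hz: int = 16000) -> str:
    # Imported lazily so the helper above works without the package installed.
    from google.cloud import speech

    client = speech.SpeechClient()  # picks up GOOGLE_APPLICATION_CREDENTIALS
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        **build_recognition_kwargs(model, sample_rate_hz),
    )
    audio = speech.RecognitionAudio(content=wav_bytes)
    response = client.recognize(config=config, audio=audio)
    # Join the top alternative of each result into one transcript line.
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

Running `transcribe(data, "phone_call")` and then `transcribe(data, "video")` on the same clip is one way to compare the two models side by side.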
I am taking a break from coding for a few days, but I thought this was worth an update.
Dispatch Tone Out Decoder Part 7: Testing a New Method to Detect Two Tone Dispatches
The Google speech to text detection is not working great at the moment. It misses several words and confuses others, so maybe it will do better with a better audio recording. I want to test it again when the Behringer UCA222 adapter arrives. I am hoping the UCA222 will give better quality audio, but we will see.
While waiting for the new Behringer UCA222 USB audio adapters to arrive, I am trying a new method for the two tone detection. The original routine works pretty well but is not 100%: it misses roughly 1 out of every 15-20 tone outs.
The original method works by seeking out the peak frequency and then checking how many times it occurs within a given time period. The catch is that every FFT data set always has some peak frequency, so the routine needs many checks to decide whether a peak is a valid tone, and every now and then it misses one for one reason or another. I have ideas for improving this method too, but I want to try a new idea first.
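Here is a rough sketch of the peak-counting idea, which shows why it needs so many checks. It finds the strongest bin in each chunk and counts how many consecutive chunks keep roughly the same peak. This is not the original decoder code. The function names, the Hann window, and the 15 Hz tolerance are assumptions.

```python
import numpy as np


def peak_frequency(chunk: np.ndarray, sample_rate: int) -> float:
    """Frequency of the largest FFT bin in one chunk (DC bin ignored)."""
    spectrum = np.abs(np.fft.rfft(chunk * np.hanning(len(chunk))))
    spectrum[0] = 0.0  # ignore any DC offset from the sound card
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


def count_repeated_peak(chunks, sample_rate: int, tolerance_hz: float = 15.0):
    """Return (frequency, count) for the longest run of chunks sharing a peak.

    A run continues while each chunk's peak stays within tolerance_hz of the
    run's first peak. Every chunk has *some* peak, even noise, which is why
    the count still needs extra validation afterwards.
    """
    best_freq, best_count = 0.0, 0
    run_freq, run_count = None, 0
    for chunk in chunks:
        f = peak_frequency(chunk, sample_rate)
        if run_freq is not None and abs(f - run_freq) <= tolerance_hz:
            run_count += 1
        else:
            run_freq, run_count = f, 1  # start a new run at this peak
        if run_count > best_count:
            best_freq, best_count = run_freq, run_count
    return best_freq, best_count
```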
The new method first takes several data chunks from the audio stream, performs an FFT on each chunk, and then sums the FFT data sets together. If a single tone is present, it adds to the same bin in every data set, while noise populates bins randomly from set to set and averages out to some level. Summing the FFT sets therefore pulls the tone's bin up and pushes non-repeating frequencies in the surrounding bins down relative to the tone.
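The summing step can be sketched in a few lines of NumPy. This is an illustration rather than the decoder's actual code. It assumes equal-length chunks of float samples and applies a Hann window (an assumed choice) to reduce leakage into distant bins. Noise magnitudes also add up. What averages out is their chunk-to-chunk variation, which leaves a smoother floor for a steady tone to stand out from.

```python
import numpy as np


def summed_spectrum(chunks, sample_rate: int):
    """Sum the FFT magnitudes of several equal-length chunks.

    A steady tone lands in the same bin every time and adds up, while noise
    and voice wander between bins from chunk to chunk.
    Returns (frequencies_hz, summed_magnitudes).
    """
    n = len(chunks[0])
    window = np.hanning(n)
    total = np.zeros(n // 2 + 1)
    for chunk in chunks:
        total += np.abs(np.fft.rfft(chunk * window))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return freqs, total
```

With a tone buried in noise, `freqs[np.argmax(total)]` should land on the tone's bin even when individual chunks are ambiguous.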
Tone outs are usually transmitted with no overlapping traffic. Plotting the FFT data from actual scanner recordings for the first or second tone shows something like this: a single bin, with some energy spilling into the bins just below and above it:
A voice transmission should look more like this, with many bins at varying amplitudes:
A single tone is found by testing the peak bin against the surrounding bins to see whether any of them have relatively high levels. If the surrounding bins are all at low levels, the peak is flagged as a possible tone out frequency. I started coding this up yesterday, and even without much debugging it already detects tone outs about as well as the original method, which took a week to reach its current state.
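The single-tone test could look something like the sketch below. It blanks out the peak and its spill-over bins, then compares the loudest remaining nearby bin to the peak. The guard width, search span, and 0.25 threshold are hypothetical values that would need tuning against real recordings.

```python
import numpy as np


def is_single_tone(spectrum, guard_bins: int = 2, span_bins: int = 20,
                   max_neighbor_ratio: float = 0.25):
    """Check whether the strongest bin stands alone.

    guard_bins skips the spill-over right next to the peak; span_bins sets
    how far to look on either side. Returns (is_tone, peak_index, ratio),
    where ratio is the loudest surrounding bin divided by the peak.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    peak = int(np.argmax(spectrum))
    lo = max(0, peak - span_bins)
    hi = min(len(spectrum), peak + span_bins + 1)
    neighborhood = spectrum[lo:hi].copy()
    # Blank out the peak and its immediate spill-over bins.
    g_lo = max(0, peak - guard_bins - lo)
    g_hi = peak + guard_bins + 1 - lo
    neighborhood[g_lo:g_hi] = 0.0
    if spectrum[peak] <= 0:
        return False, peak, 1.0  # silence: nothing to call a tone
    ratio = float(neighborhood.max() / spectrum[peak])
    return ratio <= max_neighbor_ratio, peak, ratio
```

On a spectrum shaped like the tone out plot, the ratio stays small. On voice, a second strong bin nearby pushes it up and the test fails.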
Yesterday I left both methods running in parallel on the same audio source. This is the output from the original method:
2300: Boulder Rural Fire : Tone out on 01-04-23 at 00:36:58
1: 948.4Hz 2:1983.8Hz C1:15 C2: 15
3100: Boulder Emergency Squad (Group 2) : Tone out on 01-04-23 at 00:37:01
1: 1035.3Hz 2:1984.0Hz C1:17 C2: 15
****** Tone out on 01-04-23 at 00:37:02
1: 1984.2Hz 2:1031.2Hz False: 0
c1: 1031.2077625728903 c2: 15
American Medical Response : Tone out on 01-04-23 at 00:37:06
1: 1405.3Hz 2:1530.8Hz C1:14 C2: 15
2600: Lafayette Fire Department : Tone out on 01-04-23 at 01:35:23
1: 947.7Hz 2:1345.2Hz C1:16 C2: 15
****** Tone out on 01-04-23 at 01:52:59
1: 1096.6Hz 2:1343.1Hz False: 0
c1: 1343.1419031575342 c2: 15
2600: Lafayette Fire Department : Tone out on 01-04-23 at 03:51:24
1: 948.1Hz 2:1337.9Hz C1:11 C2: 15
2600: Lafayette Fire Department : Tone out on 01-04-23 at 05:02:34
1: 1083.1Hz 2:1328.5Hz C1:15 C2: 15
2600: Lafayette Fire Department : Tone out on 01-04-23 at 05:02:39
1: 1083.6Hz 2:1343.6Hz C1:16 C2: 15
2600: Lafayette Fire Department : Tone out on 01-04-23 at 05:18:23
1: 948.6Hz 2:1344.4Hz C1:14 C2: 15
2200: Mountain View Fire St.-6 : Tone out on 01-04-23 at 05:18:28
1: 1129.9Hz 2:868.4Hz C1:11 C2: 15
2700: Louisville Ambulance : Tone out on 01-04-23 at 06:39:08
1: 1674.6Hz 2:1120.9Hz C1:13 C2: 15
2200: Mountain View Fire : Tone out on 01-04-23 at 06:49:04
1: 1497.9Hz 2:1086.1Hz C1:14 C2: 15
And this is the output from the new method. It still needs a little debugging, but not bad for day one of testing:
2300: Boulder Rural Fire : Tone out on 01-04-23 at 00:36:57
1: 949.6Hz @:11.3 , 2:1989.1Hz @:1.9
3100: Boulder Emergency Squad (Group 2) : Tone out on 01-04-23 at 00:37:00
1: 1035.8Hz @:11.9 , 2:1988.9Hz @:1.9
3100: Boulder Emergency Squad (Group 1) : Tone out on 01-04-23 at 00:37:03
1: 1026.8Hz @:7.2 , 2:1250.9Hz @:0.5
2600: Lafayette Fire Department : Tone out on 01-04-23 at 01:35:22
1: 942.9Hz @:10.5 , 2:1334.6Hz @:1.7
2600: Lafayette Fire Department : Tone out on 01-04-23 at 03:51:23
1: 946.1Hz @:15.4 , 2:1355.8Hz @:2.7
2600: Lafayette Fire Department : Tone out on 01-04-23 at 05:18:22
1: 951.2Hz @:2.1 , 2:1345.6Hz @:7.2
2200: Mountain View Fire St.-6 : Tone out on 01-04-23 at 05:18:27
1: 1130.7Hz @:12.4 , 2:861.1Hz @:6.2
2700: Louisville Ambulance : Tone out on 01-04-23 at 06:39:08
1: 1670.0Hz @:8.1 , 2:1131.9Hz @:2.2
2200: Mountain View Fire : Tone out on 01-04-23 at 06:49:04
1: 1518.0Hz @:4.1 , 2:1096.7Hz @:13.8
2300: Boulder Rural : Tone out on 01-04-23 at 06:49:44
1: 948.3Hz @:6.5 , 2:1529.3Hz @:2.2
The line below each department tone out is for debugging. For the new method it shows the relative amplitude of each tone. For the original method it shows how many times each tone occurred within a given time period.
Update: After leaving this running for a day, I noticed that it still misses random tone outs. And while tone outs usually go out without overlapping audio, a percentage of them do include TX chirps or are mixed with some voice. I will think about this some more….
Dispatch Tone Out Decoder Part 6: Using a Cheap USB Sound Card
The USB sound card I ordered from Amazon finally arrived: this $7.99 JSAUX USB interface. I specifically picked this device because it had okay reviews and a nice form factor with an aluminum case. The small size was the deciding factor, because if the project ever gets put on a Raspberry Pi coupled with a Baofeng UV-5R radio, I want the solution to be very small.
Note: As an Amazon Associate I earn from qualifying purchases.
I was excited to hook this project up to the Pro-2052 scanner so that I would no longer have the silly overlapping Sheriff’s dispatch and Fire dispatch that was on the Broadcastify channel. I used a 1/8 inch male to male tip ring sleeve (TRS) audio patch cable directly from the headphone output on the scanner to the mic input on the USB card. The USB sound card was plug and ‘play’ as far as Windows 10 recognizing it, but then came the problems:
Windows 10 muting the mic
Unable to decode tones
Debugging Windows microphone muting
When the scanner was connected to the USB mic port, there was no detectable audio. I switched the scanner to the weather channel so that I had a constant audio output to troubleshoot with. Even with the scanner volume turned up, there was still no audio.
Next I opened the sound settings by right-clicking the speaker icon in the lower right corner of the taskbar:
The input was correct, with ‘Microphone (USB Audio)’ selected as the input device, but the Test your microphone bar still did not move.
I clicked Device properties for the microphone, and everything looked correct:
Next I clicked Additional device properties under Related Settings.
The input was muted. That is weird?!
I unmuted it, and the Test your microphone bar moved for only a second and then stopped again. The Microphone properties once again showed the mic was muted. WTF?! I spent a little time playing around with everything, moving to a different USB port, etc. Finally, after a little more troubleshooting, I realized that Windows does not like it when the microphone input hits the peak input volume: it mutes the mic input automatically, without any hesitation and without notification, and keeps the mic muted until the mute is manually unselected. What engineer at Microsoft thought that this was a good idea?!! 😒 (Note for Microsoft engineers that may care about customer experience 🤣: How about muting at the peak input, then unmuting the mic when the input returns to a below-peak level? 🤔) With the volume on the scanner turned down, the auto muting went away. I really look forward to moving this project to Ubuntu.
Distorted recorded audio
When I listened to the recorded audio from the scanner, it was very distorted no matter what the microphone level or scanner volume was set to, so I immediately suspected a mismatch between the mic input and the scanner headphone output. To hear the microphone input through the speakers, you can loop back the connection with this setting; audio starts playing as soon as the setting is applied:
To address the mismatch, I went digging through my junk drawers looking for an audio isolation transformer to isolate the two. Nope, nothing. I tried hand winding a couple of transformers and checked their response using a frequency generator sweeping the audio range of 100-20,000 Hz and an o-scope to monitor the output. Without the right core material, the hand wound transformers were a no go, as the low frequency response rolled off too much. I could order an audio isolation transformer and wait a week, but I want to get this project moving. I did have a Triad toroid power isolation transformer, so I figured what the heck and tried it on the o-scope: it had a very flat response throughout the audio range. I cut the 1/8 inch TRS patch cable in half and inserted the transformer, along with a 33 Ohm load resistor in parallel and a 1k Ohm series resistor on the microphone input side of the transformer. I picked the resistor values by experimenting and listening to how the audio sounded with different values.
The recorded audio is okay at this point. The isolation setup looks very silly but for now the project is moving forward once again:
I am going to order some small audio isolation transformers to experiment with when they arrive.
Okay, on to decoding tones with the code that was working great on the Broadcastify channel as long as there was no overlapping audio…
The decoder code did not work at all on the scanner audio, so it was back to debugging once again. After a few hours of debugging, it turned out that the mic input level and the scanner output volume had to be tweaked to get the volume into the sweet spot for the FFT portion of the code to pick the tones out. I also made a few more tweaks to make it more robust, since the audio received by the scanner up in the mountains, a long distance from the transmitting antenna, is different from the audio on the Broadcastify channel.
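Finding that sweet spot by ear is tedious, so a small level meter on each chunk might help. This sketch assumes float samples scaled to ±1.0 (int16 samples from the sound card would be divided by 32768.0 first). The -1 dBFS clipping and -30 dBFS quiet thresholds are guesses, not measured values. Keeping the peak below clipping should also help stay clear of the Windows auto-mute described above.

```python
import numpy as np


def input_level(chunk: np.ndarray):
    """Return (peak_dbfs, rms_dbfs) for float samples scaled to [-1.0, 1.0]."""
    chunk = np.asarray(chunk, dtype=float)
    peak = float(np.max(np.abs(chunk)))
    rms = float(np.sqrt(np.mean(chunk ** 2)))
    tiny = 1e-12  # avoid log10(0) on silence
    return 20 * np.log10(max(peak, tiny)), 20 * np.log10(max(rms, tiny))


def level_hint(chunk, clip_dbfs: float = -1.0, quiet_dbfs: float = -30.0) -> str:
    """Suggest a volume change; both thresholds are assumed values."""
    peak_db, rms_db = input_level(chunk)
    if peak_db >= clip_dbfs:
        return "too hot: turn the scanner or mic level down"
    if rms_db <= quiet_dbfs:
        return "too quiet: turn the level up"
    return "ok"
```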
For example this is recorded audio from the scanner output using the cheap USB sound card:
Decoding the two tone fire tone outs is working great once again:
5100: Pinewood Springs Protection District : Tone out on 01-02-23 at 06:36:30
1: 951.2Hz 2:1821.8Hz C1:15 C2: 8
5100: Pinewood Springs Protection District : Tone out on 01-02-23 at 06:36:34
1: 949.8Hz 2:1819.6Hz C1:11 C2: 8
5000: Big Elk Meadows Protection District : Tone out on 01-02-23 at 06:36:39
1: 1129.4Hz 2:1228.7Hz C1:16 C2: 8
2200: Mountain View Fire St.-1 : Tone out on 01-02-23 at 06:42:06
1: 1501.0Hz 2:864.8Hz C1:15 C2: 8
2600: Lafayette Fire Department : Tone out on 01-02-23 at 07:13:19
1: 948.9Hz 2:1341.4Hz C1:15 C2: 8
2200: Mountain View Fire St.-1 : Tone out on 01-02-23 at 07:32:06
1: 1502.5Hz 2:868.8Hz C1:14 C2: 8
2200: Mountain View Fire St.-1 : Tone out on 01-02-23 at 07:40:48
1: 1503.3Hz 2:868.7Hz C1:13 C2: 8
2600: Lafayette Fire Department : Tone out on 01-02-23 at 07:46:44
1: 948.5Hz 2:1341.4Hz C1:17 C2: 8
2300: Boulder Rural : Tone out on 01-02-23 at 07:46:48
1: 948.7Hz 2:1529.2Hz C1:15 C2: 8
I have a new idea that I want to try that will make the detection better and much more streamlined. The entire detection routine will have to be redone from scratch, as it will use a ‘rotating’ FFT summing method. Essentially it will be a sliding window that ‘rolls’ across the audio stream, summing the FFT of each chunk together to really pop up unique tones and flatten everything that is not unique in the signal. I have been thinking about it for a few days, but this will be a later project for the Mad Scientist Hut.
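The sketch below shows one way the rolling sum might be kept cheap. It holds the last few FFT magnitude frames in a `collections.deque`, adds each new frame to a running total, and subtracts the frame that falls out of the window. The class name, the depth of 8 frames, and the Hann window are assumptions for illustration.

```python
from collections import deque

import numpy as np


class RollingSpectrum:
    """Running sum of the last `depth` FFT magnitude frames.

    Each new chunk's spectrum is added and the oldest one is subtracted, so
    the sum slides along the audio stream without re-summing every frame.
    """

    def __init__(self, chunk_size: int, sample_rate: int, depth: int = 8):
        self.window = np.hanning(chunk_size)
        self.freqs = np.fft.rfftfreq(chunk_size, d=1.0 / sample_rate)
        self.frames = deque(maxlen=depth)
        self.total = np.zeros(chunk_size // 2 + 1)

    def push(self, chunk: np.ndarray) -> np.ndarray:
        frame = np.abs(np.fft.rfft(chunk * self.window))
        if len(self.frames) == self.frames.maxlen:
            self.total -= self.frames[0]  # remove the oldest frame's share
        self.frames.append(frame)  # deque drops the oldest frame itself
        self.total += frame
        return self.total

    def peak_frequency(self) -> float:
        return float(self.freqs[np.argmax(self.total)])
```

Because the total is updated by adding and subtracting floats, tiny rounding errors can build up over a very long run. Recomputing `self.total` as `sum(self.frames)` every so often would reset it.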
I have also ordered a couple of the $10 Behringer UCA222 USB audio adapters to see how they ‘sound’. They are a little big for what I wanted in the final project, but we will see if the audio is better when they arrive. Back to getting the speech to text conversion code working.
Dispatch Tone Out Decoder Part 5: Adding Speech to Text Conversion in Python
I want the scanner tone out decoder to display a real-time transcription of the audio in line with the departments toned out, so that a quick skim of the output display shows whether something important is going on. Another great feature I can incorporate is checking for trigger keywords that alert me when something really needs immediate attention, such as a wildland fire in my area.
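A simple version of the keyword check might look like this. The trigger list is hypothetical. It matches whole words case-insensitively so that a short trigger such as 'fire' would not match inside 'firefighter'.

```python
import re

# Hypothetical trigger list; edit to suit the local area.
TRIGGER_WORDS = ["wildland", "brush fire", "structure fire", "evacuation"]


def find_triggers(transcript: str, triggers=TRIGGER_WORDS):
    """Return the trigger phrases found in a transcript (case-insensitive).

    The \\b word boundaries keep a phrase from matching inside a longer word.
    """
    found = []
    for phrase in triggers:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```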
I spent most of today researching what is available for converting speech to text, and there are several options. I have chosen to go down the path of using the Google Cloud API, mainly because it offers 60 minutes of free transcription per month, includes a $300 credit to use on any of the APIs during a 90-day trial, and costs only 2.4 cents per minute once you go over 60 minutes of transcription in a month. So it is worth trying out.
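The pricing above works out to a simple formula: cost = max(0, minutes - 60) × $0.024. Here is a sketch that uses only the figures quoted here. It ignores the trial credit and any billing increments, and Google's prices may change.

```python
FREE_MINUTES = 60        # free transcription minutes per month (quoted above)
RATE_PER_MINUTE = 0.024  # USD per minute beyond the free tier (quoted above)


def monthly_cost(minutes: float) -> float:
    """Estimated monthly Speech-to-Text bill in USD at the quoted rates."""
    billable = max(0.0, minutes - FREE_MINUTES)
    return round(billable * RATE_PER_MINUTE, 2)
```

For example, `monthly_cost(100)` returns 0.96: 40 billable minutes at 2.4 cents each.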
After I set up my Google Cloud account, I had trouble figuring out how to really get started until I happened across this: https://www.hellocodeclub.com/python-speech-recognition-create-program-with-google-api/ . This tutorial worked up until the point of adding an environment variable. My main Python debug setup is still running on Windows 10. I saved my Google credentials JSON file to my C:\ directory to make editing the environment variable easier. To add the System Environment Variable for the JSON file location, press the Windows key and type ‘environment’. The search brings up ‘edit environment variables control panel’; click that.
Click Environment Variables…
Enter GOOGLE_APPLICATION_CREDENTIALS as the Variable name.
Enter your JSON file's location and name as the Variable value:
Restart your Python IDLE session for the environment variable to take effect.
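If the client still cannot find the credentials, a quick check like this in the Python session can confirm whether the variable made it through. The function name is hypothetical, and the variable name is the one set above. For a single script, setting `os.environ["GOOGLE_APPLICATION_CREDENTIALS"]` before creating the client should also work without touching the system settings.

```python
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the JSON key file named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises a clear error if the variable is missing (for example, IDLE was
    not restarted after adding it) or points at a file that does not exist.
    """
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set; "
                           "restart the Python session after adding it")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    return path
```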
I ran the code in the tutorial against a snippet of recorded audio from the Broadcastify Boulder County dispatch channel:
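For a recorded snippet like this one, the sample rate sent to the API has to match the recording. Here is a minimal sketch that assumes the snippet was saved as a mono 16-bit PCM WAV file. The function name is hypothetical. It uses only Python's standard `wave` module to read the rate from the header, and the returned bytes and rate could then feed `RecognitionAudio(content=...)` and `sample_rate_hertz`.

```python
import wave


def load_wav_for_speech(path: str):
    """Read a mono 16-bit PCM WAV file for a LINEAR16 recognize request.

    Returns (file_bytes, sample_rate_hz). The rate comes from the WAV header
    instead of being hard-coded, so it always matches the recording.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wav.getframerate()
    with open(path, "rb") as f:
        data = f.read()  # whole file, header included, as Google's WAV samples send it
    return data, rate
```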
The output is not 100% correct but is good enough to see what is going on:
Transcript: 2621 2601 respond with Lafayette please to the outside entrance of Exempla Good Sam 200 Exempla Circle
The next step will be to integrate this into the tone out decoder display.