SwiftUI: Building a simple Shazam Clone using ShazamKit

At WWDC21, Apple released ShazamKit, a framework that lets you enrich your app experience with audio recognition, enabling your users to find out a song’s name, artist, genre, and more.

The original Shazam app has always had this capability, but now Apple is giving you the power to bring it into your own apps.

Here is the step-by-step process of building a simple Shazam clone (as seen below) using the newly introduced ShazamKit.

NB: The UI is built with SwiftUI, but the post is easy to follow even if you are new to SwiftUI.

[ShazamClone screen recording]

Requirements

The requirements are simple. ShazamKit is currently available on iOS 15, building for iOS 15 requires Xcode 13, and Xcode 13 requires macOS 11.3 or later. You also need a physical device to test on.

The Process

Clone the repo from GitHub so that you can follow along easily.


I have a SwiftUI view (ShazamView) containing a number of subviews, most importantly a record button you press to get the app to listen to the music around you. Tapping the button calls the startListening() function on the ShazamViewModel to begin the listening process, as sketched below.
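
Here is a minimal sketch of that wiring; everything except ShazamView, ShazamViewModel, and startListening() is illustrative, and the repo’s actual view is richer:

import SwiftUI

struct ShazamView: View {
    @StateObject private var viewModel = ShazamViewModel()

    var body: some View {
        VStack(spacing: 24) {
            Text("Tap to Shazam")
            // The record button: kicks off the listening flow in the view model.
            Button {
                viewModel.startListening()
            } label: {
                Image(systemName: "mic.fill")
                    .font(.system(size: 48))
                    .padding(40)
                    .background(Circle().fill(Color.blue))
                    .foregroundColor(.white)
            }
        }
    }
}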

The startListening() function goes through a couple of steps:

1. Requesting Record (Microphone) Permission

Here, the app checks for record permission. If it has already been granted, another function is called to begin the recording process; if not, the app asks the user to allow microphone access. This is not the major focus of this blog post, so I’m just going to skip the details here.

class ShazamViewModel: NSObject, ObservableObject {
    // ...
    func startListening() {
        let audioSession = AVAudioSession.sharedInstance()
        switch audioSession.recordPermission {
        case .undetermined:
            // Permission has never been requested, so ask the user now.
            requestRecordPermission(audioSession: audioSession)
        case .denied:
            // Permission was refused; prompt the user to enable it in Settings.
            viewState = .recordPermissionSettingsAlert
        case .granted:
            // Recording is allowed; start listening off the main thread.
            DispatchQueue.global(qos: .background).async {
                self.proceedWithRecording()
            }
        @unknown default:
            requestRecordPermission(audioSession: audioSession)
        }
    }
}
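
For completeness, the permission request itself boils down to a single AVAudioSession call. A minimal sketch of what requestRecordPermission(audioSession:) might look like (the exact implementation is in the repo):

private func requestRecordPermission(audioSession: AVAudioSession) {
    audioSession.requestRecordPermission { [weak self] granted in
        guard granted else { return }
        // The user just granted access, so start the recording flow.
        DispatchQueue.global(qos: .background).async {
            self?.proceedWithRecording()
        }
    }
}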

2. Setting up Audio Engine for recording

In the proceedWithRecording() function, the app is switched into a recording state, visualized with the ripple animation described in a separate blog post. This is also where the AVAudioEngine the app needs to continually record sound is set up.

class ShazamViewModel: NSObject, ObservableObject {
    // ...
    private func proceedWithRecording() {
        // UI state changes must happen on the main thread.
        DispatchQueue.main.async {
            self.viewState = .recordingInProgress
        }
        // If the engine is already running, treat this tap as a stop request.
        if audioEngine.isRunning {
            stopRecording()
            return
        }
        // Install a tap on the microphone input so we receive audio buffers as they arrive.
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: .zero)
        inputNode.removeTap(onBus: .zero) // avoid installing a duplicate tap
        inputNode.installTap(onBus: .zero, bufferSize: 1024, format: recordingFormat) { [weak self] buffer, time in
            print("Current Recording at: \(time)")
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print(error.localizedDescription)
        }
    }
}
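
For reference, the snippets in this post assume a few stored properties on the view model. Here is a rough sketch of them; the viewState cases are inferred from the states used throughout this post, so the repo’s version may differ slightly:

import AVFoundation
import ShazamKit

enum ViewState {
    case idle
    case recordPermissionSettingsAlert
    case recordingInProgress
    case result(song: Song)
    case noResult
}

class ShazamViewModel: NSObject, ObservableObject {
    // Published so the SwiftUI view re-renders on state changes.
    @Published var viewState: ViewState = .idle

    let audioEngine = AVAudioEngine() // pulls buffers from the microphone
    let session = SHSession()         // matches buffers against the catalog

    override init() {
        super.init()
        // The SHSessionDelegate conformance comes from the extension in step 3.
        session.delegate = self
    }
}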

3. Asking ShazamKit to match recordings with songs in the catalog

ShazamKit requires an instance of SHSession, on which we call matchStreamingBuffer(_:at:) to match the recording against songs in the catalog. The SHSessionDelegate protocol then defines two delegate methods: one is triggered when a match is found, and the other when no match is found or an error occurs. It’s as simple as that.

class ShazamViewModel: NSObject, ObservableObject {
    // ...
    private func proceedWithRecording() {
        // ...
        inputNode.installTap(onBus: .zero, bufferSize: 1024, format: recordingFormat) { [weak self] buffer, time in
            print("Current Recording at: \(time)")
            // Stream each audio buffer to ShazamKit for matching.
            self?.session.matchStreamingBuffer(buffer, at: time) // <---- Here
        }
        // ...
    }
}

extension ShazamViewModel: SHSessionDelegate {
    // Called when ShazamKit finds a match for the streamed audio.
    func session(_ session: SHSession, didFind match: SHMatch) {
        guard let firstMatch = match.mediaItems.first else {
            return
        }
        stopRecording()
        // Map the matched media item into the app's own Song model.
        let song = Song(
            title: firstMatch.title ?? "",
            artist: firstMatch.artist ?? "",
            genres: firstMatch.genres,
            artworkUrl: firstMatch.artworkURL,
            appleMusicUrl: firstMatch.appleMusicURL
        )
        DispatchQueue.main.async {
            self.viewState = .result(song: song)
        }
    }

    // Called when no match is found or an error occurs.
    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        print(error?.localizedDescription ?? "")
        stopRecording()
        DispatchQueue.main.async {
            self.viewState = .noResult
        }
    }
}
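
Both delegate methods call stopRecording(), which isn’t shown above. Assuming it simply tears down the tap and the engine, a minimal sketch would be:

private func stopRecording() {
    // Stop pulling audio from the microphone and remove the tap.
    audioEngine.inputNode.removeTap(onBus: .zero)
    audioEngine.stop()
}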

The rest of the codebase consists of the Song struct that the Shazam match is mapped into, a SongDetailView where the song details are displayed to the user, a couple of animations to improve the app experience, and a couple of alerts displayed to the user as needed.
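
The Song struct is just a plain model whose fields mirror the SHMatchedMediaItem properties used above; roughly:

import Foundation

struct Song {
    let title: String
    let artist: String
    let genres: [String]
    let artworkUrl: URL?
    let appleMusicUrl: URL?
}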

Custom Catalog

You may ask, “So is building a Shazam clone the only thing I can do with ShazamKit?” The answer is no.

Even though ShazamKit matches songs against the default Shazam music catalog, you can pass in your own custom-built catalog for ShazamKit to use during lookup. With this, you can build all kinds of applications: games, learning resources, and more.
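
A minimal sketch of how that could look, assuming you’ve bundled a pre-built .shazamcatalog file named MyCatalog (the file name and function are hypothetical; SHCustomCatalog, add(from:), and SHSession(catalog:) are the real ShazamKit APIs):

import Foundation
import ShazamKit

func makeCustomCatalogSession() throws -> SHSession? {
    guard let catalogUrl = Bundle.main.url(forResource: "MyCatalog", withExtension: "shazamcatalog") else {
        return nil
    }
    let catalog = SHCustomCatalog()
    try catalog.add(from: catalogUrl) // load reference signatures from the file
    // This session now matches against your catalog instead of Shazam's.
    return SHSession(catalog: catalog)
}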

Android?

ShazamKit is not limited to iOS; it is also available for Android and lets you add audio recognition to your Android apps.


With this simple Shazam clone, I’m sure you’ll agree that using ShazamKit is pretty easy. Why don’t you check it out and get your hands dirty building something with it?

Cheers!