Audio

Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.

Post

Replies

Boosts

Views

Activity

How can third-party iOS apps obtain real-time waveform / spectrogram data for Apple Music tracks (similar to djay & other DJ apps)?

Hi everyone, I’m working on an iOS MusicKit app that overlays a metronome on top of Apple Music playback. To line the clicks up perfectly I’d like access to low-level audio analysis data—ideally a waveform / spectrogram or beat grid—while the track is playing. I’ve noticed that several approved DJ apps (e.g. djay, Serato, rekordbox) can already: • Display detailed scrolling waveforms of Apple Music songs • Scratch, loop or time-stretch those tracks in real time That implies they receive decoded PCM frames or at least high-resolution analysis data from Apple Music under a special entitlement. My questions: 1. Does MusicKit (or any public framework) expose real-time audio buffers, FFT bins, or beat markers for streaming Apple Music content? 2. If not, is there an Apple program or entitlement that developers can apply for—similar to the “DJ with Apple Music” initiative—to gain that deeper access? 3. Where can I find official documentation or a point of contact for this kind of request? I’ve searched the docs and forums but only see standard MusicKit playback APIs, which don’t appear to expose raw audio for DRM-protected songs. Any guidance, links or insider tips on the proper application process would be hugely appreciated! Thanks in advance.

Media Technologies Audio

651

Dec ’25

Start and stop recording Voice Memos with Siri

using iOS 26.2; Airpods 4 Long press stem to launch Siri Speak "Record Voice Memo" -> Recording starts Recording in progress... Long press stem to launch Siri -> Nothing happens. To stop recording need use phone. is this intended behaviour? i would like to be able to stop recording with Siri I am able to launch Siri from phone while recording, but point is to keep phone in pocket and start/stop recordings only via Airpods.

Media Technologies Audio Siri and Voice

191

Dec ’25

AVAudioSession : Audio issues when recording the screen in an app that changes IOBufferDuration on iOS 26.

Among Japanese end users, audio issues during screen recording—primarily in game applications—have become a topic of discussion. We have confirmed that the trigger for this issue is highly likely to be related to changes to IOBufferDuration. When using setPreferredIOBufferDuration and the IOBufferDuration is set to a value smaller than the default, audio problems occur in the recorded screen capture video. Audio playback is performed using AudioUnit (RemoteIO). https://developer.apple.com/documentation/avfaudio/avaudiosession/setpreferrediobufferduration(_:)?language=objc This issue was not observed on iOS 18, and it appears to have started occurring after upgrading to iOS 26. We provide an audio middleware solution, and we had incorporated changes to IOBufferDuration into our product to achieve low-latency audio playback. As a result, developers using our product as well as their end users are being affected by this issue. We kindly request that this issue be investigated and addressed in a future update. “This document has been translated by AI. The original text is included below for reference.” 日本のエンドユーザー間で主にゲームアプリケーションにおける画面収録時の音声の問題が話題になっています。こちらの症状のトリガーが、IOBufferDurationの変更によるものである可能性が高いことを確認しました。 setPreferredIOBufferDurationを使用し、IOBufferDurationがデフォルトより小さい状態の時、画面収録された動画の音声に問題が発生することをしています。音声の再生にはAudioUnit(RemoteIO)を使用しています。 https://developer.apple.com/documentation/avfaudio/avaudiosession/setpreferrediobufferduration(_:)?language=objc iOS 18ではこのような問題は確認されておらず、iOS26になってから問題が発生しているようです。私たちはオーディオミドルウェアを提供しており、低遅延の再生のためにIOBufferDurationの変更を製品に組み込んでいました。そのため、弊社製品をご利用いただいている開発者およびエンドユーザーの皆様がこの不具合の影響を受けています。こちらの不具合の調査及び修正対応を検討いただけますでしょうか。

Media Technologies Audio

613

Audio DSP Processing Issue / Metallic Ringing Artifacts when recording acoustic instruments on iPhone 17 Pro Max

Description: I have identified a specific issue when recording acoustic guitar and other instruments on the iPhone 17 Pro Max using native applications (Voice Memos, Camera). The recordings contain an unnatural metallic resonance (ringing artifacts) that should not be present. Testing and Methodology: Hardware Verification: Initially, I suspected a hardware defect in the audio chip or microphone. However, extensive testing with third-party software suggests this is likely a software-level issue. AudioShare Test: I conducted a test using the AudioShare app in "Measurement Mode" (which bypasses standard iOS system-wide audio processing). In this mode, the audio remains perfectly clean, and the metallic ringing disappears entirely. Conclusion: The issue is rooted in the DSP (Digital Signal Processing) algorithms that iOS applies for noise suppression or voice enhancement. These algorithms appear to misinterpret the high-frequency overtones of acoustic instruments as background noise and attempt to "filter" them, resulting in audible digital artifacts. Comparison Results: This issue has not been observed on devices from other brands or on older iPhone models (preliminary tests suggest older versions handle this better). Notably, the problem persists even in GarageBand, as the app still utilizes certain system-level processing layers. Proposed Solution: I suggest adding a "Raw Audio" or "Instrument Mode" toggle within the Microphone/Audio settings for native apps. This mode should disable aggressive DSP processing, similar to how the AVAudioSession.Mode.measurement works in specialized apps. Attachments: I am attaching 4 archives, including a final "Measurement Mode" folder with comparative samples (Measurement Mode vs. Standard Mode). The artifacts are most prominent when monitored through headphones.

Media Technologies Audio Sound Analysis

168

Jan ’26

Unable to play audio via MusicKit

Hey folks, I'm running into an odd issue suddenly with an app that had a working MusicKit integration before. I'm using ApplicationMusicPlayer to play Apple Music albums and songs. I'm testing on a physical device, signed in to Apple ID, and with a valid subscription. Apple Music via the first-party app works entirely fine on this device. Attempting to play back any content at all gives the log: <ICUserIdentityStoreACAccountBackend: 0x1070bf3e0> Failed to initialize primary apple account, error=Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store} [ICUserIdentityStore] - initializing account histories with activeAccountDSID = nil, activeLockerAccountDSID = nil, timestamp = 14605951908 [ICUserIdentityStore] Failed to fetch local store account with error: Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store}. The album artwork, track names, etc, all appear in the control center playback controls, but the music doesn't play. Trying to trigger playback with control center just results in it skipping to the next track, which doesn't play either. This exact code used to work. I have the MusicKit service selected in Apple Connect. Since this isn't entitlement-based, I'm not sure how else to check that I'm set up correctly. I've tried deleting/reinstalling the app, restarting the device, cleaning/rebuilding, and deleting DerivedData, to no avail. Any help? Running Xcode 16.4 (16F6), testing on iOS 18.5 (22F76)

Media Technologies Audio MusicKit

182

Jun ’25

AVAudioSession.outputVolume does not reflect system volume changes made while app is in background

I have a question regarding the behavior of AVAudioSession.sharedInstance().outputVolume. Observed behavior: When the app is in the foreground, I read audioSession.outputVolume (for example, 0.1). The app is then moved to the background. While the app is in the background, the user changes the system volume using the hardware buttons (for example, to 0.5). When the app returns to the foreground, audioSession.outputVolume still reports the previous value (0.1). From my testing, outputVolume only seems to update when the system volume is changed while the app is in the foreground. Volume changes made while the app is in the background are not reflected when the app returns to the foreground. Questions: According to Apple’s documentation for AVAudioSession.outputVolume: “The systemwide output volume set by the user.” https://developer.apple.com/documentation/avfaudio/avaudiosession/outputvolume However, based on our testing on iOS 18.6.2 and iOS 18.1, the observed behavior seems to differ from this description. Questions: The documentation states that outputVolume represents the system-wide volume set by the user. In our testing, the value does not reflect volume changes made while the app is in the background and only updates when the app is in the foreground.Is this the expected behavior of AVAudioSession.outputVolume? Is there any other recommended way in Swift to retrieve the current system volume that reflects user changes made both while the app is in the foreground and while it is in the background? Any clarification on the intended behavior or recommended handling would be greatly appreciated.

Media Technologies Audio Swift Frameworks AVAudioSession AVFoundation

290

Lock screen media controls for MusicKit/ ApplicationMusicPlayer

Hi, when using ApplicationMusicPlayer from MusicKit my app automatically gets the media controls on the lock screen: Play/ Pause, Skip Buttons, Playback Position etc. I would like to customize these. Tried a bunch of things, e.g. using MPRemoteCommandCenter. So far I haven't had any success. Does anyone know how I can customize the media controls of ApplicationMusicPlayer. Thank you.

Media Technologies Audio MusicKit

536

Sep ’25

AVAudioSession.outputVolume not reporting correctly in iOS 18+ devices

I’m using the shared instance of AVAudioSession. After activating it with .setActive(true), I observe the outputVolume, and it correctly reports the device’s volume. However, after deactivating the session using .setActive(false), changing the volume, and then reactivating it again, the outputVolume returns the previous volume (before deactivation), not the current device volume. The correct volume is only reported after the user manually changes it again using physical buttons or Control Center, which triggers the observer. What I need is a way to retrieve the actual current device volume immediately after reactivating the audio session, even on the second and subsequent activations. Disabling and re-enabling the audio session is essential to how my application functions. I’ve tested this behavior with my colleagues, and the issue is consistently reproducible on iOS 18.0.1, iOS 18.1, iOS 18.3, iOS 18.5 and iOS 18.6.2. On devices running iOS 17.6.1 and iOS 16.0.3, outputVolume correctly reflects the current volume immediately after calling .setActive(true) multiple times.

Media Technologies Audio Audio AVAudioSession

359

Feb ’26

iOS 17 camera capture assertions and issues

Hello, Starting in iOS 17, our application started having some issue publishing to our video session. More specifically the video capture seems to be broken in some, but not all sessions. What's troubling is that we're seeing that it fails consistently every 4 sessions. It also fails silently, without reporting any problems to the app. We only notice that there are no frames being rendered or sent to the remote devices. Here's what shows-up in the console: <<<< FigCaptureSourceRemote >>>> Fig assert: "! storage->connectionDied" at bail (FigCaptureSourceRemote.m:235) - (err=0) <<<< FigCaptureSourceRemote >>>> Fig assert: "err == 0 " at bail (FigCaptureSourceRemote.m:253) - (err=-16453) Anyone seeing this? Any idea what could be the cause? Our sessions work perfectly on iOS16 and below. Thanks

Media Technologies Audio AVFoundation

1.4k

Oct ’25

Question about PT Framework channel tone behaviour

I've been wondering if there is a way to modify or even disable tones for indicating channel states. The behaviour regarding tones seems like a black box with little documentation. During migration to Apple's PT Framework we've noticed that there are few scenarios where a tone is played which doesn't match certain certifications. For example; moving from a channel to another produces a tone which would fail a test case. I understand the reasoning fully, as it marks that the channel is ready to transmit or receive, but this doesn't mirror the behaviour of TETRA which would be wanted in this case. I'm also wondering if there would be any way to directly communicate feedback regarding PT Framework?

Media Technologies Audio Push To Talk

422

Oct ’25

Failure on attempt to import track as spatial audio

I'm working on a project to support spatial audio editing, using this sample project as a reference: https://developer.apple.com/documentation/Cinematic/editing-spatial-audio-with-an-audio-mix This sample works well on an unedited capture, but does not work for a capture that has already been edited. The failure is occurring at "let audioInfo = try await CNAssetSpatialAudioInfo(asset: myAsset)", which is throwing "no eligible audio tracks in asset". I also find that for already edited captures, if i use CNAssetSpatialAudioInfo.assetContainsSpatialAudio, it returns false. What i mean by "already edited" is that if I take a spatial capture with my iPhone 16, and then edit that capture in the Photos app using the Cinematic effect, and then save the edited output (e.g. edited_capture.mov), I can't import that edited_capture.mov into my project as a spatial audio asset. Is this intentional behavior or a bug? If it's intentional, can you describe why?

Media Technologies Audio

165

Sep ’25

Graceful shutdown during background audio playback.

Hello. My team and I think we have an issue where our app is asked to gracefully shutdown with a following SIGTERM. As we’ve learned, this is normally not an issue. However, it seems to also be happening while our app (an audio streamer) is actively playing in the background. From our perspective, starting playback is indicating strong user intent. We understand that there can be extreme circumstances where the background audio needs to be killed, but should it be considered part of normal operation? We hope that’s not the case. All we see in the logs is the graceful shutdown request. We can say with high certainty that it’s happening though, as we know that playback is running within 0.5 seconds of the crash, without any other tracked user interaction. Can you verify if this is intended behavior, and if there’s something we can do about it from our end. From our logs it doesn’t look to be related to either memory usage within the app, or the system as a whole. Best, John

Media Technologies Audio AVAudioSession Core Audio

137

Jun ’25

iOS Speech Error on Mobile Simulator (Error fetching voices)

I'm writing a simple app for iOS and I'd like to be able to do some text to speech in it. I have a basic audio manager class with a "speak" function: import Foundation import AVFoundation class AudioManager { static let shared = AudioManager() var audioPlayer: AVAudioPlayer? var isPlaying: Bool { return audioPlayer?.isPlaying ?? false } var playbackPosition: TimeInterval = 0 func playSound(named name: String) { guard let url = Bundle.main.url(forResource: name, withExtension: "mp3") else { print("Sound file not found") return } do { if audioPlayer == nil || !isPlaying { audioPlayer = try AVAudioPlayer(contentsOf: url) audioPlayer?.currentTime = playbackPosition audioPlayer?.prepareToPlay() audioPlayer?.play() } else { print("Sound is already playing") } } catch { print("Error playing sound: \(error.localizedDescription)") } } func stopSound() { if let player = audioPlayer { playbackPosition = player.currentTime player.stop() } } func speak(text: String) { let synthesizer = AVSpeechSynthesizer() let utterance = AVSpeechUtterance(string: text) utterance.voice = AVSpeechSynthesisVoice(language: "en-GB") synthesizer.speak(utterance) } } And my app shows text in a ScrollView: ScrollView { Text(self.description) .padding() .foregroundColor(.black) .font(.headline) .background(Color.gray.opacity(0)) }.onAppear { AudioManager.shared.speak(text: self.description) } However, the text doesn't get read out (in the simulator). I see some output in the console: Error fetching voices: Swift.DecodingError.dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "Invalid container metadata for _UnkeyedDecodingContainer, found keyedGraphEncodingNodeID", underlyingError: nil)). Using fallback voices. I'm probably doing something wrong here, but not sure what.

Media Technologies Audio

702

Dec ’25

Improving Speech Analyzer Transcription for technical terms

I am developing an app with transcription and I am exploring ways to improve the transcription from the SpeechAnalyzer/Transcriber for technical terms. SFSpeech... recognition had the capability of being augmented by contextualStrings. Does something similar exist for SpeechAnalyzer/Transcriber? If so please point me towards the documentation and any sample code that may exist for this. If there are other options, please let me know.

Media Technologies Audio Speech

315

Sep ’25

[AVPlayerItemVideoOutput initWithPixelBufferAttributes:] output attributes setting not work

My app want Converting iphone12 HDR Video to SDR，to edit。 follow the doc Apple-HDR-Convert. My code setting the pixBuffAttributes [pixBuffAttributes setObject:(id)(kCVImageBufferYCbCrMatrix_ITU_R_709_2) forKey:(id)kCVImageBufferYCbCrMatrixKey]; [pixBuffAttributes setObject:(id)(kCVImageBufferColorPrimaries_ITU_R_709_2) forKey:(id)kCVImageBufferColorPrimariesKey]; [pixBuffAttributes setObject:(id)kCVImageBufferTransferFunction_ITU_R_709_2 forKey:(id)kCVImageBufferTransferFunctionKey]; playerItemOutput = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:pixBuffAttributes]; but I get the playerItemOutput's output buffer CFTypeRef colorAttachments = CVBufferGetAttachment(pixelBuffer, kCVImageBufferYCbCrMatrixKey, NULL); CFTypeRef colorPrimaries = CVBufferGetAttachment(pixelBuffer, kCVImageBufferColorPrimariesKey, NULL); CFTypeRef colorTransFunc = CVBufferGetAttachment(pixelBuffer, kCVImageBufferTransferFunctionKey, NULL); NSLog(@"colorAttachments = %@", colorAttachments); NSLog(@"colorPrimaries = %@", colorPrimaries); NSLog(@"colorTransFunc = %@", colorTransFunc); log output: colorAttachments = ITU_R_2020 colorPrimaries = ITU_R_2020 colorTransFunc = ITU_R_2100_HLG pixBuffAttributes setting output format invalid，please help！

Media Technologies Audio AVFoundation

845

Nov ’25

AVSpeechSynthesizer system voices (SLA clarification)

Hello, I am building an iOS-only, commercial app that uses AVSpeechSynthesizer with system voices, strictly using the APIs provided by Apple. Before distributing the app, I want to ensure that my current implementation does not conflict with the iOS Software License Agreement (SLA) and is aligned with Apple’s intended usage. For a better playback experience (more accurate estimation of utterance duration and smoother skip forward/backward during playback), I currently synthesize speech using: AVSpeechSynthesizer.write(_:toBufferCallback:) Converting the received AVAudioPCMBuffer buffers into audio data Storing the audio inside the app sandbox Playing it back using AVAudioPlayer / AVAudioEngine The cached audio is: Generated fully on-device using system voices Stored only inside the app’s private container Used only for internal playback controls (timeline, seek, skip ±5 seconds) Never shared, exported, uploaded, or exposed outside the app The alternative approaches would be: Keeping the generated audio entirely in memory (RAM) for playback purposes, without writing it to the file system at any point Or using AVSpeechSynthesizer.speak(_:) and playing speech strictly in real time which has a poorer user experience compared to my approach I have reviewed the current iOS Software License Agreement: https://www.apple.com/legal/sla/docs/iOS18_iPadOS18.pdf In particular, section (f) mentions restrictions around System Characters, Live Captions, and Personal Voice, including the following excerpt: “…use … only for your personal, non-commercial use… No other creation or use of the System Characters, Live Captions, or Personal Voice is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution in a … commercial context.” I do not see a specific reference in the SLA to system text-to-speech voices used via AVSpeechSynthesizer, and I want to be certain that temporarily caching synthesized speech for internal, non-exported playback is acceptable in a commercial app. My question is: Is caching AVSpeechSynthesizer system-voice output inside the app sandbox for internal playback acceptable, or is Apple’s recommended approach to rely only on real-time playback (speak(_:)) or strictly in-memory buffering without file storage? If this question falls outside DTS technical scope and is instead a policy or licensing matter, I would appreciate guidance on the authoritative Apple documentation or the correct Apple team/contact. Thank you.

Media Technologies Audio Speech Audio

462

Feb ’26

CMFormatDescription.audioStreamBasicDescription has wrong or unexpected sample rate for audio channels with different sample rates

In my app I use AVAssetReaderTrackOutput to extract PCM audio from a user-provided video or audio file and display it as a waveform. Recently a user reported that the waveform is not in sync with his video, and after receiving the video I noticed that the waveform is in fact double as long as the video duration, i.e. it shows the audio in slow-motion, so to speak. Until now I was using CMFormatDescription.audioStreamBasicDescription.mSampleRate which for this particular user video returns 22'050. But in this case it seems that this value is wrong... because the audio file has two audio channels with different sample rates, as returned by CMFormatDescription.audioFormatList.map({ $0.mASBD.mSampleRate }) The first channel has a sample rate of 44'100, the second one 22'050. If I use the first sample rate, the waveform is perfectly in sync with the video. The problem is given by the fact that the ratio between the audio data length and the sample rate multiplied by the audio duration is 8, double the ratio for the first audio file (4). In the code below this ratio is given by Double(length) / (sampleRate * asset.duration.seconds) When commenting out the line with the sampleRate variable definition in the code below and uncommenting the following line, the ratios for both audio files are 4, which is the expected result. I would expect audioStreamBasicDescription to return the correct sample rate, i.e. the one used by AVAssetReaderTrackOutput, which (I think) somehow merges the stereo tracks. The documentation is sparse, and in particular it’s not documented whether the lower or higher sample rate is used; in this case, it seems like the higher one is used, but audioStreamBasicDescription for some reason returns the lower one. Does anybody know why this is the case or how I should extract the sample rate of the produced PCM audio data? Should I always take the higher one? I created FB19620455. let openPanel = NSOpenPanel() openPanel.allowedContentTypes = [.audiovisualContent] openPanel.runModal() let url = openPanel.urls[0] let asset = AVURLAsset(url: url) let assetTrack = asset.tracks(withMediaType: .audio)[0] let assetReader = try! AVAssetReader(asset: asset) let readerOutput = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: [AVFormatIDKey: Int(kAudioFormatLinearPCM), AVLinearPCMBitDepthKey: 16, AVLinearPCMIsBigEndianKey: false, AVLinearPCMIsFloatKey: false, AVLinearPCMIsNonInterleaved: false]) readerOutput.alwaysCopiesSampleData = false assetReader.add(readerOutput) let formatDescriptions = assetTrack.formatDescriptions as! [CMFormatDescription] let sampleRate = formatDescriptions[0].audioStreamBasicDescription!.mSampleRate //let sampleRate = formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate }).max()! print(formatDescriptions[0].audioStreamBasicDescription!.mSampleRate) print(formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate })) if !assetReader.startReading() { preconditionFailure() } var length = 0 while assetReader.status == .reading { guard let sampleBuffer = readerOutput.copyNextSampleBuffer(), let blockBuffer = sampleBuffer.dataBuffer else { break } length += blockBuffer.dataLength } print(Double(length) / (sampleRate * asset.duration.seconds))

Media Technologies Audio AVFoundation

134

Aug ’25

Ability to programatically download Premium and Enhanced voices

Please consider adding the ability to programatically download Premium and Enhanced voices. At the moment it is extremely inconvenient for our users, as they have to navigate to settings themselves to download voices. Our app relies heavily on SpeechSynthesis integration, and it would greatly benefit from this feature. FB16307193

Media Technologies Audio AVFoundation

Jun ’25

Process to request the restricted entitlement behind “DJ with Apple Music” (tempo control / time-stretch on Apple Music streams)?

Hi, I’m an iOS developer building an app with an use case that needs advanced playback on Apple Music subscription streams, specifically: • Real-time tempo change (BPM) during playback — i.e., time-stretch with key-lock, not just crossfade. • Beat-matched transitions between tracks. From what I can tell, this capability seems to exist only for approved partners and isn’t available through public MusicKit. Question: What’s the official request path to be evaluated for that restricted partner entitlement (application form, questionnaire, NDA, or internal team/BD contact)? If the entitlement identifier is internal, how can I get my account routed to the right Apple Music team? For reference, publicly announced partners include Algoriddim djay, Serato DJ Pro, rekordbox (AlphaTheta), and Engine DJ—all of which appear to implement mixing features that imply advanced playback (tempo/beat-matching) on Apple Music content. I’d prefer not to share product details publicly for the moment and can provide specifics privately if needed. Thanks in advance!

Media Technologies Audio Apple Music API FairPlay Streaming MusicKit AVFoundation

386

Oct ’25

iOS 26.0 (23A5276f) - Bluetooth Call Audio Broken (AirPods + Car)

iOS 26.0 (23A5276f) – Bluetooth Call Audio Issue I’m experiencing a Bluetooth audio issue on iOS 26.0 (build 23A5276f). I cannot make or receive phone calls properly using Bluetooth devices — this affects both my car’s Bluetooth system and my AirPods Pro (2nd generation). Notably: Regular phone calls have no audio (either I can’t hear the other person, or they can’t hear me). WhatsApp and other VoIP apps work fine with the same Bluetooth devices. Media playback (music, video, etc.) works without issues over Bluetooth. It seems this bug is limited to the native Phone app or the system audio routing for regular cellular calls. Please advise if this is a known issue or if a fix is expected in upcoming beta releases.

Media Technologies Audio Media

340

Jun ’25

How can third-party iOS apps obtain real-time waveform / spectrogram data for Apple Music tracks (similar to djay & other DJ apps)?

Media Technologies Audio