Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
1
0
412
2w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
428
3w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
0
0
87
3w
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
278
4w
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
0
0
195
2w
Foundation Models framework dyld symbol errors after macOS 26 Beta 2 - LanguageModelSession constructor missing
Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even with the exact code from Apple's documentation. Environment: macOS 26.0 Beta 6 (25A5351b) Xcode 26 Beta 6 M4 Max MacBook Pro Apple Intelligence enabled and downloaded Error Details: dyld[Process]: Symbol not found: _$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC Referenced from: /path/to/app.debug.dylib Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels Code Used (Exact from Documentation): import FoundationModels // This worked on Beta 2, crashes on Beta 3+ let model = SystemLanguageModel.default let session = LanguageModelSession(model: model) let response = try await session.respond(to: "Hello") What I've Verified: FoundationModels.framework exists in /System/Library/Frameworks/ Framework is properly linked in Xcode project Apple Intelligence is enabled and working Same code works in older beta versions Issue persists even with completely fresh Xcode projects Analysis: The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters (model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor. Questions: Has the LanguageModelSession API changed since Beta 2? Should we now use the constructor with guardrails/tools/instructions parameters? Is this a known issue with recent betas? Are there updated code samples for the current API? Additional Context: This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create SystemLanguageModel(adapter: adapter) as well. Any guidance on the correct API usage for current betas would be greatly appreciated. The documentation appears to be out of sync with the actual framework implementation.
1
0
686
Sep ’25
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
0
0
111
Feb ’26
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
2
0
222
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
1
0
277
Jul ’25
CoreML Inference Acceleration
Hello everyone, I have a visual convolutional model and a video that has been decoded into many frames. When I perform inference on each frame in a loop, the speed is a bit slow. So, I started 4 threads, each running inference simultaneously, but I found that the speed is the same as serial inference, every single forward inference is slower. I used the mactop tool to check the GPU utilization, and it was only around 20%. Is this normal? How can I accelerate it?
2
0
718
Sep ’25
InferenceError with Apple Foundation Model – Context Length Exceeded on macOS 26.0 Beta
Hello Team, I'm currently working on a proof of concept using Apple's Foundation Model for a RAG-based chat system on my MacBook Pro with the M1 Max chip. Environment details: macOS: 26.0 Beta Xcode: 26.0 beta 2 (17A5241o) Target platform: iPad (as the iPhone simulator does not support Foundation models) While testing, even with very small input prompts to the LLM, I intermittently encounter the following error: InferenceError::inference-Failed::Failed to run inference: Context length of 4096 was exceeded during singleExtend. Has anyone else experienced this issue? Are there known limitations or workarounds for context length handling in this setup? Any insights would be appreciated. Thank you!
3
0
296
Jul ’25
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
2
0
608
Mar ’26
Apple ANE Peformance - throttling?
I can no longer achieve 100% ANE usage since upgrading to MacOS26 Beta 5. I used to be able to get 100%. Has Apple activated throttling or power saving features in the new Betas? Is there any new rate limiting on the API? I can hardly get above 3w or 40%. I have a M4 Pro mini (64GB) with High Power energy setting. MacOS 26 Beta 5.
2
0
342
Aug ’25
CoreML GPU NaN bug with fused QKV attention on macOS Tahoe
Problem: CoreML produces NaN on GPU (works fine on CPU) when running transformer attention with fused QKV projection on macOS 26.2. Root cause: The common::fuse_transpose_matmul optimization pass triggers a Metal kernel bug when sliced tensors feed into matmul(transpose_y=True). Workaround: pipeline = ct.PassPipeline.DEFAULT pipeline.remove_passes(['common::fuse_transpose_matmul']) mlmodel = ct.convert(model, ..., pass_pipeline=pipeline) Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py Affected: Any ViT/Swin/transformer with fused QKV attention (BiRefNet, etc.) Has anyone else hit this? Filed FB report too.
1
0
490
6d
Accessing Apple Intelligence APIs: Custom Prompt Support and Inference Capabilities
Hello Apple Developer Community, I'm exploring the integration of Apple Intelligence features into my mobile application and have a couple of questions regarding the current and upcoming API capabilities: Custom Prompt Support: Is there a way to pass custom prompts to Apple Intelligence to generate specific inferences? For instance, can we provide a unique prompt to the Writing Tools or Image Playground APIs to obtain tailored outputs? Direct Inference Capabilities: Beyond the predefined functionalities like text rewriting or image generation, does Apple Intelligence offer APIs that allow for more generalized inference tasks based on custom inputs? I understand that Apple has provided APIs such as Writing Tools, Image Playground, and Genmoji. However, I'm interested in understanding the extent of customization and flexibility these APIs offer, especially concerning custom prompts and generalized inference. Additionally, are there any plans or timelines for expanding these capabilities, perhaps with the introduction of new SDKs or frameworks that allow deeper integration and customization? Any insights, documentation links, or experiences shared would be greatly appreciated. Thank you in advance for your assistance!
3
0
370
Jun ’25
Inquiry About Building an App for Object Detection, Background Removal, and Animation
Hi all! Nice to meet you., I am planning to build an iOS application that can: Capture an image using the camera or select one from the gallery. Remove the background and keep only the detected main object. Add a border (outline) around the detected object’s shape. Apply an animation along that border (e.g., moving light or glowing effect). Include a transition animation when removing the background — for example, breaking the background into pieces as it disappears. The app Capword has a similar feature for object isolation, and I’d like to build something like that. Could you please provide any guidance, frameworks, or sample code related to: Object segmentation and background removal in Swift (Vision or Core ML). Applying custom borders and shape animations around detected objects. Recognizing the object name (e.g., “person”, “cat”, “car”) after segmentation. Thank you very much for your support. Best regards, SINN SOKLYHOR
0
0
198
Nov ’25
Huge discrepency of predictions confidence between from Pytorch to Coreml example
I am follwing this tutorial: https://apple.github.io/coremltools/docs-guides/source/convert-a-torchvision-model-from-pytorch.html I have obtained simialr result using the python code. However when I view it in Xcode, the preview prediction percentage confidence is way off I suspect it is due the the output of the model, which is in percentage already and in Xcode it multiply 100 again leading to this result. Please give me any feedback to fix this, thank you.
0
0
307
Nov ’25
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
Replies
1
Boosts
0
Views
412
Activity
2w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
428
Activity
3w
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
Replies
0
Boosts
0
Views
87
Activity
3w
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
278
Activity
4w
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
Replies
0
Boosts
0
Views
195
Activity
2w
Foundation Models framework dyld symbol errors after macOS 26 Beta 2 - LanguageModelSession constructor missing
Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even with the exact code from Apple's documentation. Environment: macOS 26.0 Beta 6 (25A5351b) Xcode 26 Beta 6 M4 Max MacBook Pro Apple Intelligence enabled and downloaded Error Details: dyld[Process]: Symbol not found: _$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC Referenced from: /path/to/app.debug.dylib Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels Code Used (Exact from Documentation): import FoundationModels // This worked on Beta 2, crashes on Beta 3+ let model = SystemLanguageModel.default let session = LanguageModelSession(model: model) let response = try await session.respond(to: "Hello") What I've Verified: FoundationModels.framework exists in /System/Library/Frameworks/ Framework is properly linked in Xcode project Apple Intelligence is enabled and working Same code works in older beta versions Issue persists even with completely fresh Xcode projects Analysis: The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters (model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor. Questions: Has the LanguageModelSession API changed since Beta 2? Should we now use the constructor with guardrails/tools/instructions parameters? Is this a known issue with recent betas? Are there updated code samples for the current API? Additional Context: This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create SystemLanguageModel(adapter: adapter) as well. Any guidance on the correct API usage for current betas would be greatly appreciated. The documentation appears to be out of sync with the actual framework implementation.
Replies
1
Boosts
0
Views
686
Activity
Sep ’25
Core Model Editor and Params
Optimal Precision • Current Precision: Mixed (Float32, int32) • Optimal Precision: Not specified in the image, but typically involves using the most efficient data type for the model's operations to balance speed and memory usage without significant loss of accuracy. Comparison: • Mixed Precision: Utilizes both Float32 and int32 to optimize performance. Float32 provides high precision, while int32 reduces memory usage and increases computational speed. • Optimal Precision: Aimed at achieving the best trade-off between performance and accuracy, potentially using other data types like Float16 (bfloat16) for even greater efficiency in certain hardware environments. Operation Distribution • Current Distribution: • iOS18.mul: 168 • iOS18.transpose: 126 • iOS18.linear: 98 • iOS18.add: 97 • iOS18.sliceByIndex: 96 • iOS18.expandDims: 74 • iOS18.concat: 72 • iOS18.squeeze: 72 • iOS18.reshape: 67 • iOS18.layerNorm: 49 • iOS18.matmul: 48 • iOS18.gelu: 26 • iOS18.softmax: 24 • Split: 24 • conv: 1 • iOS18.conv: 1 Comparison: • Operation Count: Indicates how frequently each operation is executed. High counts for operations like mul, transpose, and linear suggest these are computationally intensive parts of the model. • Optimization Opportunities: Reducing the count of high-frequency operations or optimizing their execution can improve performance. This might involve pruning unnecessary operations, optimizing algorithms, or leveraging hardware acceleration. General Recommendations • Precision Tuning: Experiment with different precision levels to find the best balance for your specific hardware and accuracy requirements. • Operation Optimization: Focus on optimizing the most frequent operations. Techniques include using more efficient algorithms, parallelizing computations, or utilizing specialized hardware like GPUs or TPUs. • Benchmarking: Regularly benchmark the model to assess the impact of changes and ensure that optimizations lead to meaningful performance improvements. By focusing on these areas, you can potentially enhance the efficiency and performance of your ML model.
Replies
0
Boosts
0
Views
111
Activity
Feb ’26
face and body detection is local model or a cloud model?
Is the face and body detection service in the Vision framework a local model or a cloud model? https://developer.apple.com/documentation/vision
Replies
1
Boosts
0
Views
746
Activity
Sep ’25
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
Replies
2
Boosts
0
Views
222
Activity
Feb ’26
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
Replies
1
Boosts
0
Views
277
Activity
Jul ’25
CoreML Inference Acceleration
Hello everyone, I have a visual convolutional model and a video that has been decoded into many frames. When I perform inference on each frame in a loop, the speed is a bit slow. So, I started 4 threads, each running inference simultaneously, but I found that the speed is the same as serial inference, every single forward inference is slower. I used the mactop tool to check the GPU utilization, and it was only around 20%. Is this normal? How can I accelerate it?
Replies
2
Boosts
0
Views
718
Activity
Sep ’25
InferenceError with Apple Foundation Model – Context Length Exceeded on macOS 26.0 Beta
Hello Team, I'm currently working on a proof of concept using Apple's Foundation Model for a RAG-based chat system on my MacBook Pro with the M1 Max chip. Environment details: macOS: 26.0 Beta Xcode: 26.0 beta 2 (17A5241o) Target platform: iPad (as the iPhone simulator does not support Foundation models) While testing, even with very small input prompts to the LLM, I intermittently encounter the following error: InferenceError::inference-Failed::Failed to run inference: Context length of 4096 was exceeded during singleExtend. Has anyone else experienced this issue? Are there known limitations or workarounds for context length handling in this setup? Any insights would be appreciated. Thank you!
Replies
3
Boosts
0
Views
296
Activity
Jul ’25
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
Replies
2
Boosts
0
Views
608
Activity
Mar ’26
Apple ANE Peformance - throttling?
I can no longer achieve 100% ANE usage since upgrading to MacOS26 Beta 5. I used to be able to get 100%. Has Apple activated throttling or power saving features in the new Betas? Is there any new rate limiting on the API? I can hardly get above 3w or 40%. I have a M4 Pro mini (64GB) with High Power energy setting. MacOS 26 Beta 5.
Replies
2
Boosts
0
Views
342
Activity
Aug ’25
CoreML GPU NaN bug with fused QKV attention on macOS Tahoe
Problem: CoreML produces NaN on GPU (works fine on CPU) when running transformer attention with fused QKV projection on macOS 26.2. Root cause: The common::fuse_transpose_matmul optimization pass triggers a Metal kernel bug when sliced tensors feed into matmul(transpose_y=True). Workaround: pipeline = ct.PassPipeline.DEFAULT pipeline.remove_passes(['common::fuse_transpose_matmul']) mlmodel = ct.convert(model, ..., pass_pipeline=pipeline) Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py Affected: Any ViT/Swin/transformer with fused QKV attention (BiRefNet, etc.) Has anyone else hit this? Filed FB report too.
Replies
1
Boosts
0
Views
490
Activity
6d
Accessing Apple Intelligence APIs: Custom Prompt Support and Inference Capabilities
Hello Apple Developer Community, I'm exploring the integration of Apple Intelligence features into my mobile application and have a couple of questions regarding the current and upcoming API capabilities: Custom Prompt Support: Is there a way to pass custom prompts to Apple Intelligence to generate specific inferences? For instance, can we provide a unique prompt to the Writing Tools or Image Playground APIs to obtain tailored outputs? Direct Inference Capabilities: Beyond the predefined functionalities like text rewriting or image generation, does Apple Intelligence offer APIs that allow for more generalized inference tasks based on custom inputs? I understand that Apple has provided APIs such as Writing Tools, Image Playground, and Genmoji. However, I'm interested in understanding the extent of customization and flexibility these APIs offer, especially concerning custom prompts and generalized inference. Additionally, are there any plans or timelines for expanding these capabilities, perhaps with the introduction of new SDKs or frameworks that allow deeper integration and customization? Any insights, documentation links, or experiences shared would be greatly appreciated. Thank you in advance for your assistance!
Replies
3
Boosts
0
Views
370
Activity
Jun ’25
Apple Swift Replacing Python
This YouTube video is very interesting, discussing Swift's power and its potential to replace Python. Here is the link. https://youtu.be/6ZGlseSqar0?si=pzZVq9FKsveca4kA
Replies
0
Boosts
0
Views
160
Activity
1w
Why is Create ML only using CPU
Hi i'm curently crating a model to identify car plates (object detection) i use asitop to monitor my macbook pro and i see that only the cpu is used for the training and i wanted to know why
Replies
0
Boosts
0
Views
348
Activity
May ’25
Inquiry About Building an App for Object Detection, Background Removal, and Animation
Hi all! Nice to meet you., I am planning to build an iOS application that can: Capture an image using the camera or select one from the gallery. Remove the background and keep only the detected main object. Add a border (outline) around the detected object’s shape. Apply an animation along that border (e.g., moving light or glowing effect). Include a transition animation when removing the background — for example, breaking the background into pieces as it disappears. The app Capword has a similar feature for object isolation, and I’d like to build something like that. Could you please provide any guidance, frameworks, or sample code related to: Object segmentation and background removal in Swift (Vision or Core ML). Applying custom borders and shape animations around detected objects. Recognizing the object name (e.g., “person”, “cat”, “car”) after segmentation. Thank you very much for your support. Best regards, SINN SOKLYHOR
Replies
0
Boosts
0
Views
198
Activity
Nov ’25
Huge discrepency of predictions confidence between from Pytorch to Coreml example
I am follwing this tutorial: https://apple.github.io/coremltools/docs-guides/source/convert-a-torchvision-model-from-pytorch.html I have obtained simialr result using the python code. However when I view it in Xcode, the preview prediction percentage confidence is way off I suspect it is due the the output of the model, which is in percentage already and in Xcode it multiply 100 again leading to this result. Please give me any feedback to fix this, thank you.
Replies
0
Boosts
0
Views
307
Activity
Nov ’25