Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

Real Time Text detection using iOS18 RecognizeTextRequest from video buffer returns gibberish
Hey Devs, I'm trying to create my own real-time text detection like this Apple project: https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images I want to use the new iOS 18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project. This is my delegate code with the camera setup. I removed the region of interest (ROI) for debugging, but I'm trying to scan English words in books. The idea is to eventually get one word in the ROI. But I can't even get proper words, so I'm testing without the ROI in case my math is wrong.
@Observable class CameraManager: NSObject, AVCapturePhotoCaptureDelegate ... override init() { super.init() setUpVisionRequest() } private func setUpVisionRequest() { textRequest = RecognizeTextRequest(.revision3) } ... func setup() -> Bool { captureSession.beginConfiguration() guard let captureDevice = AVCaptureDevice.default( .builtInWideAngleCamera, for: .video, position: .back) else { return false } self.captureDevice = captureDevice guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return false } /// Check whether the session can add input. guard captureSession.canAddInput(deviceInput) else { print("Unable to add device input to the capture session.") return false } /// Add the input and output to session captureSession.addInput(deviceInput) /// Configure the video data output videoDataOutput.setSampleBufferDelegate( self, queue: videoDataOutputQueue) if captureSession.canAddOutput(videoDataOutput) { captureSession.addOutput(videoDataOutput) videoDataOutput.connection(with: .video)?.preferredVideoStabilizationMode = .off } else { return false } // Set zoom and autofocus to help focus on very small text do { try captureDevice.lockForConfiguration() captureDevice.videoZoomFactor = 2 captureDevice.autoFocusRangeRestriction = .near captureDevice.unlockForConfiguration() } catch { print("Could not set zoom level due to error: \(error)") return false } captureSession.commitConfiguration() // potential issue with background vs dispatchqueue ?? Task(priority: .background) { captureSession.startRunning() } return true } } // Issue here ??? extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate { func captureOutput( _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection ) { guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return } Task { textRequest.recognitionLevel = .fast textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")] do { let observations = try await textRequest.perform(on: pixelBuffer) for observation in observations { let recognizedText = observation.topCandidates(1).first print("recognized text \(recognizedText)") } } catch { print("Recognition error: \(error.localizedDescription)") } } } }
The results I get look like this (a full page of English from any book):
recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))
Even with the ROI set to a specific rectangle normalized to Vision coordinates, I get the same results, with single characters returning gibberish. Any help would be amazing, thank you. Am I using the buffer right? Am I using the new perform(on: CVPixelBuffer) right? Maybe I didn't set up my camera properly? I can provide code
1
0
365
Jul ’25
visionOS 26 beta 2: Symbol Not Found on Foundation Models
When I try to run visionOS 26 beta 2 on my device the app crashes on Launch: dyld[904]: Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels dyld config: DYLD_LIBRARY_PATH=/usr/lib/system/introspection DYLD_INSERT_LIBRARIES=/usr/lib/libLogRedirect.dylib:/usr/lib/libBacktraceRecording.dylib:/usr/lib/libMainThreadChecker.dylib:/usr/lib/libViewDebuggerSupport.dylib:/System/Library/PrivateFrameworks/GPUToolsCapture.framework/GPUToolsCapture Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels dyld config: DYLD_LIBRARY_PATH=/usr/lib/system/introspection DYLD_INSERT_LIBRARIES=/usr/lib/libLogRedirect.dylib:/usr/lib/libBacktraceRecording.dylib:/usr/lib/libMainThreadChecker.dylib:/usr/lib/libViewDebuggerSupport.dylib:/System/Library/PrivateFrameworks/GPUToolsCapture.framework/GPUToolsCapture Message from debugger: Terminated due to signal 6
5
0
218
Jun ’25
Looking for a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS M1/M2
Hi everyone! 👋 I'm working on a C++ project using TensorFlow Lite and was wondering if anyone has a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS (Apple Silicon M1/M2) that they’d be willing to share. I’m looking specifically for the TensorFlow Lite C++ API — something that lets me use tflite::Interpreter, tflite::FlatBufferModel, etc. Building it from source using Bazel on macOS has been quite challenging and time-consuming, so a ready-to-use .dylib or .a build along with the required headers would be incredibly helpful. TensorFlow Lite version: v2.18.0 preferred Target: macOS arm64 (Apple Silicon) What I need: libtensorflowlite.dylib or .a Corresponding headers (ideally organized in a clean include/ folder) If you have one available or know where I can find a reliable prebuilt version, I’d be super grateful. Thanks in advance! 🙏
2
0
227
Apr ’25
Cannot find type ToolOutput in scope
My sample app has been working with the following code: func call(arguments: Arguments) async throws -> ToolOutput { var temp: Int switch arguments.city { case .singapore: temp = Int.random(in: 30..<40) case .china: temp = Int.random(in: 10..<30) } let content = GeneratedContent(temp) let output = ToolOutput(content) return output } However, in 26 beta 5, ToolOutput is no longer available. Please advise what has changed.
3
0
257
Aug ’25
How to encode Tool.Output (aka PromptRepresentable)?
Hey, I've been trying to write an AI agent for OpenAI's GPT-5, but using the @Generable Tool types from the FoundationModels framework, which is super awesome btw! I'm having trouble implementing the tool calling, though. When I receive a tool call from the OpenAI api, I do the following: Find the tool in my [any Tool] array via the tool name I get from the model if let tool = tools.first(where: { $0.name == functionCall.name }) { // ... } Parse the arguments of the tool call via GeneratedContent(json:) let generatedContent = try GeneratedContent(json: functionCall.arguments) Pass the tool and arguments to a function that calls tool.call(arguments: arguments) and returns the tool's output type private func execute<T: Tool>(_ tool: T, with generatedContent: GeneratedContent) async throws -> T.Output { let arguments = try T.Arguments.init(generatedContent) return try await tool.call(arguments: arguments) } Up to this point, everything is working as expected. However, the tool's output type is any PromptRepresentable and I have no idea how to turn that into something that I can encode and send back to the model. I assumed there might be a way to turn it into a GeneratedContent but there is no fitting initializer. Am I missing something or is this not supported? Without a way to return the output to an external provider, it wouldn't really be possible to use FoundationModels Tool type I think. That would be unfortunate because it's implemented so elegantly. Thanks!
2
0
245
Aug ’25
All generations in #Playground macro are throwing "unsafe" Generation Errors
I'm using Xcode 26 Beta 5 and get errors on any generation I try, however harmless, when wrapped in the #Playground macro. #Playground { let session = LanguageModelSession() let topic = "pandas" let prompt = "Write a safe and respectful story about \(topic)." let response = try await session.respond(to: prompt) } Not seeing any issues on simulator or device. Anyone else seeing this or have any ideas? Thanks for any help! Version 26.0 beta 5 (17A5295f) macOS 26.0 Beta (25A5316i)
4
0
159
Aug ’25
FoundationModels and Core Data
Hi, I have an app that uses Core Data to store user information and display it in various views. I want to know if it's possible to easily integrate this setup with FoundationModels to make it easier for the user to query and manipulate the information, and if so, how would I go about it? Can the model be pointed to the database schema file and the SQLite file sitting in the user's app group container to parse out the information needed? And/or should the NSManagedObjects be made @Generable for better output? Any guidance about this would be useful.
1
0
235
Jun ’25
Translation framework use in Swift 6
I’m trying to integrate Apple’s Translation framework in a Swift 6 project with Approachable Concurrency enabled. I’m following the code here: https://developer.apple.com/documentation/translation/translating-text-within-your-app#Offer-a-custom-translation And, specifically, inside the following code .translationTask(configuration) { session in do { // Use the session the task provides to translate the text. let response = try await session.translate(sourceText) // Update the view with the translated result. targetText = response.targetText } catch { // Handle any errors. } } On the try await session.translate(…) line, the compiler complains that “Sending ‘session’ risks causing data races”. Extended error message: Sending main actor-isolated 'session' to @concurrent instance method 'translate' risks causing data races between @concurrent and main actor-isolated uses I’ve downloaded Apple’s sample code (at the top of linked webpage), it compiles fine as-is on Xcode 26.4, but fails with the same error as soon as I switch the Swift Language Mode to Swift 6 in the project. How can I fix this?
4
0
299
Feb ’26
Khmer Script Misidentified as Thai in Vision Framework
It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications. There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai. While both scripts share historical roots, they are distinct languages with different alphabets. Currently, the model’s confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple’s native ecosystem (Preview, Live Text, and Photos). import SwiftUI import Vision class TextExtractor { func extractText(from data: Data, completion: @escaping (String) -> Void) { let request = VNRecognizeTextRequest { (request, error) in guard let observations = request.results as? [VNRecognizedTextObservation] else { completion("No text found.") return } let recognizedStrings = observations.compactMap { observation in let str = observation.topCandidates(1).first?.string return "{text: \(str!), confidence: \(observation.confidence)}" } completion(recognizedStrings.joined(separator: "\n")) } request.automaticallyDetectsLanguage = true // <-- This is the issue. request.recognitionLevel = .accurate let handler = VNImageRequestHandler(data: data, options: [:]) DispatchQueue.global(qos: .background).async { do { try handler.perform([request]) } catch { completion("Failed to perform OCR: \(error.localizedDescription)") } } } } Recognizing Khmer Confidence Score is low for Khmer text. 
(The output is in Thai language with low confidence score) Recognizing English Confidence Score is high expected. Recognizing Thai Confidence Score is high as expected Issues on Preview, Photos Khmer text Copied text Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย" สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม สงสอบานตรเราะากกต่ายภากายระตารุกเตียน Recommended Solutions 1. Set a Threshold Filter out the detected result where the threshold is less than or equal to 0.5, so that it would not output low quality text which can lead to the issue. For example, let recognizedStrings = observations.compactMap { observation in if observation.confidence <= 0.5 { return nil } let str = observation.topCandidates(1).first?.string return "{text: \(str!), confidence: \(observation.confidence)}" } 2. Add Khmer Language Support This issue would never happen if the model has the capability to detect and recognize image with Khmer language. Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
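The threshold filter suggested in this post can be sketched without the force unwrap (`str!`), which would crash if an observation had no candidate. This is only an illustration: `Recognized` and `describe(_:minimumConfidence:)` are hypothetical stand-ins for Vision's observation types, and the 0.5 cutoff is the poster's suggestion, not a documented Vision default.

```swift
import Foundation

/// Hypothetical stand-in for a Vision text observation (not a real Vision type).
struct Recognized {
    let text: String?      // nil models "no top candidate"
    let confidence: Float
}

/// Keeps only results above `minimumConfidence` and skips missing text
/// instead of force-unwrapping it.
func describe(_ items: [Recognized], minimumConfidence: Float = 0.5) -> [String] {
    items.compactMap { item in
        guard item.confidence > minimumConfidence, let text = item.text else {
            return nil
        }
        return "{text: \(text), confidence: \(item.confidence)}"
    }
}
```

Filtering this way hides low-confidence output but does not address the underlying misidentification of Khmer as Thai.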
2
0
1.1k
Jan ’26
Stream response
With respond() methods, the foundation model works well enough. With streamResponse() methods, the responses are very repetitive, verbose, and messy. My app with foundation model uses more than 500 MB memory on an iPad Pro when running from Xcode. Devices supporting Apple Intelligence have at least 8GB memory. Should Apple use a bigger model (using 3 ~ 4 GB memory) for better stream responses?
2
0
292
Jul ’25
OpenIntent not executed with Visual Intelligence
I'm building a new feature with the Visual Intelligence framework. My implementation of IndexedEntity and IntentValueQuery works as expected, and I can see a list of objects in the visual search result. However, my OpenIntent doesn't work. When I tap on an object, I get a message on screen, "Sorry something went wrong ...", and the breakpoint in perform() is never triggered. Things I've tried: (1) I added @MainActor before perform(); this didn't change anything. (2) I set static let openAppWhenRun: Bool = true and static var supportedModes: IntentModes = [.foreground(.immediate)]; still nothing. (3) I created a different intent for the "see more" button at the end of the feed. This AppIntent with schema: .visualIntelligence.semanticContentSearch worked, and perform() is executed.
10
0
430
Aug ’25
Apple's Illusion of Thinking paper and Path to Real AI Reasoning
Hey everyone, I'm Manish Mehta, field CTO at Centific. I recently read Apple's white paper, The Illusion of Thinking, and it got me thinking about the current state of AI reasoning. Who here has read it? The paper highlights how LLMs often rely on pattern recognition rather than genuine understanding. When faced with complex tasks, their performance can degrade significantly. To move beyond this problem, I think we need to explore approaches that combine deeper reasoning architectures for true cognitive capability with deep human partnership to guide AI toward better judgment and understanding. The first part means fundamentally rewiring AI to reason. This involves advancing deeper architectures like world models, which can build internal simulations to understand real-world scenarios, and neurosymbolic systems, which combine neural networks with symbolic reasoning for deeper self-verification. Additionally, we need to look at deep human partnership and scalable oversight. An AI cannot learn certain things from data alone; it lacks real-world judgment. Among other things, human partners with deep domain expertise are needed to instill this wisdom, validate the AI's entire reasoning process, build its ethical guardrails, and act as skilled adversaries to find hidden flaws before they can cause harm. What do you all think? Is this focus on a deeper partnership between advanced AI reasoning and deep human judgment the right path forward? Agree? Disagree? Thanks
2
0
300
Jul ’25
Restricting App Installation to Devices Supporting Apple Intelligence Without Triggering Game Mode
Hello, My app fully relies on the new Foundation Models. Since Foundation Models require Apple Intelligence, I want to ensure that only devices capable of running Apple Intelligence can install my app. When checking the UIRequiredDeviceCapabilities property for a suitable value, I found that iphone-performance-gaming-tier seems the closest match. Based on my research: On iPhone, this effectively limits installation to iPhone 15 Pro or later. On iPad, it ensures M1 or newer devices. This exactly matches the hardware requirements for Apple Intelligence. However, after setting iphone-performance-gaming-tier, I noticed that on iPad, Game Mode (Game Overlay) is automatically activated, and my app is treated as a game. My questions are: Is there a more appropriate UIRequiredDeviceCapabilities value that would enforce the same Apple Intelligence hardware requirements without triggering Game Mode? If not, is there another way to restrict installation to devices meeting Apple Intelligence requirements? Is there a way to prevent Game Mode from appearing for my app while still using this capability restriction? Thanks in advance for your help.
2
0
467
Aug ’25
Keep getting exceededContextWindowSize with Foundation Models
I'm a bit new to LLMs and to Foundation Models. My understanding is that there is a token limit of around 4K. I want to process the contents of files which may be quite large. I first tried going the Tool route, but that didn't work out, so I then tried manually chunking the text to keep things under the limit. It mostly works, except that every now and then it exceeds the limit. This happens even when the chunks are less than 100 characters. The instructions themselves are about 500 characters, so each prompt is still well below 1000 characters all told, which, in my limited understanding, should not result in 4K tokens being parsed. Any ideas on what is going on here?
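A minimal sketch of the manual chunking described above, in plain Swift. The figure of roughly 4 characters per token is an assumed rule of thumb, not how the on-device tokenizer actually counts, and `estimatedTokens` and `chunk` are hypothetical helper names. One thing worth checking alongside this: if every chunk goes to the same LanguageModelSession, earlier prompts and responses may remain in the session transcript and count toward the limit, so small chunks could still overflow it over time.

```swift
import Foundation

/// Rough token estimate. ASSUMPTION: ~4 characters per token is only a heuristic.
func estimatedTokens(_ text: String, charactersPerToken: Int = 4) -> Int {
    (text.count + charactersPerToken - 1) / charactersPerToken
}

/// Splits `text` on whitespace into chunks whose estimated size stays within
/// `tokenBudget`. A single word longer than the budget becomes its own
/// (oversized) chunk rather than being cut in the middle.
func chunk(_ text: String, tokenBudget: Int, charactersPerToken: Int = 4) -> [String] {
    let limit = max(1, tokenBudget * charactersPerToken)
    var chunks: [String] = []
    var current = ""
    for piece in text.split(whereSeparator: { $0.isWhitespace }) {
        let word = String(piece)
        // Length of `current` if this word were appended with a separating space.
        let needed = current.isEmpty ? word.count : current.count + 1 + word.count
        if needed > limit && !current.isEmpty {
            chunks.append(current)
            current = word
        } else {
            current = current.isEmpty ? word : current + " " + word
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```

The budget passed to `chunk` should leave room for the instructions and the expected response, since those share the same context window.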
2
0
322
Aug ’25
LanguageModelSession with multiple tools and structured output
Hi, I'm using LanguageModelSession and giving it two different tools to query data from a local database. I'm wondering how I can have the session generate structured content as the response that includes data from one or both tools (or no tool at all). Here is an example of what I'm trying to do: Let's say the app has access to a database that contains information about exercise and sleep data (this is just an analogy). There are two tools, GetExerciseData() and GetSleepData(). The user may then prompt something like, "how well did I sleep in November". I have this working so that it calls through to the right tool, which would return a SleepSummary. However, I can't figure out how to have the session return the right structured data. I can do this and get back good text data: let response = session.respond(to: userInput), but I believe I want to do something like: let response = session.respond(to: trimmed, generating: <SomeStructure?>) Sometimes the model will run one tool or the other, or both tools, or no tool at all. Any help on the right way to go about this would be much appreciated. Most of the examples I found deal with only one tool.
1
0
726
Jan ’26
TAMM toolkit v0.2.0 is for base model older than base model in macOS 26 beta 4
Problem: We trained a LoRA adapter for Apple's FoundationModels framework using their TAMM (Training Adapter for Model Modification) toolkit v0.2.0 on macOS 26 beta 4. The adapter trains successfully but fails to load with: "Adapter is not compatible with the current system base model." TAMM v0.2.0 contains export/constants.py with: BASE_SIGNATURE = "9799725ff8e851184037110b422d891ad3b92ec1"
Findings:
Adapter Export Process: In export_fmadapter.py def write_metadata(...): self_dict[MetadataKeys.BASE_SIGNATURE] = BASE_SIGNATURE # Hardcoded value
The Compatibility Check:
- When loading an adapter, Apple's system compares the adapter's baseModelSignature with the current system model
- If they don't match: compatibleAdapterNotFound error
- The error doesn't reveal the expected signature
Questions:
- How is BASE_SIGNATURE derived from the base model?
- Is it SHA-1 of base-model.pt or some other computation?
- Can we compute the correct signature for beta 4?
- Or do we need Apple to release TAMM v0.3.0 with an updated signature?
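The SHA-1 hypothesis in the questions above can at least be tested locally. The hardcoded BASE_SIGNATURE is 40 hexadecimal characters, which matches the length of a SHA-1 digest, but nothing in the post confirms that it is a hash of base-model.pt. The sketch below uses CryptoKit's `Insecure.SHA1`; the helper names `sha1Hex` and `looksLikeSHA1` are hypothetical, and a mismatch would only rule out this one derivation.

```swift
import CryptoKit
import Foundation

/// Lowercase hex SHA-1 digest of `data`.
func sha1Hex(of data: Data) -> String {
    Insecure.SHA1.hash(data: data)
        .map { String(format: "%02x", $0) }
        .joined()
}

/// True when `value` has the shape of a SHA-1 digest: 40 hexadecimal characters.
func looksLikeSHA1(_ value: String) -> Bool {
    value.count == 40 && value.allSatisfy { $0.isHexDigit }
}
```

To try it against a checkpoint, load the file with `Data(contentsOf:)` and compare `sha1Hex(of:)` with the value in export/constants.py. For a large file, feeding it incrementally through `Insecure.SHA1()` and `update(data:)` avoids holding the whole checkpoint in memory.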
1
0
677
Aug ’25
Create ML fails to train a text classifier using the BERT transfer learning algorithm
I'm trying to train a text classifier model in Create ML. The Create ML app/framework offers five algorithms. I can successfully train the model with all of the algorithms except the BERT transfer learning option. When I select this algorithm, Create ML simply stops the training process immediately after the initial feature extraction phase (with no reported error). What I've tried: I tried simplifying the dataset to just a few classes and short examples in case there was a problem with the data. I tried experimenting with the number of iterations and language/script options. I checked Console.app for logged errors and found the following for the Create ML app: error 10:38:28.385778+0000 Create ML Couldn't read event column - category is invalid. Format string is : <private> error 10:38:30.902724+0000 Create ML Could not encode the entity <private>. Error: <private> I'm not sure if these errors are normal or indicative of a problem. I don't know what it means by the "event" column – I don't have an event column in my data and I don't believe there should be one. These errors are not reported when using the other algorithms. Given that I couldn't get the app to work with BERT, I switched over to the CreateML framework and followed the code samples given in the documentation. (By the way, there's an error in the docs: the line let (trainingData, testingData) = data.stratifiedSplit(on: "text", by: 0.8) should be stratifying on "label", not on "text"). The main chunk of code looks like this: var parameters = MLTextClassifier.ModelParameters( validation: .split(strategy: .automatic), algorithm: .transferLearning(.bertEmbedding, revision: 1), language: .english ) parameters.maxIterations = 100 let sentimentClassifier = try MLTextClassifier( trainingData: trainingData, textColumn: "text", labelColumn: "label", parameters: parameters ) Ultimately I want to train a single multilingual model, and I believe that BERT is the best choice for this. 
The problem is that there doesn't seem to be a way to choose the multilingual Latin script option in the API. In the Create ML app you can theoretically do this by selecting the Latin script with language set to "Automatic", as recommended in this WWDC video (relevant section starts at around 8:02). But, as far as I can tell, ModelParameters only lets you pick a specific language. I presume the framework must provide some way to do this, since the Create ML app uses the framework under the hood, but I can't see a way to do it. Another possibility is that the Create ML app might be misrepresenting the framework – perhaps selecting a specific language in the app doesn't actually make any difference – for example, maybe all Latin languages actually use the same model under the hood and the language selector is just there to guide people to the right choice (but this is just my speculation). Any help would be much appreciated! If possible, I'd prefer to use the Create ML app if I can get the BERT option to work – is this actually working for anyone? Or failing that, I want to use the framework to train a multilingual Latin model with BERT, so I'm looking for instructions on how to choose that specific option or confirmation that I can just choose .english to get the correct Latin multilingual model. I'm running Xcode 26.2 on Tahoe 21.1 on an M1 Pro MacBook Pro. I have version 6.2 of the Create ML app.
8
0
1.6k
Jan ’26
Real Time Text detection using iOS18 RecognizeTextRequest from video buffer returns gibberish
Hey Devs, I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project. This is my delegate code with the camera setup. I removed region of interest for debugging but I'm trying to scan English words in books. The idea is to get one word in the ROI in the future. But I can't even get proper words so testing without ROI incase my math is wrong. @Observable class CameraManager: NSObject, AVCapturePhotoCaptureDelegate ... override init() { super.init() setUpVisionRequest() } private func setUpVisionRequest() { textRequest = RecognizeTextRequest(.revision3) } ... func setup() -> Bool { captureSession.beginConfiguration() guard let captureDevice = AVCaptureDevice.default( .builtInWideAngleCamera, for: .video, position: .back) else { return false } self.captureDevice = captureDevice guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return false } /// Check whether the session can add input. guard captureSession.canAddInput(deviceInput) else { print("Unable to add device input to the capture session.") return false } /// Add the input and output to session captureSession.addInput(deviceInput) /// Configure the video data output videoDataOutput.setSampleBufferDelegate( self, queue: videoDataOutputQueue) if captureSession.canAddOutput(videoDataOutput) { captureSession.addOutput(videoDataOutput) videoDataOutput.connection(with: .video)? 
.preferredVideoStabilizationMode = .off } else { return false } // Set zoom and autofocus to help focus on very small text do { try captureDevice.lockForConfiguration() captureDevice.videoZoomFactor = 2 captureDevice.autoFocusRangeRestriction = .near captureDevice.unlockForConfiguration() } catch { print("Could not set zoom level due to error: \(error)") return false } captureSession.commitConfiguration() // potential issue with background vs dispatchqueue ?? Task(priority: .background) { captureSession.startRunning() } return true } } // Issue here ??? extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate { func captureOutput( _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection ) { guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return } Task { textRequest.recognitionLevel = .fast textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")] do { let observations = try await textRequest.perform(on: pixelBuffer) for observation in observations { let recognizedText = observation.topCandidates(1).first print("recognized text \(recognizedText)") } } catch { print("Recognition error: \(error.localizedDescription)") } } } } The results I get look like this ( full page of English from a any book) recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5)) recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3)) recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3)) recognized text Optional(RecognizedText(string: li, confidence: 0.3)) recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3)) recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3)) recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3)) recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5)) recognized text Optional(RecognizedText(string: 1141, confidence: 0.3)) recognized text 
Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3)) recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3)) recognized text Optional(RecognizedText(string: ktril, confidence: 0.3)) recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3)) recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3)) recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3)) Even with ROI set to a specific rectangle Normalized to Vision, I get the same results with single characters returning gibberish. Any help would be amazing thank you. Am I using the buffer right ? Am I using the new perform(on: CVPixelBuffer) right ? Maybe I didn't set up my camera properly? I can provide code
Replies
1
Boosts
0
Views
365
Activity
Jul ’25
visionOS 26 beta 2: Symbol Not Found on Foundation Models
When I try to run visionOS 26 beta 2 on my device the app crashes on Launch: dyld[904]: Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels dyld config: DYLD_LIBRARY_PATH=/usr/lib/system/introspection DYLD_INSERT_LIBRARIES=/usr/lib/libLogRedirect.dylib:/usr/lib/libBacktraceRecording.dylib:/usr/lib/libMainThreadChecker.dylib:/usr/lib/libViewDebuggerSupport.dylib:/System/Library/PrivateFrameworks/GPUToolsCapture.framework/GPUToolsCapture Symbol not found: _$s16FoundationModels10TranscriptV7entriesACSayAC5EntryOG_tcfC Referenced from: <A71932DD-53EB-39E2-9733-32E9D961D186> /private/var/containers/Bundle/Application/53866099-99B1-4BBD-8C94-CD022646EB5D/VisionPets.app/VisionPets.debug.dylib Expected in: <F68A7984-6B48-3958-A48D-E9F541868C62> /System/Library/Frameworks/FoundationModels.framework/FoundationModels dyld config: DYLD_LIBRARY_PATH=/usr/lib/system/introspection DYLD_INSERT_LIBRARIES=/usr/lib/libLogRedirect.dylib:/usr/lib/libBacktraceRecording.dylib:/usr/lib/libMainThreadChecker.dylib:/usr/lib/libViewDebuggerSupport.dylib:/System/Library/PrivateFrameworks/GPUToolsCapture.framework/GPUToolsCapture Message from debugger: Terminated due to signal 6
Replies
5
Boosts
0
Views
218
Activity
Jun ’25
Looking for a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS M1/M2
Hi everyone! 👋 I'm working on a C++ project using TensorFlow Lite and was wondering if anyone has a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS (Apple Silicon M1/M2) that they’d be willing to share. I’m looking specifically for the TensorFlow Lite C++ API — something that lets me use tflite::Interpreter, tflite::FlatBufferModel, etc. Building it from source using Bazel on macOS has been quite challenging and time-consuming, so a ready-to-use .dylib or .a build along with the required headers would be incredibly helpful. TensorFlow Lite version: v2.18.0 preferred Target: macOS arm64 (Apple Silicon) What I need: libtensorflowlite.dylib or .a Corresponding headers (ideally organized in a clean include/ folder) If you have one available or know where I can find a reliable prebuilt version, I’d be super grateful. Thanks in advance! 🙏
Replies
2
Boosts
0
Views
227
Activity
Apr ’25
Cannot find type ToolOutput in scope
My sample app has been working with the following code:
func call(arguments: Arguments) async throws -> ToolOutput {
    var temp: Int
    switch arguments.city {
    case .singapore: temp = Int.random(in: 30..<40)
    case .china: temp = Int.random(in: 10..<30)
    }
    let content = GeneratedContent(temp)
    let output = ToolOutput(content)
    return output
}
However, in 26 beta 5, ToolOutput is no longer available. Please advise what has changed.
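For reference, a minimal sketch of what such a tool might look like without ToolOutput, assuming the newer shape of the Tool protocol in which call(arguments:) returns a PromptRepresentable value such as String. The type names TemperatureTool and City, and the tool's name and description strings, are made up for illustration and are not from the original post.

```swift
import FoundationModels

// Sketch only: assumes `Tool.call(arguments:)` may return any
// `PromptRepresentable` value (here a plain String) instead of `ToolOutput`.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
struct TemperatureTool: Tool {
    // Hypothetical name and description, chosen for this example.
    let name = "getTemperature"
    let description = "Returns a made-up temperature in Celsius for a city."

    @Generable
    enum City {
        case singapore
        case china
    }

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        var city: City
    }

    func call(arguments: Arguments) async throws -> String {
        let temp: Int
        switch arguments.city {
        case .singapore: temp = Int.random(in: 30..<40)
        case .china: temp = Int.random(in: 10..<30)
        }
        // The string is handed back to the model as the tool's result.
        return "\(temp)"
    }
}
```

Returning a String keeps the example small; a @Generable struct could be returned instead if the model should see named fields.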
Replies
3
Boosts
0
Views
257
Activity
Aug ’25
How to encode Tool.Output (aka PromptRepresentable)?
Hey, I've been trying to write an AI agent for OpenAI's GPT-5, but using the @Generable Tool types from the FoundationModels framework, which is super awesome btw! I'm having trouble implementing the tool calling, though. When I receive a tool call from the OpenAI api, I do the following: Find the tool in my [any Tool] array via the tool name I get from the model if let tool = tools.first(where: { $0.name == functionCall.name }) { // ... } Parse the arguments of the tool call via GeneratedContent(json:) let generatedContent = try GeneratedContent(json: functionCall.arguments) Pass the tool and arguments to a function that calls tool.call(arguments: arguments) and returns the tool's output type private func execute<T: Tool>(_ tool: T, with generatedContent: GeneratedContent) async throws -> T.Output { let arguments = try T.Arguments.init(generatedContent) return try await tool.call(arguments: arguments) } Up to this point, everything is working as expected. However, the tool's output type is any PromptRepresentable and I have no idea how to turn that into something that I can encode and send back to the model. I assumed there might be a way to turn it into a GeneratedContent but there is no fitting initializer. Am I missing something or is this not supported? Without a way to return the output to an external provider, it wouldn't really be possible to use FoundationModels Tool type I think. That would be unfortunate because it's implemented so elegantly. Thanks!
Replies
2
Boosts
0
Views
245
Activity
Aug ’25
Foundation Models Tools not invoking
I am using a contact tool to fetch contacts from my address book, but the model isn't invoking my tool's call method. I even tried a simple tool and the outcome is the same: the simple tool is not invoked either.
Replies
4
Boosts
0
Views
245
Activity
Jul ’25
All generations in #Playground macro are throwing "unsafe" Generation Errors
I'm using Xcode 26 Beta 5 and get errors on any generation I try, however harmless, when it is wrapped in the #Playground macro.
#Playground {
    let session = LanguageModelSession()
    let topic = "pandas"
    let prompt = "Write a safe and respectful story about \(topic)."
    let response = try await session.respond(to: prompt)
}
Not seeing any issues on simulator or device. Anyone else seeing this or have any ideas? Thanks for any help! Version 26.0 beta 5 (17A5295f) macOS 26.0 Beta (25A5316i)
Replies
4
Boosts
0
Views
159
Activity
Aug ’25
FoundationModels and Core Data
Hi, I have an app that uses Core Data to store user information and display it in various views. I want to know if it's possible to easily integrate this setup with FoundationModels to make it easier for the user to query and manipulate the information, and if so, how would I go about it? Can the model be pointed to the database schema file and the SQLite file sitting in the user's app group container to parse out the information needed? And/or should the NSManagedObjects be made @Generable for better output? Any guidance about this would be useful.
Replies
1
Boosts
0
Views
235
Activity
Jun ’25
Translation framework use in Swift 6
I’m trying to integrate Apple’s Translation framework in a Swift 6 project with Approachable Concurrency enabled. I’m following the code here: https://developer.apple.com/documentation/translation/translating-text-within-your-app#Offer-a-custom-translation And, specifically, inside the following code .translationTask(configuration) { session in do { // Use the session the task provides to translate the text. let response = try await session.translate(sourceText) // Update the view with the translated result. targetText = response.targetText } catch { // Handle any errors. } } On the try await session.translate(…) line, the compiler complains that “Sending ‘session’ risks causing data races”. Extended error message: Sending main actor-isolated 'session' to @concurrent instance method 'translate' risks causing data races between @concurrent and main actor-isolated uses I’ve downloaded Apple’s sample code (at the top of linked webpage), it compiles fine as-is on Xcode 26.4, but fails with the same error as soon as I switch the Swift Language Mode to Swift 6 in the project. How can I fix this?
Replies
4
Boosts
0
Views
299
Activity
Feb ’26
Khmer Script Misidentified as Thai in Vision Framework
It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications.
There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai. While both scripts share historical roots, they belong to distinct languages with different alphabets. Currently, the model's confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple's native ecosystem (Preview, Live Text, and Photos).
import SwiftUI
import Vision

class TextExtractor {
    func extractText(from data: Data, completion: @escaping (String) -> Void) {
        let request = VNRecognizeTextRequest { (request, error) in
            guard let observations = request.results as? [VNRecognizedTextObservation] else {
                completion("No text found.")
                return
            }
            let recognizedStrings = observations.compactMap { observation -> String? in
                // Skip observations without a candidate instead of force-unwrapping.
                guard let str = observation.topCandidates(1).first?.string else { return nil }
                return "{text: \(str), confidence: \(observation.confidence)}"
            }
            completion(recognizedStrings.joined(separator: "\n"))
        }
        request.automaticallyDetectsLanguage = true // <-- This is the issue.
        request.recognitionLevel = .accurate
        let handler = VNImageRequestHandler(data: data, options: [:])
        DispatchQueue.global(qos: .background).async {
            do {
                try handler.perform([request])
            } catch {
                completion("Failed to perform OCR: \(error.localizedDescription)")
            }
        }
    }
}
Recognizing Khmer: the confidence score is low for Khmer text.
(The output is in Thai with a low confidence score.)
Recognizing English: the confidence score is high, as expected.
Recognizing Thai: the confidence score is high, as expected.
Issues in Preview and Photos
Khmer text. Copied text:
Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย" สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม สงสอบานตรเราะากกต่ายภากายระตารุกเตียน
Recommended Solutions
1. Set a Threshold
Filter out any detected result whose confidence is less than or equal to 0.5, so that low-quality text that leads to this issue is not output. For example:
let recognizedStrings = observations.compactMap { observation -> String? in
    if observation.confidence <= 0.5 { return nil }
    guard let str = observation.topCandidates(1).first?.string else { return nil }
    return "{text: \(str), confidence: \(observation.confidence)}"
}
2. Add Khmer Language Support
This issue would never happen if the model were able to detect and recognize images containing Khmer text.
Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
Replies
2
Boosts
0
Views
1.1k
Activity
Jan ’26
Stream response
With respond() methods, the foundation model works well enough. With streamResponse() methods, the responses are very repetitive, verbose, and messy. My app with foundation model uses more than 500 MB memory on an iPad Pro when running from Xcode. Devices supporting Apple Intelligence have at least 8GB memory. Should Apple use a bigger model (using 3 ~ 4 GB memory) for better stream responses?
Replies
2
Boosts
0
Views
292
Activity
Jul ’25
The answer that goes on forever
I have run into a few cases where the answer gets "stuck" (I am now on beta 6). This is an example.
Replies
1
Boosts
0
Views
261
Activity
Aug ’25
OpenIntent not executed with Visual Intelligence
I'm building a new feature with the Visual Intelligence framework. My implementations of IndexedEntity and IntentValueQuery work as expected, and I can see a list of objects in the visual search results. However, my OpenIntent doesn't work. When I tap an object, I get the on-screen message "Sorry something went wrong ...", and the breakpoint in perform() is never hit. Things I've tried: I added @MainActor before perform(); this didn't change anything. I set static let openAppWhenRun: Bool = true and static var supportedModes: IntentModes = [.foreground(.immediate)]; still nothing. I created a different intent for the "see more" button at the end of the feed. This AppIntent with schema: .visualIntelligence.semanticContentSearch worked, and its perform() is executed.
Replies
10
Boosts
0
Views
430
Activity
Aug ’25
Apple's Illusion of Thinking paper and Path to Real AI Reasoning
Hey everyone, I'm Manish Mehta, field CTO at Centific. I recently read Apple's white paper, The Illusion of Thinking, and it got me thinking about the current state of AI reasoning. Who here has read it? The paper highlights how LLMs often rely on pattern recognition rather than genuine understanding. When faced with complex tasks, their performance can degrade significantly. To move beyond this problem, I think we need to explore approaches that combine Deeper Reasoning Architectures for true cognitive capability with Deep Human Partnership to guide AI toward better judgment and understanding. The first part means fundamentally rewiring AI to reason. This involves advancing deeper architectures like World Models, which can build internal simulations to understand real-world scenarios, and Neurosymbolic systems, which combine neural networks with symbolic reasoning for deeper self-verification. Additionally, we need to look at deep human partnership and scalable oversight. An AI cannot learn certain things from data alone; it lacks real-world judgment. Among other things, human partners with deep domain expertise are needed to instill this wisdom, validate the AI's entire reasoning process, build its ethical guardrails, and act as skilled adversaries to find hidden flaws before they can cause harm. What do you all think? Is this focus on a deeper partnership between advanced AI reasoning and deep human judgment the right path forward? Agree? Disagree? Thanks
Replies
2
Boosts
0
Views
300
Activity
Jul ’25
Restricting App Installation to Devices Supporting Apple Intelligence Without Triggering Game Mode
Hello, My app fully relies on the new Foundation Models. Since Foundation Models require Apple Intelligence, I want to ensure that only devices capable of running Apple Intelligence can install my app. When checking the UIRequiredDeviceCapabilities property for a suitable value, I found that iphone-performance-gaming-tier seems the closest match. Based on my research: On iPhone, this effectively limits installation to iPhone 15 Pro or later. On iPad, it ensures M1 or newer devices. This exactly matches the hardware requirements for Apple Intelligence. However, after setting iphone-performance-gaming-tier, I noticed that on iPad, Game Mode (Game Overlay) is automatically activated, and my app is treated as a game. My questions are: Is there a more appropriate UIRequiredDeviceCapabilities value that would enforce the same Apple Intelligence hardware requirements without triggering Game Mode? If not, is there another way to restrict installation to devices meeting Apple Intelligence requirements? Is there a way to prevent Game Mode from appearing for my app while still using this capability restriction? Thanks in advance for your help.
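Separately from any install-time restriction, model availability can also be checked at runtime, which may serve as a fallback for devices that do install the app. A minimal sketch, assuming the app shows its own message when the model is unavailable; the function name modelStatusMessage() and the message strings are hypothetical.

```swift
import FoundationModels

/// Hypothetical helper: returns nil when the on-device model can be used,
/// otherwise a message the app could show instead of its main UI.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func modelStatusMessage() -> String? {
    switch SystemLanguageModel.default.availability {
    case .available:
        return nil
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this app."
    case .unavailable(.modelNotReady):
        return "The model is still being prepared. Please try again later."
    case .unavailable:
        // Any other reason reported by the system.
        return "The on-device model is currently unavailable."
    @unknown default:
        return "The on-device model is currently unavailable."
    }
}
```

This does not stop the App Store from offering the app to unsupported devices; it only lets the app explain itself after launch.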
Replies
2
Boosts
0
Views
467
Activity
Aug ’25
Keep getting exceededContextWindowSize with Foundation Models
I'm a bit new to the LLM stuff and with Foundation Models. My understanding is that there is a token limit of around 4K. I want to process the contents of files which may be quite large. I first tried going the Tool route but that didn't work out so I then tried manually chunking the text to keep things under the limit. It mostly works except that every now and then it'll exceed the limit. This happens even when the chunks are less than 100 characters. Instructions themselves are about 500 characters but still overall, well below 1000 characters per prompt, all told, which, in my limited understanding, should not result in 4K tokens being parsed. Any ideas on what is going on here?
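One possible explanation, which is an assumption and not something the post confirms, is that the same LanguageModelSession is reused for every chunk. A session keeps its earlier prompts and responses in its transcript, so the context can fill up even when each individual prompt is small. The sketch below shows only the chunking half in plain Swift; the function name chunk(_:maxCharacters:) is made up, and character counts are just a rough stand-in for tokens.

```swift
/// Splits `text` into chunks of at most `maxCharacters`, breaking on whitespace.
/// A single word longer than the limit becomes its own (oversized) chunk.
func chunk(_ text: String, maxCharacters: Int) -> [String] {
    var chunks: [String] = []
    var current = ""
    for word in text.split(whereSeparator: { $0.isWhitespace }) {
        // Start a new chunk if adding this word (plus a space) would overflow.
        if !current.isEmpty && current.count + 1 + word.count > maxCharacters {
            chunks.append(current)
            current = ""
        }
        if !current.isEmpty { current += " " }
        current += word
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

let chunks = chunk("one two three four five", maxCharacters: 9)
print(chunks)  // ["one two", "three", "four five"]
```

With chunks like these, creating a fresh LanguageModelSession for each chunk (instead of calling respond on one long-lived session) keeps earlier chunks out of the context.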
Replies
2
Boosts
0
Views
322
Activity
Aug ’25
LanguageModelSession with multiple tools and structured output
Hi, I'm using LanguageModelSession and giving it two different tools to query data from a local database. I'm wondering how I can have the session generate structured content as the response that includes data from one or both tools (or no tool at all). Here is an example of what I'm trying to do: Let's say the app has access to a database that contains information about exercise and sleep data (this is just an analogy). There are two tools, GetExerciseData() and GetSleepData(). The user may then prompt something like, "how well did I sleep in November". I have this working so that it calls through to the right tool, which would return a SleepSummary. However, I can't figure out how to have the session return the right structured data. I can do this and get back good text data: let response = session.respond(to: userInput), but I believe I want to do something like: let response = session.respond(to: trimmed, generating: <SomeStructure?>) Sometimes the model will run one tool or the other, or both tools, or no tool at all. Any help on the right way to go about this would be much appreciated. Most of the examples I found deal with a single tool.
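One way this might be approached is a single @Generable wrapper type whose tool-specific parts are optional, so the model can fill in sleep data, exercise data, both, or neither. This is a sketch under assumptions: the fields of SleepSummary and ExerciseSummary, the HealthAnswer type, the answer(_:tools:) function, and the instructions string are all hypothetical and not from the original post.

```swift
import FoundationModels

// Hypothetical summary types; the real SleepSummary in the post may differ.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
@Generable
struct SleepSummary {
    @Guide(description: "Average hours slept per night in the requested period")
    var averageHours: Double
}

@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
@Generable
struct ExerciseSummary {
    @Guide(description: "Total minutes of exercise in the requested period")
    var totalMinutes: Int
}

// Wrapper the session generates; optional parts stay nil when a tool was not needed.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
@Generable
struct HealthAnswer {
    @Guide(description: "A short answer to the user's question")
    var answer: String
    @Guide(description: "Present only if sleep data was used")
    var sleep: SleepSummary?
    @Guide(description: "Present only if exercise data was used")
    var exercise: ExerciseSummary?
}

@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func answer(_ userInput: String, tools: [any Tool]) async throws -> HealthAnswer {
    let session = LanguageModelSession(
        tools: tools,
        instructions: "Answer questions about the user's sleep and exercise data."
    )
    // Ask for the wrapper type; the model decides which tools (if any) to call.
    let response = try await session.respond(to: userInput, generating: HealthAnswer.self)
    return response.content
}
```

The caller can then check which optional parts are non-nil to decide what to display.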
Replies
1
Boosts
0
Views
726
Activity
Jan ’26
Documentation Deleted?
Was just wondering why the foundation model documentation is no longer available, thanks! https://developer.apple.com/documentation/FoundationModels
Replies
1
Boosts
0
Views
273
Activity
Aug ’25
TAMM toolkit v0.2.0 is for base model older than base model in macOS 26 beta 4
Problem: We trained a LoRA adapter for Apple's FoundationModels framework using the TAMM (Training Adapter for Model Modification) toolkit v0.2.0 on macOS 26 beta 4. The adapter trains successfully but fails to load with: "Adapter is not compatible with the current system base model."
TAMM v0.2.0 contains export/constants.py with: BASE_SIGNATURE = "9799725ff8e851184037110b422d891ad3b92ec1"
Findings:
Adapter export process, in export_fmadapter.py:
def write_metadata(...): self_dict[MetadataKeys.BASE_SIGNATURE] = BASE_SIGNATURE # Hardcoded value
The compatibility check:
- When loading an adapter, the system compares the adapter's baseModelSignature with the current system model
- If they don't match: compatibleAdapterNotFound error
- The error doesn't reveal the expected signature
Questions:
- How is BASE_SIGNATURE derived from the base model?
- Is it the SHA-1 of base-model.pt or some other computation?
- Can we compute the correct signature for beta 4?
- Or do we need Apple to release TAMM v0.3.0 with an updated signature?
Replies
1
Boosts
0
Views
677
Activity
Aug ’25
Create ML fails to train a text classifier using the BERT transfer learning algorithm
I'm trying to train a text classifier model in Create ML. The Create ML app/framework offers five algorithms. I can successfully train the model with all of the algorithms except the BERT transfer learning option. When I select this algorithm, Create ML simply stops the training process immediately after the initial feature extraction phase (with no reported error). What I've tried: I tried simplifying the dataset to just a few classes and short examples in case there was a problem with the data. I tried experimenting with the number of iterations and language/script options. I checked Console.app for logged errors and found the following for the Create ML app: error 10:38:28.385778+0000 Create ML Couldn't read event column - category is invalid. Format string is : <private> error 10:38:30.902724+0000 Create ML Could not encode the entity <private>. Error: <private> I'm not sure if these errors are normal or indicative of a problem. I don't know what it means by the "event" column – I don't have an event column in my data and I don't believe there should be one. These errors are not reported when using the other algorithms. Given that I couldn't get the app to work with BERT, I switched over to the CreateML framework and followed the code samples given in the documentation. (By the way, there's an error in the docs: the line let (trainingData, testingData) = data.stratifiedSplit(on: "text", by: 0.8) should be stratifying on "label", not on "text"). The main chunk of code looks like this: var parameters = MLTextClassifier.ModelParameters( validation: .split(strategy: .automatic), algorithm: .transferLearning(.bertEmbedding, revision: 1), language: .english ) parameters.maxIterations = 100 let sentimentClassifier = try MLTextClassifier( trainingData: trainingData, textColumn: "text", labelColumn: "label", parameters: parameters ) Ultimately I want to train a single multilingual model, and I believe that BERT is the best choice for this. 
The problem is that there doesn't seem to be a way to choose the multilingual Latin script option in the API. In the Create ML app you can theoretically do this by selecting the Latin script with language set to "Automatic", as recommended in this WWDC video (relevant section starts at around 8:02). But, as far as I can tell, ModelParameters only lets you pick a specific language. I presume the framework must provide some way to do this, since the Create ML app uses the framework under the hood, but I can't see a way to do it. Another possibility is that the Create ML app might be misrepresenting the framework – perhaps selecting a specific language in the app doesn't actually make any difference – for example, maybe all Latin languages actually use the same model under the hood and the language selector is just there to guide people to the right choice (but this is just my speculation). Any help would be much appreciated! If possible, I'd prefer to use the Create ML app if I can get the BERT option to work – is this actually working for anyone? Or failing that, I want to use the framework to train a multilingual Latin model with BERT, so I'm looking for instructions on how to choose that specific option or confirmation that I can just choose .english to get the correct Latin multilingual model. I'm running Xcode 26.2 on Tahoe 21.1 on an M1 Pro MacBook Pro. I have version 6.2 of the Create ML app.
Replies
8
Boosts
0
Views
1.6k
Activity
Jan ’26