@techsinger
is hdmi capture via usb-c any use to you?
states wide compatibility and captures at 1080p.
19 pounds from Amazon, link follows:
software-wise I have no clue, but would suspect something able to run OCR on a video feed might be a locally-hostable option maybe?
@evilstevie Thanks again, this whole tutorial sounds very helpful indeed. If I can get the images out of the stream, I think I'll be done, everything else is already written.
@techsinger images out of the video-stream may be as silly as taking a print-screen if nothing else is possible.
configuring the output video from the capture down to 1Hz would make it easier, and unlikely to miss much for largely static text information.
unsure on how to recognise a selected option unless the ocr can pick up on underlined or highlighted text.
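The 1Hz idea above can be sketched roughly as follows. This assumes the capture dongle shows up as an ordinary UVC video device that OpenCV can open, and the names `throttle`, `capture_frames` and `save_one_per_second` are made up for illustration. If the device cannot be configured down to 1Hz, dropping frames in software gets a similar result.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def throttle(frames: Iterable[Tuple[float, T]],
             interval_s: float = 1.0) -> Iterator[Tuple[float, T]]:
    """Yield only frames at least interval_s after the last kept frame."""
    last_kept = None
    for timestamp, frame in frames:
        if last_kept is None or timestamp - last_kept >= interval_s:
            last_kept = timestamp
            yield timestamp, frame


def capture_frames(device_index: int = 0):
    """Yield (timestamp, frame) pairs from a capture device.

    Assumption: the HDMI dongle appears as a standard video device.
    Needs the third-party opencv-python package.
    """
    import cv2

    cap = cv2.VideoCapture(device_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield time.monotonic(), frame
    finally:
        cap.release()


def save_one_per_second(device_index: int = 0, prefix: str = "frame") -> None:
    """Write roughly one PNG per second until the device stops delivering."""
    import cv2

    for i, (_, frame) in enumerate(throttle(capture_frames(device_index))):
        cv2.imwrite(f"{prefix}_{i:05d}.png", frame)
```

Calling `save_one_per_second(0)` would leave a numbered run of PNG files that OCR or a model can pick up later. The device index is a guess and may need changing on a machine with a built-in webcam.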
@evilstevie GPT4 can, in my experience, pick up on selected text. If I'm running a VM on VMware Workstation and OCR that window, I can then ask the LLM "which of these options is selected?", and it will give me an answer. I've only tried it about twenty times; the purpose here is to get things to the point where the speech/braille is up, so I haven't had to do it often. But in all those twenty or so times, it hasn't made anything up/hallucinated. When I choose the option, or move to the next one and choose that, I find it has given me the right answer as to which is selected.
@evilstevie Thanks for thinking with me about this. The hardware you suggest is just what I'm after, there are quite a few of them. The issue I'm trying to get more info on, though, is how to make sure the image is in a position to be sent to the model, or OCRed. OCR is certainly possible locally, I use that to access my VMs when they need an installation or don't boot, but I have yet to find a local method of being able to ask questions. That is, if I OCR a screen with multiple options, I need to find out which is selected. I need to find out, sometimes, how many arrow presses it takes to get to the specific choice I want. That sort of thing. I hate having to send stuff off device to do this, but I have yet to see a model which can answer those questions other than GPT4, though to be fair, I'm still working on getting a GPU in this machine to try one of the more powerful/faster on-device options. Even with just OCR, though, what I want to do is display, or even save/capture, a clear image which I can then process with software. Thanks again for the suggestion.
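One possible local stand-in for the two questions above ("which is selected?" and "how many arrow presses?") is sketched below. It rests on assumptions that may not hold for every firmware screen: the selected row is drawn with a different background (inverse video or a highlight bar), each option sits on its own line, and OCR has already produced one bounding box per option. The function names are hypothetical, and this is a heuristic, not what GPT4 does.

```python
from statistics import median


def mean_brightness(gray, box):
    """Average pixel value inside box = (left, top, right, bottom).

    gray is any 2D grid indexable as gray[y][x]; right/bottom are exclusive.
    """
    left, top, right, bottom = box
    total = 0
    count = 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += gray[y][x]
            count += 1
    return total / count


def find_highlighted(gray, boxes):
    """Index of the box whose brightness differs most from the typical box."""
    means = [mean_brightness(gray, box) for box in boxes]
    typical = median(means)
    return max(range(len(boxes)), key=lambda i: abs(means[i] - typical))


def presses_to_reach(options, selected_index, target, wraps=False):
    """Return (key, count) needed to move from the selected option to target.

    target is matched as a case-insensitive substring; the first match wins.
    wraps=True assumes the menu cycles from the last entry back to the first.
    """
    wanted = target.strip().lower()
    matches = [i for i, text in enumerate(options) if wanted in text.lower()]
    if not matches:
        raise ValueError(f"no option contains {target!r}")
    delta = matches[0] - selected_index
    if delta == 0:
        return ("none", 0)
    if not wraps:
        return ("down", delta) if delta > 0 else ("up", -delta)
    down_steps = delta % len(options)
    up_steps = (-delta) % len(options)
    return ("down", down_steps) if down_steps <= up_steps else ("up", up_steps)
```

The outlier test needs at least three options to be meaningful, since with two there is no "typical" row to compare against. It also will not catch a selection marked only by an underline or a small arrow glyph.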
@techsinger
a quick hunt has led to a series of tutorials on obtaining ocr output from videos, which also covers multi-column layouts. this may be useful while poking through a bios screen to troubleshoot uefi woes.
link follows:
https://pyimagesearch.com/2022/03/07/ocring-video-streams/
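For a captured still rather than a live stream, a minimal sketch of getting line-by-line text with positions might look like this. It assumes Tesseract is installed locally along with the `pytesseract` wrapper, and it is not taken from the linked tutorial. `group_lines` and `ocr_lines` are illustrative names.

```python
def group_lines(data):
    """Merge word-level OCR rows into (text, (left, top, right, bottom)) per line.

    data is the dict returned by pytesseract.image_to_data with
    output_type=Output.DICT: parallel lists keyed by column name.
    """
    lines = {}
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # skip the empty block/paragraph/line placeholder rows
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        left = data["left"][i]
        top = data["top"][i]
        right = left + data["width"][i]
        bottom = top + data["height"][i]
        if key not in lines:
            lines[key] = [[word], left, top, right, bottom]
        else:
            entry = lines[key]
            entry[0].append(word)
            entry[1] = min(entry[1], left)
            entry[2] = min(entry[2], top)
            entry[3] = max(entry[3], right)
            entry[4] = max(entry[4], bottom)
    # dicts keep insertion order, so lines come out in OCR reading order
    return [(" ".join(e[0]), (e[1], e[2], e[3], e[4])) for e in lines.values()]


def ocr_lines(image):
    """Run Tesseract on an image (PIL image or numpy array) and group by line."""
    import pytesseract  # third-party; also needs the tesseract binary

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return group_lines(data)
```

Each returned entry pairs the text of one line with its bounding box, so a screen reader can speak the text while the box is kept for anything that needs to know where the line sits on screen.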