**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 13:51

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 13:51

Tech Singer @techsinger@tweesecake.social

Jan 23, 2024, 13:51

Tech Singer @techsinger@tweesecake.social

I know this is quite odd, but it's really just a shot in the dark so I don't reinvent the wheel. I'm sure someone has already done this and undoubtedly done it better than the way I want to. I am #blind and want to bring the HDMI output of one computer, running its UEFI configuration interface, to another computer, and then send the image of that output to OpenAI's #LLM so it can tell me what is selected, what is on the screen, and so on. This is so I can get access to the UEFI on machines, both to install systems and during those times when the machine doesn't boot and there's no sighted person around. I know of no #a11y method for blind users with #UEFI. My thinking is that a capture card would allow this. Has anyone managed this sort of thing on a windows machine? I don't mean to limit to the UEFI input, any sort of visual input from which static images are routed to a LLM from a capture card/visual input would be good to hear about. If so, I would be very grateful for any ideas on both the card and software to use, particularly so that the image is clear to the model. Thanks for having a look at what I'm sure is a very strange request.

**evilstevie** @evilstevie@mastod1.ddns.net · 2024-01-23T16:09:02Z

evilstevie @evilstevie@mastod1.ddns.net

@techsinger
is hdmi capture via usb-c any use to you?
states wide compatibility and captures at 1080p.
19 pounds from Amazon, link follows:

https://amzn.eu/d/3LiAY4x

software-wise I have no clue, but would suspect something able to run OCR on a video feed might be a locally-hostable option maybe?

Jan 23, 2024, 16:09 · · Mastodon for Android · · ·

**evilstevie** @evilstevie@mastod1.ddns.net · Jan 23, 2024, 16:21

**evilstevie** @evilstevie@mastod1.ddns.net · Jan 23, 2024, 16:21

Jan 23, 2024, 16:21

evilstevie @evilstevie@mastod1.ddns.net

@techsinger
a quick hunt has led to a series of tutorials on obtaining ocr output from videos, which also includes multi-columns too. this may be useful while poking through a bios screen to troubleshoot uefi woes.
limk follows:

https://pyimagesearch.com/2022/03/07/ocring-video-streams/

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 16:55

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 16:55

Jan 23, 2024, 16:55

Tech Singer @techsinger@tweesecake.social

@evilstevie Thanks again, this whole tutorial sounds very helpful indeed. If I can get the images out of the stream, I think I'll be done, everything else is already written.

**evilstevie** @evilstevie@mastod1.ddns.net · Jan 23, 2024, 17:00

**evilstevie** @evilstevie@mastod1.ddns.net · Jan 23, 2024, 17:00

Jan 23, 2024, 17:00

evilstevie @evilstevie@mastod1.ddns.net

@techsinger images out of the video-stream may be as silly as taking a print-screen if nothing else is possible.
configuring the output video from the capture down to 1Hz would make it easier, and unlikely to miss much for largely static text information.
unsure on how to recognise a selected option unless the ocr can pick up on underlined or highlighted text.

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 17:08

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 17:08

Jan 23, 2024, 17:08

Tech Singer @techsinger@tweesecake.social

@evilstevie GPT4 can, in my experience, pick up on selected text. If I'm running a VM on VMWare workstation, and OCR that window, I can then ask the LLM "which of these options is selected?", and it will give me an answer. I've only tried it about twenty times, the purpos here is to get things to the point where the speech/braille is up, so I haven't had to do it often but, in all those twenty or so times, it hasn't made anything up/hallucinated. When I choose the option, or move to the next one and choose that, I find it has given me the right answer as to which is selected.

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 16:38

**Tech Singer** @techsinger@tweesecake.social · Jan 23, 2024, 16:38

Jan 23, 2024, 16:38

Tech Singer @techsinger@tweesecake.social

@evilstevie Thanks for thinking with me about this. The hardware you suggest is just what I'm after, there are quite a few of them. The issue I'm trying to get more info on, though, is how to make sure the image is in a position to be sent to the model, or OCRed. OCR is certainly possible locally, I use that to access my VMs when they need an installation or don't boot, but I have yet to find a local method of being able to ask questions. That is, if I OCR a screen with multiple options, I need to find out which is selected. I need to find out, sometimes, how many arrow presses it takes to get to the specific choice I want. That sort of thing. I hate having to send stuff off device to do this, but I have yet to see a model which can answer those questions other than GPT4, though to be fair, I'm still working on getting a GPU in this machine to try one of the more powerful/faster on-device options. Even with just OCR, though, what I want to do is display, or even save/capture, a clear image which I can then process with software. Thanks again for the suggestion.

Resources

Developers

What is Mastodon?

mastod1.ddns.net

More…