I know this is quite odd, but it's really just a shot in the dark so I don't reinvent the wheel. I'm sure someone has already done this and undoubtedly done it better than the way I want to. I am #blind and want to bring the HDMI output of one computer, running its UEFI configuration interface, to another computer, and then send the image of that output to OpenAI's #LLM so it can tell me what is selected, what is on the screen, and so on. This is so I can get access to the UEFI on machines, both to install systems and during those times when the machine doesn't boot and there's no sighted person around. I know of no #a11y method for blind users with #UEFI. My thinking is that a capture card would allow this. Has anyone managed this sort of thing on a windows machine? I don't mean to limit to the UEFI input, any sort of visual input from which static images are routed to a LLM from a capture card/visual input would be good to hear about. If so, I would be very grateful for any ideas on both the card and software to use, particularly so that the image is clear to the model. Thanks for having a look at what I'm sure is a very strange request.
@techsinger
is hdmi capture via usb-c any use to you?
states wide compatibility and captures at 1080p.
19 pounds from Amazon, link follows:
software-wise I have no clue, but would suspect something able to run OCR on a video feed might be a locally-hostable option maybe?
@techsinger
a quick hunt has led to a series of tutorials on obtaining ocr output from videos, which also includes multi-columns too. this may be useful while poking through a bios screen to troubleshoot uefi woes.
limk follows:
@techsinger images out of the video-stream may be as silly as taking a print-screen if nothing else is possible.
configuring the output video from the capture down to 1Hz would make it easier, and unlikely to miss much for largely static text information.
unsure on how to recognise a selected option unless the ocr can pick up on underlined or highlighted text.
@evilstevie GPT4 can, in my experience, pick up on selected text. If I'm running a VM on VMWare workstation, and OCR that window, I can then ask the LLM "which of these options is selected?", and it will give me an answer. I've only tried it about twenty times, the purpos here is to get things to the point where the speech/braille is up, so I haven't had to do it often but, in all those twenty or so times, it hasn't made anything up/hallucinated. When I choose the option, or move to the next one and choose that, I find it has given me the right answer as to which is selected.