I am trying to extract subtitles from two Taiwanese series, “我們與惡的距離: The world between us” and “想見你: Someday or one day” (the English titles are the official translations, not literal ones). As with most series in Mandarin, the subtitles are hardcoded, and I am looking for a way to extract them into an .srt or CSV file so that I have workable text. More generally, it would be great to be able to retrieve the subtitles from any Mandarin series or movie, since these almost always have hardcoded subtitles.
I have seen your demo on YouTube ( https://www.youtube.com/watch?v=YNGkGWj8lA4 ), and it looks very close to what I want to do, except that I would need it to work on a video played in VLC (or any other player). Is there also a way to save the full OCRed text somewhere? (In this case I do not need the subtitle timings, only the text.) In other words, is there a way to apply OCR to the video itself?
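A minimal sketch of the "save the full text, no timing" part, assuming some OCR step (not shown here) has already produced one text string per sampled video frame; `collapse_frames` and `save_text` are hypothetical helper names, not part of any existing tool. Because a hardcoded subtitle stays on screen across many frames, consecutive identical results are merged into one line and frames with no text are dropped:

```python
from itertools import groupby


def collapse_frames(frame_texts):
    """Collapse per-frame OCR output into a list of distinct subtitle lines.

    A hardcoded subtitle usually stays on screen for many consecutive frames,
    so runs of identical OCR results are merged into one line. Frames where
    OCR found nothing (empty or whitespace-only strings) are dropped.
    """
    cleaned = (text.strip() for text in frame_texts)
    return [text for text, _run in groupby(cleaned) if text]


def save_text(lines, path):
    """Write the subtitle text only (no timing), one line per subtitle."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical OCR results for six sampled frames.
    frames = ["你好", "你好", "", "我們走吧", "我們走吧", "你好"]
    print(collapse_frames(frames))
```

Exact matching is a simplifying assumption: OCR noise can make two frames of the same subtitle differ by one character, in which case a fuzzy comparison (for example `difflib.SequenceMatcher(None, a, b).ratio()` from the Python standard library) would be needed before merging.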
Is there also a way to tell when the OCR is not completely confident in a result, so that I can go back and check it manually?
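For the confidence question, here is a minimal sketch. It assumes the OCR engine reports a confidence score for each recognized line (Tesseract, for instance, can output per-word confidences on a 0 to 100 scale). The `OcrLine` type, the `mark_for_review` helper and the threshold of 80 are hypothetical choices for illustration. Lines scoring below the threshold are prefixed with a marker, and their positions are returned for manual checking:

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    text: str
    confidence: float  # assumed 0-100 scale reported by the OCR engine


def mark_for_review(lines, threshold=80.0):
    """Render OCR lines as text, prefixing uncertain ones with '[CHECK] '.

    Returns the rendered text and the indices of the flagged lines so they
    can be revisited manually later.
    """
    rendered, flagged = [], []
    for i, line in enumerate(lines):
        if line.confidence < threshold:
            flagged.append(i)
            rendered.append("[CHECK] " + line.text)
        else:
            rendered.append(line.text)
    return "\n".join(rendered), flagged


if __name__ == "__main__":
    # Hypothetical OCR output; 雕 is a plausible misreading of 離.
    sample = [OcrLine("你好", 96.0), OcrLine("我們與惡的距雕", 41.5), OcrLine("走吧", 88.0)]
    text, flagged = mark_for_review(sample)
    print(text)
    print("flagged:", flagged)
```

The `[CHECK] ` marker would need to be removed (or the flagged lines handled separately) before the text is loaded into R for analysis.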
The purpose is not translation: once the text is extracted, I want to do some vocabulary analysis on it in R for a university project.