I suggest doing it like this:
- visualAssert | unique_image.png
<= Use visualAssert to wait for the popup to appear.
- XCLICK | OCR=text
<= Once the popup is there, use OCR.
This approach minimizes the number of OCR API calls, because the waiting is done by the image check and OCR runs only once the popup is on screen. That is very useful if you use the default cloud OCR. If you have the local OCR server option (and thus very fast, unlimited OCR conversions), this does not matter.
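
To show why the order matters, here is a rough Python sketch of the same idea outside of the macro language. It is only an illustration, not how the tool works internally: `click_after_popup`, `popup_visible` and `ocr_click` are hypothetical names, where `popup_visible` stands in for a cheap local image check (the role of visualAssert) and `ocr_click` stands in for the single OCR-based click (the role of XCLICK with OCR).

```python
import time
from typing import Callable


def click_after_popup(
    popup_visible: Callable[[], bool],  # hypothetical cheap local image check
    ocr_click: Callable[[], bool],      # hypothetical OCR click, one API call per invocation
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll the cheap check until the popup shows up, then run OCR exactly once."""
    deadline = time.monotonic() + timeout_s
    while not popup_visible():
        if time.monotonic() >= deadline:
            # Popup never appeared: give up without spending an OCR call.
            return False
        time.sleep(poll_interval_s)
    # Popup is on screen, so the one OCR call is made now.
    return ocr_click()
```

However long the popup takes to appear, the polling loop only uses the local check, so the OCR call count stays at one per click (or zero on timeout).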