Use case #
In some instances the data you want to extract or scrape is displayed as an image. Certain classifieds websites, for instance, display phone numbers as images instead of plain text. For these cases you can use RTILA's OCR feature, which is available in the Prospection Panel within the Property settings. In our test case we load the image available at this URL: https://b2633864.smushcdn.com/2633864/wp-content/uploads/2017/06/example_01.png?lossy=1&strip=1&webp=1
Target Image CSS selector & enable OCR #
First you need to create a DataSet, then a property. Once the property is created, select the CSS selector of the image (see Point 1); RTILA highlights the selected image with a yellowish border. This selector gives us the URL of the image, but as the result table shows, the text inside the image is not yet captured by RTILA. To read the text in the image, click on the OCR tab, check the OCR box, and select the language in which the text is written (see Point 2).
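To make the selector step more concrete, here is a minimal Python sketch of what "get the image URL from a selector" amounts to outside of RTILA. It is not RTILA's implementation: the HTML fragment, the class name `phone-image`, and the `ImageSrcCollector` helper are all hypothetical, and the match is limited to a simple class lookup rather than full CSS selector support.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes and attributes.get("src"):
            self.sources.append(attributes["src"])


# Hypothetical page fragment; the class name "phone-image" is made up.
SAMPLE_HTML = """
<div class="listing">
  <img class="logo" src="https://example.com/logo.png">
  <img class="phone-image" src="https://example.com/phone_123.png">
</div>
"""

collector = ImageSrcCollector("phone-image")
collector.feed(SAMPLE_HTML)
print(collector.sources)
```

The sketch only recovers the image URL, which mirrors the state described above before OCR is enabled: the URL is known, but the text inside the image is not.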
Extract Results to save image URL & OCR data #
On the Command panel, use the Extract Results command to retrieve the image URL, run the OCR reading, and save the resulting data. See screenshot below.
Run the project to extract the data #
Simply run the project and wait for it to finish. It will execute the commands and save the URL and the OCR text of the target image to the Results panel.
Find OCR text value in the Results panel #
Once the project finishes running, head to the Results panel, then click PREVIEW & EXPORT on the corresponding result line. This opens a new panel where you can see the image URL as well as the OCR text value, which is added in the first column under the header “Image Text”. You can see below that the OCR function is able to read text even in images that contain noise.
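If you export the results and want to post-process the OCR text, a short script may help. The sketch below assumes the export is a CSV file; the “Image Text” header comes from the panel described above, while the “Image URL” header, the sample row, and the `read_ocr_rows` helper are hypothetical and only serve as an illustration. OCR output often contains stray line breaks, so the sketch collapses whitespace.

```python
import csv
import io

# Hypothetical export contents: the "Image Text" header comes from the article,
# while the "Image URL" header and the CSV format itself are assumptions.
EXPORTED_CSV = '''Image Text,Image URL
"Sample text
from image",https://example.com/example_01.png
'''


def read_ocr_rows(csv_text):
    """Return (url, text) pairs with OCR whitespace collapsed to single spaces."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # split() with no arguments also removes embedded newlines and tabs.
        text = " ".join(row["Image Text"].split())
        rows.append((row["Image URL"], text))
    return rows


rows = read_ocr_rows(EXPORTED_CSV)
for url, text in rows:
    print(url, "->", text)
```

To use it on a real export, replace `EXPORTED_CSV` with the contents of your file and adjust the column names to match what your export actually contains.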
Download this template and test it #
You can download this template from this link and import it as a project to test the OCR feature firsthand. You can change the image URL or even use a List Variable to apply OCR extraction to hundreds or even thousands of images.
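When preparing a large batch of image URLs for a List Variable, it can be worth cleaning the list first. The following sketch is one possible way to do that in Python. How RTILA expects List Variable values to be entered is not covered here, so the one-URL-per-line output is an assumption, and the `prepare_image_urls` helper and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def prepare_image_urls(raw_lines):
    """Keep well-formed http(s) URLs, drop blanks and duplicates, preserve order."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        url = line.strip()
        parsed = urlparse(url)
        # Skip anything that is not an absolute http or https URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


# Hypothetical input list with a blank entry, an invalid entry, and a duplicate.
candidates = [
    "https://example.com/images/phone_1.png",
    "  https://example.com/images/phone_2.png  ",
    "",
    "not-a-url",
    "https://example.com/images/phone_1.png",
]

url_list = prepare_image_urls(candidates)
# Assumed format: one URL per line, ready to paste or save to a text file.
print("\n".join(url_list))
```

Removing duplicates up front avoids running OCR twice on the same image when the list grows to hundreds or thousands of entries.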