Detect text on image using Google Cloud Vision API (python)

Beranger Natanelic
3 min read · Jan 29, 2021

Go 100x faster for simple detection tasks.

For a recent project, I had to detect IMEIs (International Mobile Equipment Identity) from an image.

The IMEI could be printed or handwritten. I had to think about every possible case.

My first idea was to build a neural network using TensorFlow: build a model, define the architecture, train it on the MNIST dataset, think about the best parameters, export the learned weights… Then I would have to receive and process the input image, detect digits among the text and recognise them to somehow extract what should be an IMEI. Not to mention the 10-hour issue I would face that no one has faced before (we all know that 10-hour issue).

Of course, processing IMEI images wouldn't be enough for my boss. I would then be asked to identify client data from ID images and various official documents. My first month in this company would be spent building neural network models. Eventually, my models would crash every two days because of blurred images…

Time goes fast, life is short. To make my boss happy and meet the 2-week deadline, I chose another solution, faster and more accurate: Google Cloud Vision.

Google Vision allows developers to classify images, detect objects, compare photos, detect printed and handwritten text, detect faces, detect explicit content… Many features I may need in the coming weeks.

Implementing Vision for text detection really takes a few minutes.

Requirements

  • You first have to enable the Cloud Vision API in the Google Cloud Platform (create a new project if needed).
  • Then, follow the steps from this link to create a service account to use the Vision API and download the private key file. If you face trouble doing that, follow the instructions from there.
  • Once you are set, you need to install the Vision package for Python:
pip install google-cloud-vision

Code

Import packages
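
The original gist is not embedded in this copy, so here is a minimal reconstruction of the imports used throughout the article. The key path below is a placeholder, not a real file:

import os
import re

from google.cloud import vision

# Point the client at the service account key downloaded earlier
# (hypothetical path, replace with your own).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/service-account-key.json"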

Model definition
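
There is no model to download or train here; the only object to create is the Vision client. A sketch, assuming the credentials are set as above:

# The client reads GOOGLE_APPLICATION_CREDENTIALS by default; you can also
# pass the key explicitly with ImageAnnotatorClient.from_service_account_json().
client = vision.ImageAnnotatorClient()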

I use the following image, a photo of a PHILIPS phone label, to detect IMEIs:

If you enjoy tuning a model to get better results… I'm afraid that won't be possible with Vision. Zero effort needed, zero configuration possible (Python Client doc). Maybe because you already get the most accurate result!

Liftoff!

We now simply have to provide an image in parameters.
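
A sketch of that call, assuming the photo is saved locally as imei_label.jpg (a hypothetical filename):

# Read the image bytes and wrap them in a Vision Image object.
# (Older versions of the client used vision.types.Image instead.)
with open("imei_label.jpg", "rb") as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# text_detection runs OCR on the whole image, printed and handwritten text alike.
response = client.text_detection(image=image)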

Get response
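
To inspect what came back, we can simply print the annotations (a trivial sketch):

texts = response.text_annotations
print(texts)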

The response contains an array of dictionaries:

text_annotations {
  description: "PHILIPS"
  bounding_poly {
    vertices {
      x: 224
      y: 284
    }
    vertices {
      x: 619
      y: 286
    }
    vertices {
      x: 619
      y: 365
    }
    vertices {
      x: 224
      y: 363
    }
  }
}

The dictionaries at indices [1, n] are useful if we want to locate each piece of detected text on the image, as in the sketch below:
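
The drawing code is not part of the original post, but a quick sketch with Pillow (an extra dependency assumed here) shows how the bounding polygons could be drawn:

from PIL import Image, ImageDraw

annotations = response.text_annotations

# annotations[0] is the full text block; annotations[1:] are the individual words.
img = Image.open("imei_label.jpg")
draw = ImageDraw.Draw(img)

for annotation in annotations[1:]:
    vertices = [(vertex.x, vertex.y) for vertex in annotation.bounding_poly.vertices]
    draw.polygon(vertices, outline="red")

img.save("imei_label_boxes.jpg")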

In our case, we just want to get the entire detected text and extract an IMEI from it.

The entire detected text can be found in response.text_annotations[0].description:

PHILIPS GS$M
TCD128/34
996111004986
MADE IN France
PN: VY379949
CE168X
CN: VY609950K80780
IMEI: 448674528976410
KI

This piece of code helps us extract all IMEIs:
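
(The original snippet is not embedded in this copy; the sketch below, using a regex for standalone 15-digit runs, reproduces the idea.)

full_text = response.text_annotations[0].description

# An IMEI is a 15-digit number: keep every standalone 15-digit run.
imeis = re.findall(r"\b\d{15}\b", full_text)
print(imeis)

Running it on the text above prints: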

['448674528976410']

That’s all

All you need is a service account key and this code:
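
The original gist is not embedded here; the script below is a sketch that simply puts the previous snippets together (paths and filenames are placeholders):

import os
import re

from google.cloud import vision

# Hypothetical path to the service account key downloaded earlier.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/service-account-key.json"


def detect_imeis(image_path):
    """Run Cloud Vision text detection and return every 15-digit candidate."""
    client = vision.ImageAnnotatorClient()

    with open(image_path, "rb") as image_file:
        content = image_file.read()

    response = client.text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(response.error.message)

    if not response.text_annotations:
        return []

    full_text = response.text_annotations[0].description
    return re.findall(r"\b\d{15}\b", full_text)


if __name__ == "__main__":
    print(detect_imeis("imei_label.jpg"))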

100% accurate doesn’t exist

Google Cloud Vision ain't perfect. Across all my tests, about 10% gave wrong results. With the current example, I also got ['448674528976416'] (the algorithm identified a 6 instead of a 0 on the last digit).

Therefore, it’s essential to validate/verify your results.

In the case of IMEI detection, I use this algorithm to validate the result:
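
The original snippet is not embedded in this copy. The last digit of an IMEI is a Luhn check digit, so a standard Luhn check is a reasonable reconstruction:

def is_valid_imei(imei):
    """Luhn check: double every second digit from the right, sum the digits,
    and require the total to be a multiple of 10."""
    if len(imei) != 15 or not imei.isdigit():
        return False
    total = 0
    for i, char in enumerate(reversed(imei)):
        digit = int(char)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

With this check, the correct reading '448674528976410' passes while the misread '448674528976416' from the example above is rejected.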

If validation fails, someone is notified and checks the image manually.

Thanks for reading! If you face any issues, leave a comment or check the official documentation.
