Detect text in images using the Google Cloud Vision API (Python)

Go 100x faster for simple detection tasks.

For a recent project, I had to detect IMEIs (International Mobile Equipment Identity) from an image.

The IMEI could be printed or handwritten. I had to think about every possible case.

My first idea was to build a neural network with TensorFlow: build a model, define the architecture, train it on the MNIST dataset, tune the parameters, export the learned weights… Then I would have to receive and process the input image, detect the digits among the text, and recognise them to somehow extract what should be an IMEI. Not to mention the 10-hour issue I would face that no one had faced before (we all know that 10-hour issue).

Of course, processing IMEI images wouldn’t be enough for my boss. I would then be asked to extract client data from ID images and various official documents. My first month in this company would be spent building neural network models. Eventually, my models would crash every two days because of blurred images…

Time goes fast, life is short. To make my boss happy and meet the two-week deadline, I chose another solution, faster and more accurate: Google Cloud Vision.

Google Vision allows developers to classify images, detect objects, compare photos, detect printed and handwritten text, detect faces, detect explicit content… Many features I may need in the coming weeks.

Implementing Vision for text detection really takes a few minutes.


pip install google-cloud-vision


Import packages

Model definition

I use the following image to detect IMEIs:

If you enjoy tuning a model to get better results… I’m afraid there is very little to tune with Vision. No effort needed, and almost no configuration available (see the Python client documentation). Maybe that’s because you already get the most accurate result!


We now simply have to pass an image as a parameter.

Get response
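A minimal sketch of the three steps above (import, client, request), assuming the google-cloud-vision 2.x client, where images are built with vision.Image (older releases used vision.types.Image), and credentials supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The function name detect_text and the image path are placeholders, not part of the library.

```python
def detect_text(image_path, client=None):
    """Send one local image to Vision and return its text annotations."""
    # Imported inside the function so this file still loads on a machine
    # where google-cloud-vision is not installed.
    from google.cloud import vision

    # Without an explicit client, the library looks up default credentials
    # (for example the GOOGLE_APPLICATION_CREDENTIALS environment variable).
    client = client or vision.ImageAnnotatorClient()

    # Read the raw bytes of the image and wrap them in a Vision image.
    with open(image_path, "rb") as image_file:
        image = vision.Image(content=image_file.read())

    response = client.text_detection(image=image)

    # The API reports failures inside the response rather than raising.
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.text_annotations


# Hypothetical usage, with a placeholder file name:
# annotations = detect_text("imei_label.jpg")
```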

The response contains an array of dictionaries:

text_annotations {
  description: "PHILIPS"
  bounding_poly {
    vertices {
      x: 224
      y: 284
    }
    vertices {
      x: 619
      y: 286
    }
    vertices {
      x: 619
      y: 365
    }
    vertices {
      x: 224
      y: 363
    }
  }
}

The dictionaries at indexes 1 to n are useful if we want to locate text on the image, like this:
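To make that concrete, here is a small sketch that turns each per-word annotation into an axis-aligned box you could draw on the image. The helper name word_boxes is made up for illustration, and the SimpleNamespace objects only stand in for a real response; the code relies on nothing but the description and bounding_poly.vertices fields shown above.

```python
from types import SimpleNamespace


def word_boxes(text_annotations):
    """Return (text, (left, top, right, bottom)) for every entry after index 0."""
    boxes = []
    # Index 0 holds the whole text, so only the later entries are single items.
    for annotation in list(text_annotations)[1:]:
        xs = [vertex.x for vertex in annotation.bounding_poly.vertices]
        ys = [vertex.y for vertex in annotation.bounding_poly.vertices]
        boxes.append((annotation.description, (min(xs), min(ys), max(xs), max(ys))))
    return boxes


def _fake_annotation(text, points):
    """Build a stand-in object shaped like a Vision text annotation."""
    vertices = [SimpleNamespace(x=x, y=y) for x, y in points]
    return SimpleNamespace(
        description=text,
        bounding_poly=SimpleNamespace(vertices=vertices),
    )


# Stand-in data mirroring the "PHILIPS" entry from the response above.
sample = [
    _fake_annotation("PHILIPS ...", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    _fake_annotation("PHILIPS", [(224, 284), (619, 286), (619, 365), (224, 363)]),
]
print(word_boxes(sample))  # [('PHILIPS', (224, 284, 619, 365))]
```

The resulting tuples can be handed to whatever drawing library you prefer to outline each detected word.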

In our case, we just want to get the entire detected text and extract an IMEI from it.

The entire detected text can be found at index 0, in response.text_annotations[0].description:

MADE IN France
PN: VY379949
CN: VY609950K80780
IMEI: 448674528976410

This piece of code helps us extract all IMEIs:
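One way to do this, as a sketch: treat every standalone run of exactly 15 digits as an IMEI candidate. The function name extract_imeis and the exact pattern are assumptions; labels that print the IMEI with spaces or dashes would need a looser pattern.

```python
import re

# Exactly 15 digits, not glued to a longer run of digits on either side.
IMEI_PATTERN = re.compile(r"(?<!\d)\d{15}(?!\d)")


def extract_imeis(text):
    """Return the 15-digit candidates found in text, in order, without repeats."""
    candidates = IMEI_PATTERN.findall(text)
    # dict.fromkeys keeps the first occurrence of each value and preserves order.
    return list(dict.fromkeys(candidates))


detected_text = (
    "MADE IN France\n"
    "PN: VY379949\n"
    "CN: VY609950K80780\n"
    "IMEI: 448674528976410"
)
print(extract_imeis(detected_text))  # ['448674528976410']
```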


That’s all

All you need is a service account key and this code:
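As for the service account key, here is a hedged sketch of two common ways to hand it to the client: pass the path of the downloaded JSON key explicitly, or leave it to the GOOGLE_APPLICATION_CREDENTIALS environment variable. The helper name build_client and the key file name are placeholders.

```python
def build_client(key_path=None):
    """Create a Vision client, optionally from an explicit service account key."""
    # Imported lazily so the module loads even without the library installed.
    from google.cloud import vision

    if key_path is not None:
        # key_path points at the JSON key file downloaded for the service account.
        return vision.ImageAnnotatorClient.from_service_account_file(key_path)

    # No path given: the library falls back to its default credential lookup,
    # which includes the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    return vision.ImageAnnotatorClient()


# Hypothetical usage, with a placeholder key file name:
# client = build_client("service-account-key.json")
```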

100% accuracy doesn’t exist

Google Cloud Vision ain’t perfect. About 10% of my tests gave wrong results. With the current example, I also got ['448674528976416'] (the algorithm read the last digit as a 6 instead of a 0).

Therefore, it’s essential to validate/verify your results.

In the case of IMEI detection, I use this algorithm to validate:
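The last digit of an IMEI is a Luhn check digit, so a Luhn checksum is a natural validation here. The sketch below is one possible implementation under that assumption; the function name is_valid_imei is made up for illustration.

```python
def is_valid_imei(candidate):
    """Return True if candidate is 15 ASCII digits that pass the Luhn checksum."""
    if len(candidate) != 15 or not (candidate.isascii() and candidate.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit on the way.
    for position, char in enumerate(reversed(candidate)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit

    # A valid number has a checksum that is a multiple of 10.
    return total % 10 == 0


print(is_valid_imei("448674528976410"))  # True
print(is_valid_imei("448674528976416"))  # False
```

The misread value from above (ending in 6) fails the checksum, which is exactly the kind of single-digit error this check catches.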

If validation fails, someone is notified and performs a manual check.

Thanks for reading! If you run into any issues, leave a comment or check the official documentation.
