While preparing our upcoming sharing topics, @ChunLin and I came up with the idea of creating an AI service that automates the conversion of handwritten text on forms into computerized text.
We did some research and concluded that Azure Machine Learning combined with Microsoft Cognitive Services’ Computer Vision service might be a good choice. However, to keep things simple, I decided to use a combination of the Microsoft Custom Vision service and the Computer Vision service.
Introduction to Cognitive Services
Cognitive Services is a set of services developed by Microsoft to solve common AI problems. They are wrapped as API services that let developers use artificial intelligence capabilities in their own applications.
Microsoft Cognitive Services consist of 5 main categories:
- Vision—analyze images and videos for content and other useful information.
- Speech—tools to improve speech recognition and identify the speaker.
- Language—understanding sentences and intent rather than just words.
- Knowledge—tracks down research from scientific journals for you.
- Search—applies machine learning to web searches.
The two services we will use today are Computer Vision and Custom Vision.
Custom Vision is one of the Cognitive Services, and it uses a machine learning algorithm to classify images. You, the developer, must submit groups of images that feature and lack the classification(s) in question, and you specify the correct tags of the images at the time of submission. The algorithm then trains on this data and calculates its own accuracy by testing itself on that same data. Once the model is trained, you can test, retrain, and eventually use it to classify new images according to the needs of your app. Custom Vision functionality can be divided into two features: image classification assigns a distribution of classifications to each image, while object detection is similar but also returns the coordinates in the image where the applied tags can be found.
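To make the difference between the two features concrete, here is a minimal sketch of consuming an object detection result. It assumes each prediction is a dictionary with `tagName`, `probability` and a `boundingBox` holding normalized `left`/`top`/`width`/`height` values. That layout is modelled on the Custom Vision prediction output but should be treated as an assumption. The tag name `handwriting`, the 0.5 threshold and the helper `confident_detections` are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical object detection result. The field names (tagName, probability,
# boundingBox with normalized left/top/width/height) are an ASSUMPTION modelled
# on the Custom Vision prediction output, not copied from a live response.
SAMPLE_PREDICTIONS: List[Dict[str, Any]] = [
    {"tagName": "handwriting", "probability": 0.92,
     "boundingBox": {"left": 0.10, "top": 0.20, "width": 0.30, "height": 0.05}},
    {"tagName": "handwriting", "probability": 0.35,
     "boundingBox": {"left": 0.50, "top": 0.60, "width": 0.20, "height": 0.04}},
]


def confident_detections(predictions: List[Dict[str, Any]],
                         tag: str = "handwriting",
                         threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Keep predictions for `tag` whose probability is at least `threshold`.

    An image classification result only scores tags for the whole image;
    each object detection prediction additionally carries a bounding box.
    """
    return [
        p for p in predictions
        if p["tagName"] == tag and p["probability"] >= threshold
    ]


if __name__ == "__main__":
    for p in confident_detections(SAMPLE_PREDICTIONS):
        print(p["tagName"], p["probability"], p["boundingBox"])
```

The threshold is a design choice. A lower value catches more handwriting but sends more false positives on to the recognition step.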
The Computer Vision service provides developers with access to advanced algorithms for processing images and returning information. Computer Vision works with popular image formats, such as JPEG and PNG. To analyze an image, you can either upload the image or specify an image URL. Computer Vision algorithms can analyze the content of an image in different ways, depending on the visual features you’re interested in. One of its newest features is handwriting recognition, which is still in preview at the time of writing.
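The two input options, an image URL or an uploaded image, differ only in how the HTTP request is built. The sketch below prepares the headers and body for either case without sending anything. It assumes the usual Cognitive Services conventions: the `Ocp-Apim-Subscription-Key` header, a JSON body with a `url` field for URLs, and `application/octet-stream` for raw bytes. The endpoint path depends on the API version and is deliberately left out. The function name `build_vision_request` is hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def build_vision_request(
    subscription_key: str,
    image_url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for either an image URL or an uploaded image.

    Exactly one of image_url / image_bytes must be supplied. Header and body
    conventions are assumptions based on the Cognitive Services REST style.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("Provide exactly one of image_url or image_bytes")

    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    if image_url is not None:
        # Image referenced by URL: send a small JSON document with a "url" field.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": image_url}).encode("utf-8")
    else:
        # Image uploaded directly: send the raw file bytes as an octet stream.
        headers["Content-Type"] = "application/octet-stream"
        body = image_bytes
    return headers, body
```

The resulting pair could then be passed to any HTTP client (for example `requests.post(endpoint, headers=headers, data=body)`) once the correct endpoint for your region and API version is known.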
Why Cognitive Services?
Cognitive Services runs on a cloud platform that scales with demand, allowing developers to integrate pre-trained models (e.g. Computer Vision, the Text to Speech API, QnA Maker) as well as models they train themselves (such as Custom Vision).
In addition, Cognitive Services is built by experts on decades of research. Because it is hosted in the cloud, it is likely to be more reliable than hosting the models on our own infrastructure. Lastly, it has strong community support!
The following concept demonstrates how handwritten text could be captured and converted into computerized text. First, a Custom Vision (www.customvision.ai) object detection project is created and trained on a set of photos from my GitHub repo.
Once the trained model is ready to be consumed, the remaining steps work as follows:
- The document is first scanned and uploaded to storage via a custom Web API.
- The Web API then uses the trained Custom Vision model to detect the handwritten text and return the locations of the detected text in JSON format.
- The Web API crops each detected region out of the image based on those locations.
- The Web API then sends each cropped image to Computer Vision’s handwriting recognition API to convert the handwritten text into computerized text.
- Lastly, the Web API returns the recognized text along with its location on the uploaded image.
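The steps above can be sketched as a small orchestration function. To keep it runnable without Azure credentials, the Custom Vision call, the cropping and the handwriting recognition call are passed in as plain functions. It also assumes that detection returns normalized bounding boxes (fractions of the image width and height), which then have to be converted to pixels before cropping. Every name here (`process_form`, `to_pixel_box`, `RecognizedField`, `detect`, `crop`, `recognize`) is hypothetical, and step 1 (scanning and uploading) is outside the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A pixel rectangle as (left, top, right, bottom), the same box order
# that Pillow's Image.crop accepts.
PixelBox = Tuple[int, int, int, int]


@dataclass
class RecognizedField:
    text: str      # text returned by handwriting recognition
    box: PixelBox  # where the text was found on the uploaded image


def to_pixel_box(bbox: Dict[str, float], width: int, height: int) -> PixelBox:
    """Convert a normalized (0..1) bounding box into pixel coordinates,
    clamped to the image so a box touching the edge stays valid."""
    left = max(0, round(bbox["left"] * width))
    top = max(0, round(bbox["top"] * height))
    right = min(width, round((bbox["left"] + bbox["width"]) * width))
    bottom = min(height, round((bbox["top"] + bbox["height"]) * height))
    return left, top, right, bottom


def process_form(
    image_size: Tuple[int, int],
    detect: Callable[[], List[Dict[str, float]]],
    crop: Callable[[PixelBox], bytes],
    recognize: Callable[[bytes], str],
) -> List[RecognizedField]:
    """Steps 2 to 5: detect regions, crop each one, recognize its text,
    and return the text together with its location."""
    width, height = image_size
    results: List[RecognizedField] = []
    for bbox in detect():                            # step 2: locations from Custom Vision
        box = to_pixel_box(bbox, width, height)
        region = crop(box)                           # step 3: cut the region out
        text = recognize(region)                     # step 4: handwriting recognition
        results.append(RecognizedField(text, box))   # step 5: text plus location
    return results
```

Injecting the three calls keeps the flow testable with stubs, and the real Custom Vision and Computer Vision clients can be swapped in later without changing the orchestration.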
This is still a very rough idea for automating the conversion of information written on forms into computerized text that can be stored in a database. Don’t forget to check out my GitHub repo for updates as we move forward.