Azure AI Services for Beginners, Part 1

Keerthana Durai
6 min read · Feb 14, 2024

Azure has put forward its AI services to create intelligent, cutting-edge, and responsible applications with out-of-the-box, pre-built, and customizable APIs and models. They are collectively known as Azure Cognitive Services; it takes just an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps.

Azure AI Services can be utilized through designer-driven tools or with REST APIs. Designer-driven tools are the easiest to use and are quick to set up and automate, but might have limitations when it comes to customisation. REST APIs and client libraries provide users with more control and flexibility, but require more effort, time, and expertise to build a solution.

Here, let's explore all categories of Azure Cognitive Services.

Azure AI Vision Services:

With access to advanced algorithms, it helps analyse an image and return the information we are interested in. The image must be presented in JPEG, PNG, GIF, or BMP format and must be less than 4 MB. Below are the four sub-categories of Vision services.

1. Optical Character Recognition:

The OCR service allows you to extract printed or handwritten text from images such as posters, street signs, and product labels, as well as from documents like articles, reports, forms, and invoices. It supports more than 20 languages for printed text and 9 languages for handwritten text.

The Read OCR engine is composed of multiple advanced machine-learning-based models. It can be used to extract text from images like street signs and posters. Intelligent Document Processing (IDP) is the advancement of OCR and includes a document-optimized version of Read, which can be used when extracting text from scanned and digital documents.

OCR can be used with Vision Studio or as an API. The image below illustrates the usage of OCR via Vision Studio.

OCR in Vision Studio

The code below illustrates the OCR service via the API.
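As a minimal sketch of what that API call looks like, the snippet below submits an image URL to the Read endpoint and then flattens the lines of text out of a finished result. The endpoint and key are placeholders for your own Azure AI Vision resource, and the `v3.2` API version is an assumption. Note that Read is asynchronous: the first call only returns an `Operation-Location` URL that you poll until the analysis completes.

```python
import json
import urllib.request

# Placeholder values -- substitute your own Azure AI Vision resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

def submit_read(image_url: str) -> str:
    """POST an image URL to the Read API; the text is extracted asynchronously,
    so the response carries an Operation-Location URL to poll for the result."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/read/analyze",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]

def extract_lines(read_result: dict) -> list:
    """Flatten the recognized text lines out of a completed Read operation result."""
    return [line["text"]
            for page in read_result["analyzeResult"]["readResults"]
            for line in page["lines"]]
```

Once polling reports a `succeeded` status, `extract_lines` gives you the recognized text, one string per detected line.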

The Document Intelligence Read OCR model runs at a higher resolution than Azure AI Vision Read and extracts printed and handwritten text from PDF documents and scanned images. It also supports extracting text from Microsoft Word, Excel, PowerPoint, and HTML documents, and it detects paragraphs, text lines, words, locations, and languages. As shown in the image below, it has some prebuilt models for specific use cases.
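A hedged sketch of calling the prebuilt `read` model over REST might look as follows. The resource endpoint, key, and `2023-07-31` API version are placeholder assumptions, and as with Vision Read the analysis is asynchronous, so the submission returns an `Operation-Location` to poll.

```python
import json
import urllib.request

# Placeholder values -- substitute your Document Intelligence resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

def submit_document(doc_url: str) -> str:
    """Send a document URL to the prebuilt 'read' model; like the Vision Read
    API, analysis runs asynchronously and an Operation-Location is returned."""
    req = urllib.request.Request(
        f"{ENDPOINT}/formrecognizer/documentModels/prebuilt-read:analyze"
        "?api-version=2023-07-31",
        data=json.dumps({"urlSource": doc_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]

def paragraphs(result: dict) -> list:
    """Pull the paragraph-level text the model detected out of a finished result."""
    return [p["content"] for p in result["analyzeResult"].get("paragraphs", [])]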

2. Image Analysis:

The Azure AI Vision Image Analysis service can extract a wide variety of visual features from your images. It can detect whether an image contains adult content, specific brands or objects, or human faces. The image must be in JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, or MPO format and must be less than 20 MB in size. Below is the list of features provided by the Analyze Image API.

  • Read text from images
  • Detect people in images
  • Generate image captions
  • Detect objects/tags
  • Tag visual features
  • Get the area of interest/thumbnails
  • Detect brands
  • Detect faces/colours

The code below illustrates image tag recognition.
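Here is a minimal sketch of tag detection over REST, assuming a `v3.2` Analyze Image endpoint and placeholder resource values. Unlike Read, this call is synchronous, so the tags come back directly in the response body.

```python
import json
import urllib.request

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-key>"                                                # placeholder

def analyze_tags(image_url: str) -> dict:
    """Call the Analyze Image API requesting only the Tags visual feature;
    the analysis JSON is returned directly (no polling needed)."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/analyze?visualFeatures=Tags",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def confident_tags(analysis: dict, min_confidence: float = 0.8) -> list:
    """Keep only the tag names scored above the given confidence threshold."""
    return [t["name"] for t in analysis.get("tags", [])
            if t["confidence"] >= min_confidence]
```

Each tag carries a confidence score, so filtering with a threshold like `confident_tags` is a common post-processing step.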

Tag detection


3. Face Service:

The Azure AI Face service provides AI algorithms that detect, recognise, and analyse human faces in images. The supported input image formats are JPEG, PNG, GIF, and BMP, and the image file size should be no larger than 6 MB.

Examples:

  1. Touchless access control: face identification enables an enhanced access control experience while reducing the hygiene and security risks of card sharing, loss, or theft.
  2. Attendance tracking: a day-to-day use case is recording a person's attendance by recognising their face and verifying it against the one recorded in the system.
Vision Studio for face detection

Some features of the Face service are:

  • Face Recognition
  • Face Comparison
  • Face Identification
  • Face Verification
  • Finding similar faces
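To show how the detection piece might be called over REST, here is a hedged sketch; the endpoint and key are placeholders for your own Face resource, and the `v1.0` detect operation returns one JSON entry per face found, each with a bounding rectangle.

```python
import json
import urllib.request

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-key>"                                                # placeholder

def detect_faces(image_url: str) -> list:
    """Call the Face detect operation; the response is a JSON list with one
    entry per detected face."""
    req = urllib.request.Request(
        f"{ENDPOINT}/face/v1.0/detect",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def face_rectangles(faces: list) -> list:
    """Reduce each detected face to its bounding box (left, top, width, height)."""
    return [(f["faceRectangle"]["left"], f["faceRectangle"]["top"],
             f["faceRectangle"]["width"], f["faceRectangle"]["height"])
            for f in faces]
```

The bounding boxes alone are enough for use cases like counting people in frame or cropping faces for a downstream verification step.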

4. Spatial Analysis:

This service is used to detect the presence, movement, and count of people in video.

Spatial Analysis ingests video and detects people in it. After people are detected, the system tracks them as they move around over time and generates events as they interact with regions of interest. All operations give insights from a single camera's field of view. The video must be in RTSP, rawvideo, MP4, FLV, or MKV format. It can be used via REST API.

Azure AI Language:

Azure AI Language is dedicated to Natural Language Processing (NLP) features for understanding and analysing text. Features of the Language service are either:

  • Preconfigured, which means the AI models that the feature uses are not customizable. You just send your data, and use the feature’s output in your applications.
  • Customizable, which means you’ll train an AI model using our tools to fit your data specifically.

Language Studio is the platform to try several Language service features and see, in a visual manner, what they return.

It enables you to use the below service features in a no-code manner.

  • Named entity recognition is a preconfigured feature that categorizes entities in unstructured text across several predefined category groups, for example people, events, places, and dates.
  • PII detection (Personally Identifiable Information) is a preconfigured feature that identifies, categorizes, and redacts sensitive information in both unstructured text documents and conversation transcripts, for example phone numbers, email addresses, and forms of identification.
  • Language detection is a preconfigured feature that can detect the language a document is written in, and returns a language code for a wide range of languages, variants, dialects, and some regional/cultural languages.
  • Sentiment analysis and opinion mining are preconfigured features that help you find out what people think of your brand or topic by mining text for clues about positive or negative sentiment, and can associate them with specific aspects of the text.
  • Summarization is a preconfigured feature that uses extractive text summarization to produce a summary of documents and conversation transcriptions. It extracts sentences that collectively represent the most important or relevant information within the original content.
  • Key phrase extraction is a preconfigured feature that evaluates and returns the main concepts in unstructured text, and returns them as a list.
  • Entity linking is a preconfigured feature that disambiguates the identity of entities (words or phrases) found in unstructured text and returns links to Wikipedia.
  • Conversational language understanding (CLU) enables users to build custom natural language understanding models to predict the overall intention of an incoming utterance and extract important information from it.
  • Question answering is a custom feature that finds the most appropriate answer for inputs from your users, and is commonly used to build conversational client applications, such as social media applications, chat bots, and speech-enabled desktop applications.

The code below illustrates the Language Detection use case via the REST API.
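As a sketch of that call, the snippet below sends documents to the Language service's `analyze-text` endpoint with the `LanguageDetection` kind and pulls the ISO 639-1 code out of each result. The resource endpoint, key, and `2023-04-01` API version are placeholder assumptions.

```python
import json
import urllib.request

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-key>"                                                # placeholder

def detect_language(texts) -> dict:
    """Send documents to the Language service's language-detection task and
    return the raw JSON response."""
    body = {
        "kind": "LanguageDetection",
        "analysisInput": {"documents": [
            {"id": str(i), "text": t} for i, t in enumerate(texts, start=1)]},
    }
    req = urllib.request.Request(
        f"{ENDPOINT}/language/:analyze-text?api-version=2023-04-01",
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def language_codes(response: dict) -> dict:
    """Map each document id to the ISO 639-1 code of its detected language."""
    return {d["id"]: d["detectedLanguage"]["iso6391Name"]
            for d in response["results"]["documents"]}
```

Each detection also carries a confidence score alongside the language name and code, which you can use to discard low-confidence results.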



I would love to hear some feedback. Thank you for your time!
