Using Form Recognizer to recognize custom forms

One of my learning goals for this half year was to learn more about AI. Although I’m focusing my learning mainly on low-level AI (Python, ML, Data Science), I was pretty happy to get involved in a customer project using the Microsoft Form Recognizer. This blog post will cover the Form Recognizer and its functionality.

Let’s start at the beginning: Form Recognizer is one of Microsoft’s cognitive services that allows you to extract structured text from forms. The service itself has three building blocks:

  • Layout API: An API to extract text and table structure from a document using OCR.
  • Prebuilt receipt model: An API that allows you to extract data from (USA) sales receipts.
  • Custom Models: A service that allows you to build custom models to extract data from your custom forms. Custom models use the Layout API to extract text and structure, and can either be trained without labels (using no human input) or using custom labels (allowing you to provide the model with your labels and positions).

At the time of writing, the service is in preview. This means you can use it at a discounted rate right now, but you don’t get an SLA for it yet.

In this post, we’ll walk through the custom model capabilities in the Form Recognizer API and how to build that out. To start out, let’s walk through the workflow for the custom model API.

Custom model workflow

Building a custom Form Recognizer model takes a couple of steps:

  1. Build a training and testing dataset
  2. Upload training dataset
  3. Label training dataset
  4. Train model
  5. Test model

All of these steps can be done directly against the API of Form Recognizer. To make it easier, Microsoft has developed an open source tool to help with steps 3 to 5. After uploading your training dataset to Azure blob storage, you can use the tool to label your training dataset and train your model.

You could also do this directly from a programming environment. The Azure documentation has a sample on how to do this from Python.
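
As a rough sketch of what the training step looks like against the REST API, the snippet below only builds the request; it doesn’t send anything. The function name and the placeholder values are made up for illustration. The URL path and the body fields (source, sourceFilter, useLabelFile) follow the v2.0-preview quickstart as I understand it, so check them against the current documentation before relying on them.

```python
import json


def build_train_request(endpoint, container_sas_url, prefix="", use_label_file=True):
    """Return the URL and JSON body for a train-model call (nothing is sent)."""
    # Assumption: same API version as the analyze call later in this post.
    url = endpoint.rstrip("/") + "/formrecognizer/v2.0-preview/custom/models"
    body = {
        "source": container_sas_url,  # SAS URL of the blob container
        "sourceFilter": {"prefix": prefix, "includeSubFolders": False},
        "useLabelFile": use_label_file,  # True when training with labels
    }
    return url, body


url, body = build_train_request(
    "https://<your-resource>.cognitiveservices.azure.com/",
    "https://<account>.blob.core.windows.net/<container>?<sas-token>",
)
print(url)
print(json.dumps(body, indent=2))
```

To actually start training, you would POST this body as JSON to the URL with your key in the Ocp-Apim-Subscription-Key header, the same header the analyze call uses later in this post.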

And with that, let’s have a look at how we can build a custom model using the Form Recognizer.

Prerequisites

First off, we need a storage account with training data in it. I will use the default training data from Microsoft. I created a new storage account for this, and uploaded the data to it.

Creating a new storage account.
Uploading the 5 training invoices

In our storage account, we’ll also need to allow CORS.

Setting up CORS

And finally for storage, we’ll also need a SAS token to the container hosting our files. For the purposes of this demo, we’ll create a full account SAS:

Generating a SAS token

Copy and paste the SAS token to a temporary file; we’ll use it later on.

Next, we’ll create the Form Recognizer resource itself.

Creating the form recognizer resource itself

Once this is created – which took less than a minute – we’ll need to grab the endpoint and the key to access it.

Getting the endpoint and the key.

With that deployed, we can go ahead and deploy the labeling tool itself.

Deploying the labeling tool

We’ll deploy and run the labeling tool locally. In my case, I’ll run this on my Ubuntu system running on WSL on Windows.

docker pull mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool
docker run -it -p 3000:80 mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool eula=accept

We can now connect to the labeling tool at localhost:3000.

First thing, we’ll need to pair up our storage account via the SAS URL. To do this, hit the connection icon on the left panel, and fill in the storage account details. Make sure to include the container name in the SAS URL.
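
If you only copied the SAS token itself, the URL the tool expects can be assembled by hand. Here is a minimal sketch, assuming the usual blob endpoint format; the account name, container name and token below are placeholders, not values from this demo.

```python
def container_sas_url(account_name, container_name, sas_token):
    """Join account, container and SAS token into a container-level SAS URL."""
    # The token may or may not start with "?", so normalize it first.
    token = sas_token.lstrip("?")
    return "https://%s.blob.core.windows.net/%s?%s" % (
        account_name,
        container_name,
        token,
    )


# Placeholder values, for illustration only.
print(container_sas_url("mystorageaccount", "training", "?sv=2019-10-10&sig=XXX"))
```

The important part is the container name between the host and the query string: leaving it out gives a URL that points at the account rather than at the folder with your forms.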

Create the storage account connection

Next, we can create a project to start labeling.

Create a new project

Provide all the necessary details to connect your blob storage account and connect it to the forms endpoint we created earlier. Finally, hit save project.

Provide all the details to create the project.

Immediately after saving the project, you’ll see the form analyzer tool starts loading the forms you submitted and runs OCR on the first one of them. Click on the green button to run OCR on all files, and then we can start manually labeling the forms in the next step.

Running OCR on the first file. Make sure to run OCR on all files, to avoid waiting in the next step.

Labeling the forms

Now we can go ahead and label our forms. The first thing we’ll do here is create a set of tags for the information that is contained in the form:

Creating a set of tags.

And next, we can link the item on the screen to the tag. To do this, you click the element (or multiple in case of the address) and then click the tag itself.

To link an element on the page to a tag, click the element first and then click the tag.

Do this for all tags on all the training invoices.

Label everything in each file.

Once this is done, you can go ahead and train your custom model. To do this, hit the symbol of the neural net and hit the Train button.

After training, you can see our model has a 95% average accuracy.

And now, we can start analyzing new invoices. To do this in the tool, click the lightbulb icon and upload a file:

Running the prediction on an actual invoice.

This will generate a dataset, containing the returned tags:

The returned tags from our invoice.

Calling Form Recognizer from Python

Form Recognizer is an API, which can be called from a multitude of tools. To show the raw return, I also wanted to test the experience in Python. To run Python, I’ll use a Jupyter notebook on Azure, which is available for free.

There, I created a new project, uploaded an invoice, and created a new Jupyter notebook:

Uploading the invoice and creating a new Python notebook.

The Python code required to run and get a return comes from the Microsoft quickstart. That code has two parts: the first uploads your invoice to the API, and the second polls until the results are complete. Let’s start with the first part of the code:

########### Python Form Recognizer Async Analyze #############
import json
import time
from requests import get, post

# Endpoint URL (no trailing slash, since the path appended below starts with one)
endpoint = r"https://nf-form.cognitiveservices.azure.com"
apim_key = "baXXX"
model_id = "e195db40-baf7-4573-8224-fc6d1277e719"
post_url = endpoint + "/formrecognizer/v2.0-preview/custom/models/%s/analyze" % model_id
source = r"Invoice_7.pdf"
params = {
    "includeTextDetails": True
}

headers = {
    # Request headers
    'Content-Type': 'application/pdf',
    'Ocp-Apim-Subscription-Key': apim_key,
}
with open(source, "rb") as f:
    data_bytes = f.read()

try:
    resp = post(url = post_url, data = data_bytes, headers = headers, params = params)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % json.dumps(resp.json()))
        quit()
    print("POST analyze succeeded:\n%s" % resp.headers)
    get_url = resp.headers["operation-location"]
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))
    quit()
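
The request above hard-codes application/pdf as the content type. If you also want to send scanned images, a small lookup like the one below can pick the header value from the file extension. This is a sketch: the helper name is made up, and the list of types reflects the formats I understand the service to accept (PDF, JPEG, PNG, TIFF), so verify it for your API version.

```python
import os

# Assumed mapping from file extension to the Content-Type header value.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def content_type_for(path):
    """Return the Content-Type for a form file, based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError("Unsupported file type: %r" % ext)
    return CONTENT_TYPES[ext]


print(content_type_for("Invoice_7.pdf"))
```

You could then set 'Content-Type': content_type_for(source) in the headers dictionary instead of the fixed value.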

One interesting thing here is that you need your model_id. To get your model ID, go back to the web tool and open the training view, which shows the model ID.

Getting your model ID.

Once you have executed this step, you can execute the second step. This will poll the API endpoint and get the results when they are ready.

n_tries = 15
n_try = 0
wait_sec = 5
max_wait_sec = 60
while n_try < n_tries:
    try:
        resp = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": apim_key})
        resp_json = resp.json()
        if resp.status_code != 200:
            print("GET analyze results failed:\n%s" % json.dumps(resp_json))
            quit()
        status = resp_json["status"]
        if status == "succeeded":
            print("Analysis succeeded:\n%s" % json.dumps(resp_json))
            quit()
        if status == "failed":
            print("Analysis failed:\n%s" % json.dumps(resp_json))
            quit()
        # Analysis still running. Wait and retry.
        time.sleep(wait_sec)
        n_try += 1
        wait_sec = min(2*wait_sec, max_wait_sec)     
    except Exception as e:
        msg = "GET analyze results failed:\n%s" % str(e)
        print(msg)
        quit()
print("Analyze operation did not complete within the allocated time.")
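
To get a feel for how long the loop above can keep polling, the sketch below reproduces only its wait logic (start at 5 seconds, double each time, cap at 60) without calling the API. The function name is made up; the default values are the ones from the polling code.

```python
def backoff_schedule(n_tries=15, wait_sec=5, max_wait_sec=60):
    """List the sleep durations the polling loop uses, in order."""
    waits = []
    for _ in range(n_tries):
        waits.append(wait_sec)
        # Same update rule as the polling loop: double, but never exceed the cap.
        wait_sec = min(2 * wait_sec, max_wait_sec)
    return waits


waits = backoff_schedule()
print(waits[:6])   # [5, 10, 20, 40, 60, 60]
print(sum(waits))  # 735
```

So in the worst case the loop sleeps for a bit over 12 minutes in total before giving up, not counting the time spent on the requests themselves.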

This returns a big JSON object. I’ll include it for reference at the bottom of the post, and I’ll show you a snippet here.

"documentResults": [
    {
        "docType": "custom:form",
        "pageRange": [
            1,
            1
        ],
        "fields": {
            "Customer Address": {
                "type": "string",
                "valueString": "The Phone Company 5506 Main St Redmond, WA 73493",
                "text": "The Phone Company 5506 Main St Redmond, WA 73493",
                "page": 1,
                "boundingBox": [
                    5.195,
                    1.51,
                    6.57,
                    1.51,
                    6.57,
                    2.0300000000000002,
                    5.195,
                    2.0300000000000002
                ],
                "confidence": 1.0,
                "elements": [
                    "#/analyzeResult/readResults/0/lines/2/words/2",
                    "#/analyzeResult/readResults/0/lines/2/words/3",
                    "#/analyzeResult/readResults/0/lines/2/words/4",
                    "#/analyzeResult/readResults/0/lines/4/words/0",
                    "#/analyzeResult/readResults/0/lines/4/words/1",
                    "#/analyzeResult/readResults/0/lines/4/words/2",
                    "#/analyzeResult/readResults/0/lines/6/words/0",
                    "#/analyzeResult/readResults/0/lines/6/words/1",
                    "#/analyzeResult/readResults/0/lines/6/words/2"
                ]
            },

As you can see here, this returns the elements of our custom form. This includes for instance the Customer Address we tagged, followed by all other elements. The full JSON document contains a full analysis of the document. It identifies all text boxes and all OCR results.
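
In practice you usually only want the tagged values and how confident the model is about them. Here is a minimal sketch that pulls those out of the response; the function name is made up, and the sample dictionary is a trimmed-down copy of the response shape shown in this post (in the full response, documentResults sits under analyzeResult).

```python
def summarize_fields(resp_json):
    """Map each tagged field to its extracted text and confidence."""
    summary = {}
    for doc in resp_json["analyzeResult"]["documentResults"]:
        for name, field in doc["fields"].items():
            # Defensive assumption: treat a missing (null) field as empty.
            if field is None:
                summary[name] = (None, None)
            else:
                summary[name] = (field.get("text"), field.get("confidence"))
    return summary


# Trimmed sample using two fields from the invoice in this post.
sample = {
    "analyzeResult": {
        "documentResults": [{
            "fields": {
                "Invoice Number": {"text": "AC-32322", "confidence": 0.99},
                "Invoice Date": {"text": "03 March 2018", "confidence": 0.88},
            }
        }]
    }
}

for name, (text, confidence) in summarize_fields(sample).items():
    print("%-15s %-15s %.2f" % (name, text, confidence))
```

With the real response you would pass resp_json from the polling step instead of the sample.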

Summary

This was a quick overview of Form Recognizer’s ability to recognize custom forms. We looked into tagging custom forms and training a custom model to recognize data in these forms. We then used both the web tool and Python to process form information.

Full JSON object returned by Form Recognizer

{
    "status": "succeeded",
    "createdDateTime": "2020-05-04T20:25:55Z",
    "lastUpdatedDateTime": "2020-05-04T20:26:06Z",
    "analyzeResult": {
        "version": "2.0.0",
        "readResults": [{
            "page": 1,
            "language": "en",
            "angle": 0,
            "width": 8.5,
            "height": 11,
            "unit": "inch",
            "lines": [{
                "boundingBox": [0.5492, 1.1349, 2.6403, 1.1349, 2.6403, 1.4069, 0.5492, 1.4069],
                "text": "Margie's Travel",
                "words": [{
                    "boundingBox": [0.5492, 1.1349, 1.7043, 1.1349, 1.7043, 1.4069, 0.5492, 1.4069],
                    "text": "Margie's",
                    "confidence": 1
                }, {
                    "boundingBox": [1.7903, 1.1349, 2.6403, 1.1349, 2.6403, 1.3534, 1.7903, 1.3534],
                    "text": "Travel",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [0.7984, 1.515, 1.3826, 1.515, 1.3826, 1.6161, 0.7984, 1.6161],
                "text": "Address:",
                "words": [{
                    "boundingBox": [0.7984, 1.515, 1.3826, 1.515, 1.3826, 1.6161, 0.7984, 1.6161],
                    "text": "Address:",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [4.4033, 1.5114, 6.5682, 1.5114, 6.5682, 1.6425, 4.4033, 1.6425],
                "text": "Invoice For: The Phone Company",
                "words": [{
                    "boundingBox": [4.4033, 1.5143, 4.8234, 1.5143, 4.8234, 1.6155, 4.4033, 1.6155],
                    "text": "Invoice",
                    "confidence": 1
                }, {
                    "boundingBox": [4.8793, 1.5143, 5.1013, 1.5143, 5.1013, 1.6154, 4.8793, 1.6154],
                    "text": "For:",
                    "confidence": 1
                }, {
                    "boundingBox": [5.1974, 1.513, 5.4354, 1.513, 5.4354, 1.6151, 5.1974, 1.6151],
                    "text": "The",
                    "confidence": 1
                }, {
                    "boundingBox": [5.489, 1.513, 5.8966, 1.513, 5.8966, 1.6151, 5.489, 1.6151],
                    "text": "Phone",
                    "confidence": 1
                }, {
                    "boundingBox": [5.9466, 1.5114, 6.5682, 1.5114, 6.5682, 1.6425, 5.9466, 1.6425],
                    "text": "Company",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [0.8107, 1.7037, 2.0158, 1.7037, 2.0158, 1.8076, 0.8107, 1.8076],
                "text": "134 El Camino Real",
                "words": [{
                    "boundingBox": [0.8107, 1.705, 1.0195, 1.705, 1.0195, 1.8076, 0.8107, 1.8076],
                    "text": "134",
                    "confidence": 1
                }, {
                    "boundingBox": [1.0755, 1.7054, 1.1779, 1.7054, 1.1779, 1.806, 1.0755, 1.806],
                    "text": "El",
                    "confidence": 1
                }, {
                    "boundingBox": [1.2329, 1.7037, 1.6975, 1.7037, 1.6975, 1.8075, 1.2329, 1.8075],
                    "text": "Camino",
                    "confidence": 1
                }, {
                    "boundingBox": [1.752, 1.7054, 2.0158, 1.7054, 2.0158, 1.8075, 1.752, 1.8075],
                    "text": "Real",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [5.1995, 1.7133, 6.0298, 1.7133, 6.0298, 1.8172, 5.1995, 1.8172],
                "text": "5506 Main St",
                "words": [{
                    "boundingBox": [5.1995, 1.7145, 5.4962, 1.7145, 5.4962, 1.817, 5.1995, 1.817],
                    "text": "5506",
                    "confidence": 1
                }, {
                    "boundingBox": [5.5494, 1.7149, 5.8453, 1.7149, 5.8453, 1.817, 5.5494, 1.817],
                    "text": "Main",
                    "confidence": 1
                }, {
                    "boundingBox": [5.8982, 1.7133, 6.0298, 1.7133, 6.0298, 1.8172, 5.8982, 1.8172],
                    "text": "St",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [0.8062, 1.8967, 2.0399, 1.8967, 2.0399, 1.9993, 0.8062, 1.9993],
                "text": "New York NY 46233",
                "words": [{
                    "boundingBox": [0.8062, 1.8971, 1.0712, 1.8971, 1.0712, 1.9992, 0.8062, 1.9992],
                    "text": "New",
                    "confidence": 1
                }, {
                    "boundingBox": [1.1112, 1.8971, 1.3946, 1.8971, 1.3946, 1.9992, 1.1112, 1.9992],
                    "text": "York",
                    "confidence": 1
                }, {
                    "boundingBox": [1.442, 1.8971, 1.6226, 1.8971, 1.6226, 1.9976, 1.442, 1.9976],
                    "text": "NY",
                    "confidence": 1
                }, {
                    "boundingBox": [1.6633, 1.8967, 2.0399, 1.8967, 2.0399, 1.9993, 1.6633, 1.9993],
                    "text": "46233",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [5.2018, 1.9045, 6.554, 1.9045, 6.554, 2.0275, 5.2018, 2.0275],
                "text": "Redmond, WA 73493",
                "words": [{
                    "boundingBox": [5.2018, 1.9049, 5.8581, 1.9049, 5.8581, 2.0275, 5.2018, 2.0275],
                    "text": "Redmond,",
                    "confidence": 1
                }, {
                    "boundingBox": [5.9069, 1.9049, 6.1364, 1.9049, 6.1364, 2.0055, 5.9069, 2.0055],
                    "text": "WA",
                    "confidence": 1
                }, {
                    "boundingBox": [6.1812, 1.9045, 6.554, 1.9045, 6.554, 2.0072, 6.1812, 2.0072],
                    "text": "73493",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [0.5439, 2.8733, 1.5729, 2.8733, 1.5729, 2.9754, 0.5439, 2.9754],
                "text": "Invoice Number",
                "words": [{
                    "boundingBox": [0.5439, 2.8733, 1.0098, 2.8733, 1.0098, 2.9754, 0.5439, 2.9754],
                    "text": "Invoice",
                    "confidence": 1
                }, {
                    "boundingBox": [1.0611, 2.8743, 1.5729, 2.8743, 1.5729, 2.9754, 1.0611, 2.9754],
                    "text": "Number",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [1.9491, 2.8733, 2.7527, 2.8733, 2.7527, 2.9754, 1.9491, 2.9754],
                "text": "Invoice Date",
                "words": [{
                    "boundingBox": [1.9491, 2.8733, 2.415, 2.8733, 2.415, 2.9754, 1.9491, 2.9754],
                    "text": "Invoice",
                    "confidence": 1
                }, {
                    "boundingBox": [2.4673, 2.8743, 2.7527, 2.8743, 2.7527, 2.9754, 2.4673, 2.9754],
                    "text": "Date",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [3.3495, 2.8733, 4.4547, 2.8733, 4.4547, 2.9754, 3.3495, 2.9754],
                "text": "Invoice Due Date",
                "words": [{
                    "boundingBox": [3.3495, 2.8733, 3.8155, 2.8733, 3.8155, 2.9754, 3.3495, 2.9754],
                    "text": "Invoice",
                    "confidence": 1
                }, {
                    "boundingBox": [3.8677, 2.8743, 4.1149, 2.8743, 4.1149, 2.9754, 3.8677, 2.9754],
                    "text": "Due",
                    "confidence": 1
                }, {
                    "boundingBox": [4.1678, 2.8743, 4.4547, 2.8743, 4.4547, 2.9754, 4.1678, 2.9754],
                    "text": "Date",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [4.7468, 2.8717, 5.289, 2.8717, 5.289, 3.0035, 4.7468, 3.0035],
                "text": "Charges",
                "words": [{
                    "boundingBox": [4.7468, 2.8717, 5.289, 2.8717, 5.289, 3.0035, 4.7468, 3.0035],
                    "text": "Charges",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [6.141, 2.873, 6.5875, 2.873, 6.5875, 2.9736, 6.141, 2.9736],
                "text": "VAT ID",
                "words": [{
                    "boundingBox": [6.141, 2.873, 6.4147, 2.873, 6.4147, 2.9736, 6.141, 2.9736],
                    "text": "VAT",
                    "confidence": 1
                }, {
                    "boundingBox": [6.4655, 2.873, 6.5875, 2.873, 6.5875, 2.9736, 6.4655, 2.9736],
                    "text": "ID",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [0.535, 3.4097, 1.1504, 3.4097, 1.1504, 3.5136, 0.535, 3.5136],
                "text": "AC-32322",
                "words": [{
                    "boundingBox": [0.535, 3.4097, 1.1504, 3.4097, 1.1504, 3.5136, 0.535, 3.5136],
                    "text": "AC-32322",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [1.9461, 3.411, 2.8569, 3.411, 2.8569, 3.5136, 1.9461, 3.5136],
                "text": "03 March 2018",
                "words": [{
                    "boundingBox": [1.9461, 3.411, 2.0879, 3.411, 2.0879, 3.5136, 1.9461, 3.5136],
                    "text": "03",
                    "confidence": 1
                }, {
                    "boundingBox": [2.1428, 3.4114, 2.5074, 3.4114, 2.5074, 3.5135, 2.1428, 3.5135],
                    "text": "March",
                    "confidence": 1
                }, {
                    "boundingBox": [2.5593, 3.411, 2.8569, 3.411, 2.8569, 3.5135, 2.5593, 3.5135],
                    "text": "2018",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [3.3465, 3.411, 4.119, 3.411, 4.119, 3.5135, 3.3465, 3.5135],
                "text": "06 Nov 2019",
                "words": [{
                    "boundingBox": [3.3465, 3.411, 3.4882, 3.411, 3.4882, 3.5135, 3.3465, 3.5135],
                    "text": "06",
                    "confidence": 1
                }, {
                    "boundingBox": [3.5435, 3.4114, 3.7773, 3.4114, 3.7773, 3.5135, 3.5435, 3.5135],
                    "text": "Nov",
                    "confidence": 1
                }, {
                    "boundingBox": [3.8214, 3.411, 4.119, 3.411, 4.119, 3.5135, 3.8214, 3.5135],
                    "text": "2019",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [5.2909, 3.4114, 6.0483, 3.4114, 6.0483, 3.5381, 5.2909, 3.5381],
                "text": "$110,153.22",
                "words": [{
                    "boundingBox": [5.2909, 3.4114, 6.0483, 3.4114, 6.0483, 3.5381, 5.2909, 3.5381],
                    "text": "$110,153.22",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [6.2288, 3.4114, 6.3995, 3.4114, 6.3995, 3.5119, 6.2288, 3.5119],
                "text": "RT",
                "words": [{
                    "boundingBox": [6.2288, 3.4114, 6.3995, 3.4114, 6.3995, 3.5119, 6.2288, 3.5119],
                    "text": "RT",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [6.2429, 9.667, 6.5489, 9.667, 6.5489, 9.7966, 6.2429, 9.7966],
                "text": "Page",
                "words": [{
                    "boundingBox": [6.2429, 9.667, 6.5489, 9.667, 6.5489, 9.7966, 6.2429, 9.7966],
                    "text": "Page",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [6.8409, 9.6656, 7.0593, 9.6656, 7.0593, 9.7681, 6.8409, 9.7681],
                "text": "1 of",
                "words": [{
                    "boundingBox": [6.8409, 9.6681, 6.8837, 9.6681, 6.8837, 9.7663, 6.8409, 9.7663],
                    "text": "1",
                    "confidence": 1
                }, {
                    "boundingBox": [6.9512, 9.6656, 7.0593, 9.6656, 7.0593, 9.7681, 6.9512, 9.7681],
                    "text": "of",
                    "confidence": 1
                }]
            }, {
                "boundingBox": [7.4076, 9.6681, 7.4503, 9.6681, 7.4503, 9.7663, 7.4076, 9.7663],
                "text": "1",
                "words": [{
                    "boundingBox": [7.4076, 9.6681, 7.4503, 9.6681, 7.4503, 9.7663, 7.4076, 9.7663],
                    "text": "1",
                    "confidence": 1
                }]
            }]
        }],
        "pageResults": [{
            "page": 1,
            "tables": [{
                "rows": 2,
                "columns": 6,
                "cells": [{
                    "rowIndex": 0,
                    "columnIndex": 0,
                    "text": "Invoice Number",
                    "boundingBox": [0.5075, 2.8088, 1.9061, 2.8088, 1.9061, 3.3219, 0.5075, 3.3219],
                    "elements": ["#/readResults/0/lines/7/words/0", "#/readResults/0/lines/7/words/1"]
                }, {
                    "rowIndex": 0,
                    "columnIndex": 1,
                    "text": "Invoice Date",
                    "boundingBox": [1.9061, 2.8088, 3.3074, 2.8088, 3.3074, 3.3219, 1.9061, 3.3219],
                    "elements": ["#/readResults/0/lines/8/words/0", "#/readResults/0/lines/8/words/1"]
                }, {
                    "rowIndex": 0,
                    "columnIndex": 2,
                    "text": "Invoice Due Date",
                    "boundingBox": [3.3074, 2.8088, 4.7074, 2.8088, 4.7074, 3.3219, 3.3074, 3.3219],
                    "elements": ["#/readResults/0/lines/9/words/0", "#/readResults/0/lines/9/words/1", "#/readResults/0/lines/9/words/2"]
                }, {
                    "rowIndex": 0,
                    "columnIndex": 3,
                    "text": "Charges",
                    "boundingBox": [4.7074, 2.8088, 5.386, 2.8088, 5.386, 3.3219, 4.7074, 3.3219],
                    "elements": ["#/readResults/0/lines/10/words/0"]
                }, {
                    "rowIndex": 0,
                    "columnIndex": 5,
                    "text": "VAT ID",
                    "boundingBox": [6.1051, 2.8088, 7.5038, 2.8088, 7.5038, 3.3219, 6.1051, 3.3219],
                    "elements": ["#/readResults/0/lines/11/words/0", "#/readResults/0/lines/11/words/1"]
                }, {
                    "rowIndex": 1,
                    "columnIndex": 0,
                    "text": "AC-32322",
                    "boundingBox": [0.5075, 3.3219, 1.9061, 3.3219, 1.9061, 3.859, 0.5075, 3.859],
                    "elements": ["#/readResults/0/lines/12/words/0"]
                }, {
                    "rowIndex": 1,
                    "columnIndex": 1,
                    "text": "03 March 2018",
                    "boundingBox": [1.9061, 3.3219, 3.3074, 3.3219, 3.3074, 3.859, 1.9061, 3.859],
                    "elements": ["#/readResults/0/lines/13/words/0", "#/readResults/0/lines/13/words/1", "#/readResults/0/lines/13/words/2"]
                }, {
                    "rowIndex": 1,
                    "columnIndex": 2,
                    "text": "06 Nov 2019",
                    "boundingBox": [3.3074, 3.3219, 4.7074, 3.3219, 4.7074, 3.859, 3.3074, 3.859],
                    "elements": ["#/readResults/0/lines/14/words/0", "#/readResults/0/lines/14/words/1", "#/readResults/0/lines/14/words/2"]
                }, {
                    "rowIndex": 1,
                    "columnIndex": 3,
                    "columnSpan": 2,
                    "text": "$110,153.22",
                    "boundingBox": [4.7074, 3.3219, 6.1051, 3.3219, 6.1051, 3.859, 4.7074, 3.859],
                    "elements": ["#/readResults/0/lines/15/words/0"]
                }, {
                    "rowIndex": 1,
                    "columnIndex": 5,
                    "text": "RT",
                    "boundingBox": [6.1051, 3.3219, 7.5038, 3.3219, 7.5038, 3.859, 6.1051, 3.859],
                    "elements": ["#/readResults/0/lines/16/words/0"]
                }]
            }]
        }],
        "documentResults": [{
            "docType": "custom:form",
            "pageRange": [1, 1],
            "fields": {
                "Customer Address": {
                    "type": "string",
                    "valueString": "The Phone Company 5506 Main St Redmond, WA 73493",
                    "text": "The Phone Company 5506 Main St Redmond, WA 73493",
                    "page": 1,
                    "boundingBox": [5.195, 1.51, 6.57, 1.51, 6.57, 2.0300000000000002, 5.195, 2.0300000000000002],
                    "confidence": 1.0,
                    "elements": ["#/analyzeResult/readResults/0/lines/2/words/2", "#/analyzeResult/readResults/0/lines/2/words/3", "#/analyzeResult/readResults/0/lines/2/words/4", "#/analyzeResult/readResults/0/lines/4/words/0", "#/analyzeResult/readResults/0/lines/4/words/1", "#/analyzeResult/readResults/0/lines/4/words/2", "#/analyzeResult/readResults/0/lines/6/words/0", "#/analyzeResult/readResults/0/lines/6/words/1", "#/analyzeResult/readResults/0/lines/6/words/2"]
                },
                "Invoice Date": {
                    "type": "date",
                    "text": "03 March 2018",
                    "page": 1,
                    "boundingBox": [1.945, 3.41, 2.855, 3.41, 2.855, 3.515, 1.945, 3.515],
                    "confidence": 0.88,
                    "elements": ["#/analyzeResult/readResults/0/lines/13/words/0", "#/analyzeResult/readResults/0/lines/13/words/1", "#/analyzeResult/readResults/0/lines/13/words/2"]
                },
                "Invoice Number": {
                    "type": "string",
                    "valueString": "AC-32322",
                    "text": "AC-32322",
                    "page": 1,
                    "boundingBox": [0.535, 3.41, 1.1500000000000001, 3.41, 1.1500000000000001, 3.515, 0.535, 3.515],
                    "confidence": 0.99,
                    "elements": ["#/analyzeResult/readResults/0/lines/12/words/0"]
                },
                "Invoice Due Date": {
                    "type": "date",
                    "text": "06 Nov 2019",
                    "page": 1,
                    "boundingBox": [3.345, 3.41, 4.12, 3.41, 4.12, 3.515, 3.345, 3.515],
                    "confidence": 0.99,
                    "elements": ["#/analyzeResult/readResults/0/lines/14/words/0", "#/analyzeResult/readResults/0/lines/14/words/1", "#/analyzeResult/readResults/0/lines/14/words/2"]
                },
                "VAT ID": {
                    "type": "string",
                    "valueString": "RT",
                    "text": "RT",
                    "page": 1,
                    "boundingBox": [6.23, 3.41, 6.4, 3.41, 6.4, 3.5100000000000002, 6.23, 3.5100000000000002],
                    "confidence": 1.0,
                    "elements": ["#/analyzeResult/readResults/0/lines/16/words/0"]
                },
                "Charges": {
                    "type": "string",
                    "valueString": "$110,153.22",
                    "text": "$110,153.22",
                    "page": 1,
                    "boundingBox": [5.29, 3.41, 6.05, 3.41, 6.05, 3.54, 5.29, 3.54],
                    "confidence": 1.0,
                    "elements": ["#/analyzeResult/readResults/0/lines/15/words/0"]
                },
                "Company Address": {
                    "type": "string",
                    "valueString": "134 El Camino Real New York NY 46233",
                    "text": "134 El Camino Real New York NY 46233",
                    "page": 1,
                    "boundingBox": [0.805, 1.705, 2.04, 1.705, 2.04, 2.0, 0.805, 2.0],
                    "confidence": 1.0,
                    "elements": ["#/analyzeResult/readResults/0/lines/3/words/0", "#/analyzeResult/readResults/0/lines/3/words/1", "#/analyzeResult/readResults/0/lines/3/words/2", "#/analyzeResult/readResults/0/lines/3/words/3", "#/analyzeResult/readResults/0/lines/5/words/0", "#/analyzeResult/readResults/0/lines/5/words/1", "#/analyzeResult/readResults/0/lines/5/words/2", "#/analyzeResult/readResults/0/lines/5/words/3"]
                },
                "Company Name": {
                    "type": "string",
                    "valueString": "Margie's Travel",
                    "text": "Margie's Travel",
                    "page": 1,
                    "boundingBox": [0.55, 1.135, 2.64, 1.135, 2.64, 1.405, 0.55, 1.405],
                    "confidence": 0.94,
                    "elements": ["#/analyzeResult/readResults/0/lines/0/words/0", "#/analyzeResult/readResults/0/lines/0/words/1"]
                }
            }
        }],
        "errors": []
    }
}
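
The elements entries in the response above are references back into readResults. As a sketch of how you could follow them, the helper below walks such a reference through the parsed JSON. The function name is made up. Based on the output above, references under documentResults start with #/analyzeResult while the ones in table cells start with #/readResults, so the helper accepts both forms.

```python
def resolve_element(resp_json, ref):
    """Follow a reference such as '#/analyzeResult/readResults/0/lines/0/words/1'."""
    parts = ref.lstrip("#").strip("/").split("/")
    # Table cells use the shorter '#/readResults/...' form, relative to analyzeResult.
    node = resp_json if parts[0] == "analyzeResult" else resp_json["analyzeResult"]
    for part in parts:
        # Lists are indexed by number, dictionaries by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


# Trimmed sample: the first line of the invoice in this post.
sample = {
    "analyzeResult": {
        "readResults": [{
            "lines": [{
                "text": "Margie's Travel",
                "words": [{"text": "Margie's"}, {"text": "Travel"}],
            }]
        }]
    }
}

print(resolve_element(sample, "#/analyzeResult/readResults/0/lines/0/words/1")["text"])
print(resolve_element(sample, "#/readResults/0/lines/0/words/0")["text"])
```

Joining the text of every resolved element of a field should give you back the field’s text value, along with the per-word bounding boxes if you need them.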
