Advances in AI & OCR have made it possible to automate mortgage document classification with high accuracy. Let’s see what automated mortgage document classification means for your mortgage operations and how to implement it.
In this post, we’ll cover:
Mortgage document classification is the process of identifying the type of each mortgage document in a file and where each document begins and ends.
Given a PDF file as input, an automated mortgage document classifier will provide you with:
A familiar example of document classification is a loan processor reviewing files, splitting them into individual documents, and organising them in the LOS (loan origination system).
In this context, the loan processor acts as a human document classifier.
Automated document classification is the same process but done by software instead of humans.
👉 Side note: Document Indexing vs. Document Classification
These terms are sometimes used interchangeably, but they mean different things.
Classification is the process of identifying the type of each mortgage document and determining its boundaries within a single file. The goal is to understand what documents are in a file and where they are located.
Indexing is the process of organising these documents within a storage system for easy retrieval. This process typically includes splitting files into individual documents, appropriately renaming them, and storing them in the correct folders.
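To make the distinction concrete, here is a minimal Python sketch, assuming a classifier that reports each document as a type plus a 1-based page range. `ClassifiedDocument`, `index_path`, the document-type labels, and the `loans/<loan number>/<type>/` folder layout are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedDocument:
    """Classification output: what a document is and where it sits in the file."""

    doc_type: str    # hypothetical label, e.g. "bank_statement"
    start_page: int  # first page of the document (1-based, inclusive)
    end_page: int    # last page of the document (1-based, inclusive)


def index_path(loan_number: str, doc: ClassifiedDocument, seq: int) -> str:
    """Indexing: choose the folder and file name for a split-out document.

    The folder layout and naming convention are hypothetical examples.
    """
    return f"loans/{loan_number}/{doc.doc_type}/{doc.doc_type}_{seq:02d}.pdf"


# Classification answers "what is in this file, and where?"
classified = [
    ClassifiedDocument("loan_application", 1, 4),
    ClassifiedDocument("bank_statement", 5, 8),
]

# Indexing answers "where should each document live, and under what name?"
for seq, doc in enumerate(classified, start=1):
    print(f"pages {doc.start_page}-{doc.end_page} -> {index_path('LN-1001', doc, seq)}")
```

Classification produces the first list; indexing consumes it and decides storage, which is why the two steps are often built as separate stages.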
Humans and software systems process mortgage documents differently depending on the document type.
For example, when a loan officer (LO) reviews bank statements, they look for one set of information; when the same LO reviews a credit report, they look for another.
To process mortgage documents effectively, humans and software systems must:
The problem is that a single PDF file commonly contains multiple documents.
For instance, a correspondent loan package can easily contain 15+ different documents in a single PDF file of 100+ pages, which makes it difficult to process effectively.
An AI mortgage document classifier can tell you which documents an unclassified file, such as a correspondent loan package, contains and where each one begins and ends.
This enables software systems to process these documents effectively and automate:
In short, the role of automated document classification in mortgage operations is to enable effective document processing by identifying the documents in unclassified files and their boundaries.
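Many classifiers predict a label page by page. One simple way to turn page-level predictions into documents and boundaries is to merge consecutive pages that share a type, as in this Python sketch. The `starts_new_doc` flag is an assumption standing in for whatever first-page signal your model provides; without it, two back-to-back documents of the same type (say, two bank statements) would be merged into one. `Segment` and `pages_to_segments` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    doc_type: str
    start_page: int    # 1-based, inclusive
    end_page: int      # 1-based, inclusive
    confidence: float  # lowest page-level confidence inside the segment


def pages_to_segments(page_preds: List[Tuple[str, float, bool]]) -> List[Segment]:
    """Merge consecutive pages with the same predicted type into documents.

    page_preds holds one (doc_type, confidence, starts_new_doc) tuple per page,
    in page order. starts_new_doc is a hypothetical first-page signal that keeps
    back-to-back documents of the same type apart.
    """
    segments: List[Segment] = []
    for page_no, (doc_type, conf, starts_new_doc) in enumerate(page_preds, start=1):
        last = segments[-1] if segments else None
        if last is not None and last.doc_type == doc_type and not starts_new_doc:
            # Same document continues: extend its end page, keep the weakest confidence.
            segments[-1] = Segment(doc_type, last.start_page, page_no, min(last.confidence, conf))
        else:
            # Type changed or a new document starts here: open a new segment.
            segments.append(Segment(doc_type, page_no, page_no, conf))
    return segments


preds = [
    ("loan_application", 0.98, True),
    ("loan_application", 0.97, False),
    ("bank_statement", 0.95, True),
    ("bank_statement", 0.91, False),
    ("bank_statement", 0.93, True),  # a second statement starts on this page
]
for seg in pages_to_segments(preds):
    print(seg)
```

Keeping the lowest page confidence per segment is a deliberately conservative choice: one uncertain page is enough to flag the whole document for review.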
Before your document classification system can classify files, it must first receive them.
The process therefore begins with your system pulling files for classification from various sources.
Common sources include:
Once your system has files, run each file through the document classifier.
As an output, you should have the following for each file:
Sometimes the ML model can't classify documents or identify their boundaries accurately.
In these cases, you need to loop in humans to review the classification and correct it if it's wrong.
AI document processing products usually offer out-of-the-box human-in-the-loop (HITL) interfaces to handle this workflow.
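A common way to decide what goes to human review is a confidence threshold. This Python sketch assumes the classifier returns a confidence score per document; the 0.90 threshold and the `route_for_review` and `apply_correction` helpers are hypothetical, and in practice you would tune the threshold on reviewed samples.

```python
from dataclasses import dataclass

# Hypothetical threshold; tune it (possibly per document type) on reviewed samples.
REVIEW_THRESHOLD = 0.90


@dataclass(frozen=True)
class Segment:
    doc_type: str
    start_page: int
    end_page: int
    confidence: float


def route_for_review(segments, threshold=REVIEW_THRESHOLD):
    """Split classifier output into auto-accepted segments and a human review queue."""
    accepted = [s for s in segments if s.confidence >= threshold]
    needs_review = [s for s in segments if s.confidence < threshold]
    return accepted, needs_review


def apply_correction(segment, corrected_type=None, corrected_pages=None):
    """Record a reviewer's fix; a human-confirmed segment gets confidence 1.0."""
    doc_type = corrected_type or segment.doc_type
    start, end = corrected_pages or (segment.start_page, segment.end_page)
    return Segment(doc_type, start, end, 1.0)


segments = [Segment("paystub", 1, 2, 0.97), Segment("w2", 3, 3, 0.62)]
accepted, needs_review = route_for_review(segments)
fixed = [apply_correction(s, corrected_type="1099") for s in needs_review]
```

A lower threshold sends fewer documents to reviewers but lets more misclassifications through, so the right value depends on how costly an error is downstream.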
After review, you have accurate data about the documents in each file and where they’re located.
Most downstream integrations expect each document as a separate file, but at this step your documents are still inside the original files.
So, the next step is to split the original files into documents.
Side note: A more accurate name for files containing a single document would be single-document files. But for the sake of simplicity, I refer to them as documents.
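Splitting is mechanical once the page ranges are known. This Python sketch uses plain lists as stand-ins for PDF pages so it runs without a PDF library; `split_into_documents` is a hypothetical helper, and treating unassigned pages as an error is a design choice of this sketch rather than a requirement.

```python
def split_into_documents(pages, segments):
    """Split one file's pages into single-document files.

    pages: the original file's pages in order (stand-ins for PDF pages).
    segments: (doc_type, start_page, end_page) tuples, 1-based and inclusive.
    Returns a list of (doc_type, pages_for_that_document).
    """
    documents = []
    covered = set()
    for doc_type, start, end in segments:
        if not 1 <= start <= end <= len(pages):
            raise ValueError(f"invalid page range {start}-{end} for {doc_type}")
        documents.append((doc_type, pages[start - 1:end]))
        covered.update(range(start, end + 1))
    missing = set(range(1, len(pages) + 1)) - covered
    if missing:
        # In this sketch, pages that belong to no document send the file back to review.
        raise ValueError(f"pages not assigned to any document: {sorted(missing)}")
    return documents


pages = [f"page {i}" for i in range(1, 6)]
docs = split_into_documents(pages, [("loan_application", 1, 2), ("bank_statement", 3, 5)])
```

With real PDFs, a library such as pypdf could write each range to a new file by passing `PdfReader(...).pages[i]` to `PdfWriter.add_page` and then calling `write`.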
Once you have a list of single-document files, you can feed this data into other systems to automate your mortgage operations.
Here are some common destinations and automations:
After document classification, downstream integrations have the data to process each document effectively.
Some systems (e.g. data extraction, fraud detection) process each document separately, while others (e.g. indexing, underwriting) handle them in bulk.
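Routing classified documents to downstream systems can be as simple as a lookup table. In this Python sketch, the integration names mirror the examples in the text, but the `PER_DOCUMENT` mapping, the job format, and `dispatch` are hypothetical; real integrations would be API calls or queue messages rather than tuples.

```python
# Hypothetical routing table: which per-document integrations consume which types.
PER_DOCUMENT = {
    "data_extraction": {"bank_statement", "paystub", "w2"},
    "fraud_detection": {"bank_statement", "paystub"},
}
# Hypothetical bulk integrations: they receive all of a loan's documents at once.
BULK = {"indexing", "underwriting"}


def dispatch(loan_number, documents):
    """Build downstream jobs for one loan's classified documents.

    documents: list of (doc_type, document_id) tuples.
    Returns a list of (integration, payload) jobs.
    """
    jobs = []
    # Per-document consumers get one job per matching document.
    for integration, doc_types in PER_DOCUMENT.items():
        for doc_type, doc_id in documents:
            if doc_type in doc_types:
                jobs.append((integration, doc_id))
    # Bulk consumers get a single job containing every document of the loan.
    all_ids = [doc_id for _, doc_id in documents]
    for integration in sorted(BULK):
        jobs.append((integration, {"loan": loan_number, "documents": all_ids}))
    return jobs


jobs = dispatch("LN-1001", [("bank_statement", "doc-1"), ("credit_report", "doc-2")])
```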
Below is how to approach building the automated document classification workflow outlined above.
Start by defining where you need classified documents and why.
Then, make a list of the document types you need to classify.
Once you have a list, determine where the unclassified files will come from.
You should have:
The next step is to get a model that can classify the document types you defined in the previous step.
To get this model, you have 2 options:
You can find more details about the differences between these options in the section below.
By the end of this step, you should have an ML model that can classify your document types.
Once you have a model, the next step is implementing the document classification workflow.
By the end of this step, you should have an end-to-end document classification workflow, from getting raw files to pushing classified documents into downstream integrations.
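The workflow outlined earlier (intake, classification, human review, splitting, and downstream delivery) can be wired together as a chain of stages. In this Python sketch every stage is passed in as a function, so `fetch_files`, `classify`, `review`, and `push` are hypothetical placeholders for your intake sources, classifier, HITL tool, and downstream integrations; the 0.9 threshold is also an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Segment = Tuple[str, int, int, float]  # (doc_type, start_page, end_page, confidence)


def run_pipeline(
    fetch_files: Callable[[], Iterable[Tuple[str, list]]],
    classify: Callable[[list], List[Segment]],
    review: Callable[[str, List[Segment]], List[Segment]],
    push: Callable[[str, str, list], None],
    threshold: float = 0.9,
) -> None:
    """Run every fetched file through classify -> (review) -> split -> push."""
    for file_id, pages in fetch_files():
        segments = classify(pages)
        # Send the whole file to human review if any document is below the threshold.
        if any(seg[3] < threshold for seg in segments):
            segments = review(file_id, segments)
        # Split the file into single-document page lists and hand each one downstream.
        for doc_type, start, end, _ in segments:
            push(file_id, doc_type, pages[start - 1:end])
```

Keeping the stages injectable makes it easier to swap one part, for example replacing a vendor model with an up-trained one, without touching intake or downstream code.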
The last step is to fine-tune and up-train your models to improve accuracy.
This step matters especially for mortgage documents, because fewer providers offer pre-trained models for the mortgage industry than for general-purpose documents.
So unless you find a provider that already has pre-trained models for every document type you need to support, expect a period where you'll need to invest more time in up-training.
The process involves reviewing and correcting document classifications that have low confidence.
You can use your own workforce or managed labelling services from providers such as Ocrolus and Super.AI.
By the end of this step, you should have a document classification system that processes most files with high accuracy, with human reviewers stepping in only for the rare documents classified with low confidence.
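Reviewer corrections double as ground truth for deciding where up-training is needed. This Python sketch assumes you log a (predicted type, reviewed type) pair for every reviewed document; `accuracy_by_type`, `needs_uptraining`, and the 0.95 target are hypothetical. Grouping by the reviewed (true) type means each figure is the share of that type's documents the model labelled correctly, i.e. per-type recall.

```python
from collections import Counter


def accuracy_by_type(reviews):
    """Share of each true document type that the model labelled correctly.

    reviews: list of (predicted_type, reviewed_type) pairs from the HITL step.
    """
    total = Counter()
    correct = Counter()
    for predicted, actual in reviews:
        total[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    return {doc_type: correct[doc_type] / total[doc_type] for doc_type in total}


def needs_uptraining(reviews, min_accuracy=0.95):
    """Document types below a (hypothetical) accuracy target, sorted by name."""
    return sorted(t for t, acc in accuracy_by_type(reviews).items() if acc < min_accuracy)


reviews = [("w2", "w2"), ("w2", "w2"), ("1099", "w2"), ("paystub", "paystub")]
```

If only low-confidence documents reach reviewers, these numbers describe the hardest cases; also reviewing a random sample gives a fairer picture of overall accuracy.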
Quite a few AI document-classification products & tools are available on the market.
The main difference between them is how well they work for mortgage documents out of the box.
That comes down to how many steps of the 5-step process they cover:
Some of the solutions cover all 5 steps. Other solutions cover none.
The less customisation you need, the higher the cost per document you can expect.
The more you invest to get it working, the lower the cost per document.
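One simplified way to reason about this trade-off is to model the total cost of processing N documents as an upfront investment F plus a per-document cost c, ignoring ongoing maintenance. Comparing a low-level solution (larger upfront cost, smaller per-document cost) with a specialised one (smaller upfront cost, larger per-document cost) gives a break-even volume N*:

```latex
\begin{aligned}
C(N) &= F + c\,N \\
F_{\text{low}} + c_{\text{low}}\,N^{*} &= F_{\text{spec}} + c_{\text{spec}}\,N^{*} \\
\Rightarrow\quad N^{*} &= \frac{F_{\text{low}} - F_{\text{spec}}}{c_{\text{spec}} - c_{\text{low}}}
\end{aligned}
```

Above N*, the lower per-document cost of the low-level option pays back its extra upfront investment; below it, the specialised option is cheaper overall.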
Here, you can find a list of providers that you can use to automate mortgage document classification. It's not an exhaustive list; these are the providers that, in my opinion, are most relevant for mortgage document classification.
💡 Low-level solutions are the ones that need the most engineering involvement to make them work for mortgage documents. But they tend to have the lowest cost per document.
💡 Mid-level solutions are usually built on top of one or more low-level solutions and remove some of the implementation complexity. Most come with pre-trained models relevant to the mortgage industry and have upstream and downstream integrations with popular mortgage software.
💡 Specialised solutions are usually built on top of mid-level solutions. They take them further by providing out-of-the-box automation using the data they extract.
I hope this post gave you insight into how to use OCR and AI to automate mortgage document classification.
If you’d like to stay on top of the latest mortgage tech and how it can be applied to mortgage operations, consider joining our mortgage technology newsletter.