The Document Extraction module reads a previously classified identity document and extracts data fields from it, such as name, date of birth, document number, and expiry date. The module allows you to configure which extracted fields are required, and the overall success of the extraction is determined by whether all required fields were successfully captured. This page contains documentation for the Document Extraction module, including its variants, capabilities, and the specific result values it returns.

Document Extraction V1

The Document Extraction V1 variant processes the document images that were previously classified by the Document Classification module and extracts all available data fields. You configure which fields are required for your use case (for example, full name, date of birth, document number, and expiry date). To learn more, refer to Configure module settings. The extraction result reflects whether all required fields were successfully extracted, not whether every possible field was captured.
Extracted data is written to the journey context under context.subject, where it becomes available to downstream modules. For example, extracted name and date of birth data can be consumed by a Data Verification module to cross-reference against trusted databases, or an extracted portrait photograph can be consumed by a Facematch Verification module.
This variant is typically placed after the Document Classification module in a journey, and often before or alongside the Document Authentication module.
This module has privacy auditing enabled. Extracted personal data is recorded in the journey audit log, which can be reviewed in the Investigation portal.

Capabilities

The module returns three capabilities: an overall extraction result, the document’s expiry status, and a calculated age derived from the extracted date of birth.

Document extraction result

This capability provides the overall outcome of the extraction process. The result is determined by whether all fields marked as required in the module configuration were successfully extracted from the document.
| Value | Description |
| --- | --- |
| Extraction Successful | All required fields were successfully extracted from the document. The extracted data is available in the journey context for downstream modules to consume. Fields that were configured as optional may or may not have been extracted; this result only confirms that all required fields are present. |
| Extraction Unsuccessful | One or more required fields could not be extracted. This may occur when the image quality is too poor for OCR to read certain regions, when a required field is not present on the document type (e.g. requesting a middle name from a document format that does not include one), or when the document’s layout does not match any known template closely enough for field-level extraction. This is the default value. |
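
The required-field rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the module's implementation: the function name, the field names, and the dictionary representation of the extracted data are all assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional


def extraction_result(
    extracted: Mapping[str, Optional[str]],
    required: Iterable[str],
) -> str:
    """Derive the overall result from the required fields only.

    `extracted` maps hypothetical field names to extracted values;
    a missing key, None, or an empty string counts as "not extracted".
    """
    missing = [name for name in required if not extracted.get(name)]
    # Optional fields are never inspected, so they cannot affect the result.
    return "Extraction Unsuccessful" if missing else "Extraction Successful"


# Hypothetical extraction output: the optional middle name is absent.
fields = {
    "full_name": "Jane Doe",
    "date_of_birth": "1990-04-12",
    "document_number": "X1234567",
    "middle_name": None,
}
result = extraction_result(fields, ["full_name", "date_of_birth", "document_number"])
```

With these inputs the missing middle name does not matter because it is not listed as required. Adding a field that was not extracted, such as an expiry date, to the required list would change the result to Extraction Unsuccessful.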

Document expiry status

This capability checks whether the identity document has expired by comparing the extracted expiry date against the current date. Many verification policies reject expired documents, since an expired document may no longer be considered valid proof of identity.
| Value | Description |
| --- | --- |
| Expired | The document’s expiry date has passed. The document is no longer valid according to its issuing authority’s stated validity period. Depending on your policy, this may trigger rejection or a request for the user to submit a current document. |
| Not Expired | The document’s expiry date has not yet passed. The document is still within its valid period. |
| Undeterminable | The document’s expiry status could not be determined. This occurs when the expiry date field was not successfully extracted from the document, either because the image quality was insufficient to read it, or because the document type does not include an expiry date (e.g. some national ID cards are issued without an expiration). This is the default value. |
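
The comparison behind these three values can be illustrated with a short Python sketch. The function name and signature are hypothetical. The sketch also assumes that a document is still valid on the expiry date itself, which this page does not state explicitly.

```python
from datetime import date
from typing import Optional


def expiry_status(expiry_date: Optional[date], today: Optional[date] = None) -> str:
    """Map an extracted expiry date to one of the three status values."""
    if expiry_date is None:
        # Expiry date not extracted, or the document type has none.
        return "Undeterminable"
    today = today or date.today()
    # Assumption: the expiry date itself still counts as valid.
    return "Expired" if expiry_date < today else "Not Expired"


reference_day = date(2024, 6, 1)
status_old = expiry_status(date(2020, 1, 1), reference_day)
status_valid = expiry_status(date(2030, 1, 1), reference_day)
status_unknown = expiry_status(None, reference_day)
```

Passing the reference date explicitly keeps the check deterministic, which is convenient when testing evaluation rules that depend on it.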

Calculated age

This capability returns the subject’s age in years, calculated from the extracted date of birth and the current date at the time of processing. This value is an integer rather than a result code.
| Detail | Description |
| --- | --- |
| Type | Integer |
| Range | 0–130 |
| Default | 0 |
If the date of birth was successfully extracted, the module calculates the subject’s current age in whole years. This value can be used in downstream evaluation rules, for example to enforce a minimum age threshold without requiring a separate Age Verification module. If the date of birth could not be extracted, the value defaults to 0. Evaluation rules that depend on this capability should account for the default value. A calculated age of 0 indicates a missing date of birth, not a subject aged zero.
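
As a rough illustration, the whole-year age calculation and a minimum-age rule that guards against the default value might look like the following Python sketch. The function names are hypothetical, the clamping to the documented 0–130 range is an assumption, and the tri-state return value is just one possible way to keep "date of birth missing" separate from "under age".

```python
from datetime import date
from typing import Optional


def calculated_age(dob: Optional[date], today: Optional[date] = None) -> int:
    """Age in whole years, or 0 when the date of birth is missing."""
    if dob is None:
        return 0
    today = today or date.today()
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    # Assumption: values are clamped to the documented 0-130 range.
    return max(0, min(years, 130))


def meets_minimum_age(age: int, minimum: int = 18) -> Optional[bool]:
    """Return None when the age is the default 0 (date of birth missing)."""
    if age == 0:
        return None
    return age >= minimum


age_before_birthday = calculated_age(date(2000, 6, 15), date(2024, 6, 14))
age_on_birthday = calculated_age(date(2000, 6, 15), date(2024, 6, 15))
```

Returning None for the default value lets a rule send those cases to a retry or manual review step instead of treating them as a failed age check.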

Default outcomes

The module is pre-configured with the following default outcomes, which can be used in evaluation and routing logic within the journey designer.
| Outcome | Condition | Description |
| --- | --- | --- |
| Extraction Successful | Extraction result is Extraction Successful | All required fields were successfully extracted. The journey can proceed to downstream modules that consume the extracted data, such as Data Verification, Document Authentication, or Facematch Verification. |
| Extraction Unsuccessful | Extraction result is Extraction Unsuccessful | One or more required fields could not be extracted. The journey may route to a retry step (prompting the user to recapture the document), manual review, or rejection depending on the configured evaluation logic. |
| ERROR | Default (no conditions matched) | An unexpected error occurred during processing. |
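
The routing options described for these outcomes could be modeled as in the Python sketch below. It is a hypothetical illustration only: routing is actually configured in the journey designer, and the step names, the retry counter, and the retry limit are assumptions made for the example.

```python
def route(outcome: str, attempts: int = 0, max_retries: int = 2) -> str:
    """Pick the next (hypothetical) journey step for a module outcome."""
    if outcome == "Extraction Successful":
        # Downstream modules can now consume the extracted data.
        return "continue"
    if outcome == "Extraction Unsuccessful":
        # Ask the user to recapture the document until the retry limit is hit.
        return "retry_capture" if attempts < max_retries else "manual_review"
    # Mirrors the ERROR outcome: the default when no condition matched.
    return "error_handling"


first_failure = route("Extraction Unsuccessful", attempts=0)
after_retries = route("Extraction Unsuccessful", attempts=2)
```

Whether a repeated failure ends in manual review or rejection is a policy decision. The sketch picks manual review purely as an example.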
This module does not require its own input payload. It operates on the document images already captured by the Document Classification module earlier in the journey.