From Scanned Loan Applications to a Clean HMDA Filing in Hours, Not Weeks

How community bank compliance teams are replacing manual extraction and edit checks with an AI agent that reads URLAs, validates against Regulation C, and delivers an exception report ready for remediation.

Thirty-Two Fields, Six Pages, and a March Deadline

A HMDA Compliance Officer at a 300-loan community bank faces the same arithmetic every January. Two hundred to five hundred loan application records need to be extracted, coded, validated, and filed with the FFIEC by March 1. Each record requires pulling data from a Uniform Residential Loan Application (the URLA, Fannie Mae Form 1003) that spans nine or more pages of borrower information, income, loan terms, property details, demographics, and originator data. Not all of those pages matter for HMDA, but the ones that do are scattered: borrower identity on page 1, income on page 2, loan and property info on page 5, demographics on page 8, originator details on page 9, and the Closing Disclosure tacked on as page 10.

The compliance officer opens each scanned PDF and starts reading. Field by field. Page by page. Loan amount from Section 4. Interest rate from the Closing Disclosure. Applicant race from Section 8, where checkboxes may or may not be marked (and where "not marked" has a specific HMDA code that is different from "declined to provide," which is also different from "collected by visual observation"). The property value shows $450,00 on the URLA but $500,000 on the Closing Disclosure. Which one is authoritative? The county field is blank. The race field has no selection. Both the YES and NO checkboxes for visual observation of race appear marked.

That's one application. There are 299 more.

After extraction comes validation. Regulation C defines validity edits, cross-field edits, and quality edits. If the loan was originated, lien status can't be blank. If the applicant didn't self-report race, it must be coded as "information not provided" (code 6), not left null. If the loan-to-income ratio exceeds 5:1, it needs a justification note. Ten validity rules, nine cross-field dependencies, four quality thresholds. Every record, every field, every combination.
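The three styles of edit check reduce to a few lines of logic each. The sketch below is illustrative, not the actual rule set: the field names and edit identifiers are hypothetical, while the code-6 rule, the lien-status rule, and the 5:1 ratio threshold come from the description above.

```python
def run_edit_checks(record: dict) -> list[dict]:
    """Return the exceptions raised by one LAR record (illustrative rules only)."""
    exceptions = []

    # Validity/cross-field edit: lien status can't be blank on an originated loan.
    if record.get("action_taken") == "originated" and not record.get("lien_status"):
        exceptions.append({"edit": "lien_status_required", "severity": "critical"})

    # Cross-field edit: unreported race must be coded 6, never left null.
    if record.get("race") is None:
        exceptions.append({
            "edit": "race_code_6_required",
            "severity": "critical",
            "fix": "assign code 6 (information not provided)",
        })

    # Quality edit: a loan-to-income ratio above 5:1 needs a justification note.
    income = record.get("annual_income") or 0
    if income and record.get("loan_amount", 0) / income > 5:
        exceptions.append({"edit": "loan_to_income_ratio", "severity": "warning"})

    return exceptions
```

The point of encoding rules this way is that they run identically on record 1 and record 500, which is exactly what manual review struggles to guarantee.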

A two-person compliance team doing this manually spends the better part of February on data entry and cross-referencing. That's not an exaggeration. Each HMDA LAR contains 48 data points spread across 110 data fields. The CFPB's own estimates put the annual burden at anywhere from 161 to 9,000 hours depending on institution size. For a community bank sitting at 500 loans, you're looking at a compliance officer spending two to four weeks doing work that is, fundamentally, reading a document and filling in a spreadsheet.

The stakes aren't abstract. HMDA was the top compliance violation identified in Federal Reserve examinations in 2024, according to the Consumer Compliance Outlook. One large institution was fined $12 million by the CFPB for inaccurate demographic data collection alone. The resubmission threshold is 5.1% errors on your LAR. Miss it, and you're refiling.

Why the Obvious Fixes Don't Work for This

The natural instinct is to throw technology at the problem. And the natural candidates are the ones every operations leader has already tried or considered.

A rules-based integration platform can move data between systems, but it can't read a scanned URLA. The loan application isn't a structured database export. It's a nine-page PDF with handwritten entries, checkmarks that might be Xs or might be filled circles, and a property value that was keyed in as "$450,00" because someone's finger slipped. Connecting your email to a spreadsheet doesn't help when the source document requires visual interpretation of every page.

Copying the URLA text into a general-purpose chat assistant gets you halfway. It can probably extract the loan amount and interest rate. But it won't remember that HMDA field 19 requires race codes from a specific enumeration (1 through 7, plus subcategories 21 through 27 and 41 through 44). It won't cross-check the property value on page 5 against the appraised value on page 10. It won't know that a blank race field should be coded as 6, not left empty. And it certainly won't run 23 Regulation C edit checks against the extracted data and produce a prioritized exception list with severity levels. You'd have to prompt it for each check individually, across each loan, and verify the output yourself.

HMDA data extraction is a compliance process that sits at a frustrating intersection: it requires visual interpretation of inconsistent documents, regulatory knowledge encoded in specific rule sets, cross-document reconciliation, and deterministic validation that must be applied identically across hundreds of records. The gap between "can read a document" and "can produce a filing-ready LAR" is enormous.

This is what makes it structurally resistant to simple automation. The judgment calls (is that checkbox marked? what does "$450,00" actually mean? which document's property value is authoritative?) live alongside rigid rules (race code must be 1-7, lien status is required for originated loans, denial reasons are required for denied applications). You need both, applied consistently, at volume.

Insurance compliance teams face the same structural problem filing state regulatory reports, where claim data scattered across adjuster notes, medical records, and policy documents must be coded to specific state filing schemas. Healthcare quality teams extracting CMS measures from clinical documentation hit the same wall. The domain vocabulary changes. The extraction-plus-validation pattern doesn't.

The compliance burden isn't the complexity of any single loan application. It's the compound effect of 32 fields, 23 edit checks, and 500 records where every ambiguity must be resolved the same way, every time.

This is the problem lasa.ai built an AI agent to solve: reading source documents, extracting regulated fields against a defined playbook, validating against compliance rules, and delivering an exception report instead of a raw data dump.

See what this looks like for HMDA filing →
The manual burden of HMDA filing season

What Changes When an Agent Reads the Loan File

The shift isn't from manual to automated. It's from extraction to review.

Instead of opening each URLA and pulling fields by hand, the compliance officer points the AI agent at a batch of scanned loan applications and walks away. The agent works through each document using a field playbook that defines exactly which fields to extract from which pages: Universal Loan Identifier from page 1, gross monthly income from page 2, loan amount and property details from page 5, demographic checkboxes from page 8, originator NMLS ID from page 9, and loan terms from the Closing Disclosure on page 10.

This isn't a generic document reader guessing at what matters. It delivers agent-level outcomes (the filing gets done) with workflow-level reliability (every step is auditable, every rule is applied identically across every record). The playbook is a structured definition: six page-level extraction sets covering 32 HMDA-reportable fields, each with its data type (currency, date, single-select, multi-select), its HMDA field number, and where applicable, the code mapping. Loan purpose "Refinance" maps to HMDA code 31. Occupancy "Primary Residence" maps to code 1. Construction method "NO" (not manufactured) maps to code 1. The agent applies these mappings as it extracts, producing coded output, not raw text.
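A playbook entry of this kind might look like the following minimal sketch. The code mappings shown (Refinance → 31, Primary Residence → 1) come from the text; the dictionary layout, key names, and the extra mapping entries are assumptions for illustration, not the product's actual schema.

```python
# Hypothetical playbook structure: each field declares its source page,
# data type, and the mapping from extracted text to a HMDA code.
PLAYBOOK = {
    "loan_purpose": {
        "page": 5,
        "type": "single-select",
        "code_map": {"Purchase": 1, "Home Improvement": 2, "Refinance": 31},
    },
    "occupancy": {
        "page": 5,
        "type": "single-select",
        "code_map": {"Primary Residence": 1, "Second Home": 2,
                     "Investment Property": 3},
    },
}

def apply_mapping(field_id: str, raw_value: str):
    """Map an extracted text value to its HMDA code, or None if unmapped."""
    return PLAYBOOK[field_id]["code_map"].get(raw_value)
```

Because the mapping happens at extraction time, the downstream edit checks operate on codes, never on free text.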

After extraction, the agent merges fields across pages deterministically. No judgment involved in the merge, just assembly. Then it runs cross-document validation: does the property value on page 5 match the appraised value on the Closing Disclosure? When the URLA shows a truncated "$450,00" and the Closing Disclosure explicitly states "$500,000," the agent flags the discrepancy and uses the Closing Disclosure value as authoritative. When the city is spelled "Mandeville" on the URLA and "Mandevile" on the Closing Disclosure, it retains the URLA spelling and notes the inconsistency.
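Assuming both documents yield a numeric value, the reconciliation rule described here (Closing Disclosure wins, discrepancy recorded) reduces to a small function. The function and key names below are illustrative, not the agent's implementation.

```python
def reconcile_property_value(urla_value, cd_value) -> dict:
    """Pick an authoritative property value and note any discrepancy."""
    if urla_value == cd_value:
        return {"value": cd_value, "discrepancy": None}
    # When the two documents disagree, the Closing Disclosure is
    # treated as authoritative, but the discrepancy is preserved.
    return {
        "value": cd_value if cd_value is not None else urla_value,
        "discrepancy": (f"URLA shows {urla_value}, "
                        f"Closing Disclosure shows {cd_value}"),
    }
```

The key design point is that the agent never silently overwrites: the losing value travels with the record into the exception report.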

Then comes the part that would take a human the longest: Regulation C edit checks. The agent runs every validity edit (is the loan type one of the four permissible codes?), every cross-field edit (if action taken is "denied," is at least one denial reason present?), and every quality edit (is the loan-to-income ratio under 5:1?). It assigns severity levels. A missing lien status on an originated loan is critical, because the FFIEC will reject the filing. A loan-to-income ratio of 1.93 is a quality check that passes. A blank race field is flagged as critical with a specific correction: assign code 6, "information not provided."

The output isn't a corrected file. It's an exception report.

What Lands on Your Desk by Morning

The report opens with a document summary: institution LEI, reporting year, pages processed, total fields extracted. For a single URLA, that's 6 pages and 32 fields. For a batch of 500, it's the same structure repeated, with aggregate counts.

Then the extracted fields, organized by page. Each field shows its ID, label, extracted value, HMDA field number, mapped code, confidence score, and notes. When the agent extracts an interest rate of 6.875% from the Closing Disclosure, it shows up with confidence 1.0 and HMDA field 9. When it encounters a race field with no checkbox selected, it shows null with a note: "No selection marked." When both YES and NO are checked for visual observation of race (which happens more often than you'd think with scanned documents), it flags the field at confidence 0.5 and recommends manual inspection.
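Confidence-based routing of this kind can be expressed as a one-line policy. The 0.8 threshold below is an assumption chosen for illustration; the article does not state the agent's actual cutoff.

```python
REVIEW_THRESHOLD = 0.8  # assumed cutoff, not a documented value

def route(field: dict) -> str:
    """Accept a high-confidence field, or send it to the compliance officer."""
    if field["confidence"] >= REVIEW_THRESHOLD:
        return "auto-accept"
    return "manual-review"
```

Under this policy, the 6.875% interest rate at confidence 1.0 passes straight through, while the double-checked visual-observation box at 0.5 lands in the review queue.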

The critical exceptions section is where you spend your time. Not extracting data, not running edit checks, but reviewing the five or six flags that actually need human judgment. V620: race is null, needs code 6. V628: sex is null, needs code 3. V665: lien status is missing for an originated loan. V670 and V672: the cross-check rules that enforce coding of missing demographic data. These are the decisions that require a compliance officer. Everything else was done for you.

Quality warnings tell you what passed but looked unusual. A loan-to-income ratio of 1.93 is within bounds. Applicant age 40 is within bounds. No action needed, but documented for your file.

Data gaps list what's missing. Property county is blank and needs geocoding or manual entry. The ambiguous visual observation checkbox needs someone to look at the physical document.

For a Mortgage Operations Manager at a credit union processing quarterly batches, the same pattern applies. The volume is different, the documents are the same, and the exception report is the same structure. What took two weeks of February now takes a morning of review.

Teams filing SAR narratives through the same compliance infrastructure often extend it next to regulatory change monitoring, where new CFPB guidance or FFIEC updates are analyzed against current policies to flag affected procedures before exam season.

A clean exception report ready for review

What February Looks Like When the Agent Runs in January

The compliance officer's calendar used to have a two-week block labeled "HMDA extraction" followed by a week labeled "edit check review" followed by a week of remediation and resubmission. Four weeks of the year, at minimum, dedicated to data entry dressed up as compliance work.

Now the block is three days. Day one: the agent processes the batch. Day two: the compliance officer reviews the exception report, resolves the critical flags (most of which are demographic coding gaps with clear prescribed corrections), and geocodes missing county fields. Day three: clean data goes into the FFIEC HMDA Platform.

The rest of February is available for the work that actually requires a compliance officer's judgment. Fair lending analysis. Pattern review across the LAR for potential disparities. Preparing for the next exam. The kind of work that justified the title in the first place, not reading checkboxes off scanned PDFs.

Whether you're a HMDA Compliance Officer at a 300-loan community bank, a Mortgage Operations Manager at a credit union filing 2,000 records quarterly, or a compliance analyst at a mid-size lender juggling SAR narratives and vendor risk assessments alongside HMDA prep, the change is the same. The filing gets done. The exceptions get flagged. And your expertise goes where it matters, not into a spreadsheet.

lasa.ai builds AI agents for compliance processes where document extraction meets regulatory validation. HMDA filing is one pattern. SAR narrative drafting, vendor risk assessment, and regulatory change analysis are others.

If your team spends weeks on work that should take days:

See what an agent looks like for your process →

Frequently Asked Questions

How long does it take an AI agent to process a batch of HMDA loan applications?
A batch of 200-500 URLAs typically processes within hours rather than the two to four weeks required for manual extraction. The agent extracts 32 fields per application across six pages, runs all Regulation C edit checks, and delivers an exception report. Review and remediation of flagged items typically takes one to two days.
Can an AI agent handle scanned URLA PDFs with handwritten entries and unclear checkboxes?
Yes. The agent uses vision-based extraction guided by a field playbook that defines exactly which fields appear on which pages. It interprets handwritten entries, normalizes truncated values like "$450,00" to $450,000, and flags ambiguous checkboxes with confidence scores. Fields below a confidence threshold are routed to the compliance officer for manual review.
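One plausible way to repair a truncated entry like "$450,00" is to pad a short trailing digit group back to three digits. This is a sketch of that heuristic only; the agent's actual normalizer is not documented here, and the function name is hypothetical.

```python
import re

def normalize_currency(raw: str):
    """Parse a currency string, repairing a truncated thousands group."""
    groups = re.findall(r"\d+", raw)
    if not groups:
        return None
    head, *rest = groups
    # Pad any non-leading group that lost digits, e.g. "00" -> "000",
    # so "$450,00" is read as 450,000 rather than 45,000.
    repaired = head + "".join(g.ljust(3, "0") for g in rest)
    return int(repaired)
```

In practice a repair like this would be paired with a lowered confidence score, so the cross-document check against the Closing Disclosure still gets the final word.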
What happens when the URLA and Closing Disclosure show different values for the same field?
The agent runs cross-document validation automatically. When property value on the URLA differs from appraised value on the Closing Disclosure, it flags the discrepancy, notes both values, and uses the Closing Disclosure figure as authoritative per standard practice. City spelling differences and other inconsistencies are similarly flagged and resolved.
Does the AI agent replace the compliance officer's judgment on HMDA filing?
No. The agent handles extraction and validation, then produces an exception report with severity levels: critical exceptions that would cause FFIEC rejection, quality warnings for unusual values, and data gaps requiring manual resolution. The compliance officer reviews exceptions, makes final coding decisions on ambiguous demographic fields, and submits through the FFIEC HMDA Platform.
What Regulation C edit checks does the agent validate against?
The agent runs the full set: ten validity edits checking permissible field values, nine cross-field edits checking logical dependencies between fields, and four quality edits flagging unusual but permissible values. Each flagged item includes the edit code, current value, expected value, severity level, and a recommended correction.

See What This Looks Like for Your Process

Let's discuss how LasaAI can automate this for your team.