AI Solution Brief · GSAI · 2026 · 04 · 0017 · Delivered / Production

Turn a handwritten order into structured data ready for accounting in 3 seconds

This is one entry from Boilingwater Technology's AI solution library. For handwritten order recognition, we benchmarked general OCR APIs, multimodal large models, and a domain-specific OCR model, then delivered a hybrid production pipeline. On messy handwriting, folded paper, cross-line corrections, and overlapping fields, field-level accuracy improved from 68.4% to 96.1%.

8 min read · v2.1 · updated May 10, 2026
Field-level accuracy: 96.1% (1,200 complex handwritten samples)
Average processing time: 1.8s (end to end · P95 ≤ 3.0s)
Manual review workload: -82% (before vs. after launch)
Production runtime: 11 months (6 order types / 3 customers)
/ 01 The Problem

The real business problem

The customer is a fast-moving consumer goods distributor serving hundreds of retail outlets. Orders arrive as handwritten paper forms and must be entered into ERP for fulfillment and reconciliation. The actual paper conditions are far messier than a clean demo sample.

P-01 · PAIN POINT

Highly variable handwriting

Different sales reps write with different styles, pressure, angles, cross-line notes, corrections, signatures, and overwritten fields. Traditional OCR often fails at the full-row level.

P-02 · PAIN POINT

Ambiguous field meaning

Quantity, unit price, product name, and remarks are often mixed together. Product shorthand must be normalized against business vocabulary and SKU context.

P-03 · PAIN POINT

Manual entry became the bottleneck

The team had to process 1,200–1,800 orders per day within a narrow window. Two operators were working overtime and still produced avoidable entry errors.

P-04 · PAIN POINT

Small errors created financial risk

A wrong quantity, price, or customer name affects fulfillment, reconciliation, and month-end settlement. The impact is not inconvenience; it is real financial exposure.

/ 02 Live Sample

One order through the full pipeline

The sample below shows a desensitized handwritten order from a real workflow. We place the original image, visual recognition overlay, and final structured JSON side by side so stakeholders can see what the AI actually does.

INPUT · Original handwritten order (jpg · 1408×1056)
OUTPUT · Recognition output
Folded paper · 100%

Creases shift field positions, but all 12 rows are correctly aligned.

Cross-line correction · 97%

Crossed-out prices and handwritten replacements are interpreted with an audit trail.

Overlapping fields · 94%

Remarks overlapping the price column are reassigned through structured post-processing.

/ 03 How We Think

We did not bet on one model. We combined the right tools.

For production AI, we do not choose a model first. We benchmark practical options against real samples, then design a hybrid pipeline where each model handles the part it is best suited for.

Path A

General cloud OCR APIs

Tencent Cloud / Alibaba Cloud / Baidu AI Cloud
Strengths
  • Fast to integrate
  • Excellent on printed text and regular tables
  • Usage-based cost works for early testing
Limitations
  • Weak on casual handwriting and connected strokes
  • Cannot understand customer-specific product shorthand
  • Field structure must still be rebuilt by the business layer
Printed accuracy: 98%+
This scenario: 68.4%
Structuring: Custom build
Our call · Useful as a baseline and fallback channel
Path B

Multimodal LLM reading

GPT-4o / Qwen-VL / Tongyi Qianwen VL
Strengths
  • Strong contextual understanding for corrections and messy forms
  • Can output structured JSON directly
  • Good semantic generalization across product vocabulary
Limitations
  • Higher cost and latency per order
  • Field-level numeric accuracy can fluctuate
  • Private deployment and compliance require additional design
This scenario: 88.7%
Latency: 4.2s
Cost / 1k pages: ≈ ¥48
Our call · Best for difficult samples and semantic correction
Path C

Domain-specific OCR model

Boilingwater · GS-OCR-Hand v2
Strengths
  • Fine-tuned on real customer samples
  • Two-stage detection and recognition is controllable and explainable
  • Lower inference cost and latency
Limitations
  • Needs ongoing data-loop maintenance
  • New form types still need migration samples
  • Does not fully solve cross-field semantic judgment alone
This scenario: 93.2%
Latency: 0.9s
Cost / 1k pages: ≈ ¥6
Our call · Primary path for more than 90% of daily traffic
Final Decision

Hybrid by design

The delivered system uses a domain OCR primary path, a multimodal semantic correction path, and cloud OCR fallback. Most routine orders finish in under one second. Low-confidence fields are routed to the multimodal model, and poor-quality samples or service failures fall back automatically with manual-review flags.

This structure balances accuracy, cost, latency, and controllability. The engineering principle is simple: use software architecture to turn model uncertainty into business certainty.
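
The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the delivered router: the `FieldResult` type, the two threshold values, and the callable names are all assumptions made for the example, and real thresholds would be tuned on an evaluation set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float        # 0.0 to 1.0, as reported by the recognizer
    source: str              # which path produced the value
    needs_review: bool = False


# Hypothetical thresholds, chosen only for illustration.
ACCEPT_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


def route_field(
    primary: FieldResult,
    semantic_correct: Callable[[FieldResult], FieldResult],
    fallback_ocr: Callable[[FieldResult], FieldResult],
) -> FieldResult:
    """Pick the final value for one field across the three paths."""
    # Confident output from the primary OCR path is accepted as-is.
    if primary.confidence >= ACCEPT_THRESHOLD:
        return primary
    try:
        # Low-confidence fields go to the multimodal correction path.
        corrected = semantic_correct(primary)
    except Exception:
        # Service failure: use the fallback channel and always flag for review.
        result = fallback_ocr(primary)
        result.needs_review = True
        return result
    # A correction that is still uncertain is kept but flagged for review.
    if corrected.confidence < REVIEW_THRESHOLD:
        corrected.needs_review = True
    return corrected
```

Under these assumptions only fields below the acceptance threshold incur the slower and more expensive multimodal call, which is why the primary path can carry most of the daily traffic.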

/ 04 How It Works

An explainable, degradable, and improvable AI pipeline

We treat AI as a pipeline, not a black box. Every step has a defined responsibility, input, output, and fallback strategy.

  1. STEP / 01

    Capture and layout normalization

    Images enter through mobile, scanners, or forms. Distortion, shadow, white balance, and layout are normalized before recognition.

    DocAlign / Deskew / Shadow Removal
  2. STEP / 02

    Field detection

    A layout-aware detector identifies headers, rows, columns, and field roles before recognition.

    DBNet++ / Layout-aware / RoI Routing
  3. STEP / 03

    Character recognition

    GS-OCR-Hand v2 is fine-tuned on real handwritten samples. Low-confidence fields are routed forward for semantic review.

    CRNN + Attention / Domain fine-tune / Confidence routing
  4. STEP / 04

    Multimodal semantic correction

    For corrections, cross-line notes, and context-heavy fields, a multimodal model reads against SKU dictionaries and historical order context.

    VLM / RAG · SKU dict / Self-consistency
  5. STEP / 05

    Structuring and rule validation

    The output is normalized with unit conversion, price-range checks, customer matching, total checks, and audit logs.

    Rule engine / Entity resolve / Audit log
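
As an illustration of the rule validation in step 05, the sketch below runs a price-range check and a total check over recognized order lines. The data shapes (`OrderLine`, `ValidationReport`), the `price_ranges` lookup, and the tolerance are hypothetical; the brief does not describe the actual rule engine.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass
class ValidationReport:
    issues: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.issues


def validate_order(
    lines: List[OrderLine],
    stated_total: Decimal,
    price_ranges: Dict[str, Tuple[Decimal, Decimal]],
    tolerance: Decimal = Decimal("0.01"),
) -> ValidationReport:
    """Collect every rule violation instead of stopping at the first one."""
    report = ValidationReport()
    computed = Decimal("0")
    for row, line in enumerate(lines, start=1):
        computed += line.unit_price * line.quantity
        bounds = price_ranges.get(line.sku)
        if bounds is None:
            report.issues.append(f"row {row}: unknown SKU {line.sku}")
            continue
        low, high = bounds
        if not (low <= line.unit_price <= high):
            report.issues.append(
                f"row {row}: price {line.unit_price} outside [{low}, {high}]"
            )
    # The handwritten total must agree with the sum of the recognized rows.
    if abs(computed - stated_total) > tolerance:
        report.issues.append(
            f"total mismatch: computed {computed}, stated {stated_total}"
        )
    return report
```

Returning the full list of issues, rather than raising on the first failure, gives a reviewer every reason an order was flagged and gives the audit log a complete record.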

System architecture · Layered view

v2.1 · 2026.04
L1 · Access

Capture and access layer

Five business channels with idempotency, rate limits, and desensitization.

Channel: Mobile app
Channel: Scanner webhook
Channel: Feishu form sync
Gateway: API Gateway · JWT
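
One common way to make order intake idempotent is to key each submission on a hash of the uploaded image, so a retried or duplicated upload maps to the same job. The sketch below assumes that approach; the `SubmissionRegistry` class and job-ID format are hypothetical, and the in-memory dictionary stands in for whatever shared store a real deployment would use.

```python
import hashlib
from typing import Dict, Tuple


def idempotency_key(image_bytes: bytes) -> str:
    """Derive a stable key from the image content."""
    return hashlib.sha256(image_bytes).hexdigest()


class SubmissionRegistry:
    """In-memory stand-in for a shared deduplication store."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, image_bytes: bytes) -> Tuple[str, bool]:
        """Return (job_id, is_new); repeated uploads reuse the first job."""
        key = idempotency_key(image_bytes)
        if key in self._jobs:
            return self._jobs[key], False
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[key] = job_id
        return job_id, True
```

A content hash only catches byte-identical uploads. A re-photographed copy of the same paper form would need a business-level key, such as customer plus order number, to be recognized as a duplicate.
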
L2 · Inference

AI inference layer

Three inference paths plus a confidence-aware router.

Primary: GS-OCR-Hand v2
Semantic: Qwen-VL · LLM
Fallback: Tencent OCR
Router: Confidence router
L3 · Business

Business orchestration layer

Connects OCR output to ERP, reconciliation, and review workflows.

Engine: Field normalization
Match: Customer / SKU matching
Audit: Total validation
Workflow: Manual review queue
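
SKU matching against a business vocabulary can be approximated with fuzzy string matching from the Python standard library. The alias table and cutoff below are invented for illustration; the brief does not state how the delivered system matches shorthand to SKUs.

```python
from difflib import get_close_matches
from typing import Dict, Optional

# Hypothetical shorthand-to-SKU dictionary; real entries would come from
# the customer's product master data.
SKU_ALIASES: Dict[str, str] = {
    "cola 330": "SKU-1001",
    "cola 500": "SKU-1002",
    "green tea 500": "SKU-2001",
}


def resolve_sku(
    text: str,
    aliases: Dict[str, str] = SKU_ALIASES,
    cutoff: float = 0.8,
) -> Optional[str]:
    """Map recognized product text to a SKU, or None if nothing is close."""
    # Normalize case and collapse repeated whitespace before matching.
    normalized = " ".join(text.lower().split())
    if normalized in aliases:
        return aliases[normalized]
    # Fall back to the single closest alias above the similarity cutoff.
    matches = get_close_matches(normalized, list(aliases), n=1, cutoff=cutoff)
    return aliases[matches[0]] if matches else None
```

Product names that differ only in pack size sit close together as strings, so in practice a fuzzy (non-exact) match is a reasonable candidate for the manual review queue rather than silent acceptance.
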
L4 · Data Loop

Data-loop layer

Online corrections flow back into datasets so the model improves monthly.

Storage: Sample store
Label: Online labeling
Train: Monthly fine-tune
Monitor: Drift alerts
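
A simple drift signal is the share of fields that reviewers correct within a recent window, compared with a baseline. The sketch below assumes that signal; the class name, window size, and margin are hypothetical and not taken from the delivered monitoring stack.

```python
from collections import deque


class CorrectionRateMonitor:
    """Track the reviewer-correction rate over a sliding window of fields."""

    def __init__(self, baseline_rate: float, window: int = 500, margin: float = 0.02):
        self.baseline_rate = baseline_rate
        self.margin = margin
        # Oldest events drop out automatically once the window is full.
        self._events = deque(maxlen=window)

    def record(self, was_corrected: bool) -> None:
        self._events.append(1 if was_corrected else 0)

    @property
    def rate(self) -> float:
        return sum(self._events) / len(self._events) if self._events else 0.0

    def drifting(self) -> bool:
        # Only alert on a full window to avoid noise from small samples.
        window_full = len(self._events) == self._events.maxlen
        return window_full and self.rate > self.baseline_rate + self.margin
```

The same corrected fields that raise the alert are the ones that would flow into the sample store for the next fine-tune, so the monitor and the training loop can share one event stream.
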
/ 05 How We Deliver

Ten weeks to launch, then a continuous data loop

We deliver AI systems as accepted, measurable engineering projects. Before launch, the work is milestone-based; after launch, the data loop keeps improving the model.

  1. Week 0 · Milestone 01

    Scenario diagnosis

    Walk through the real order flow with business and IT stakeholders.

    Process map / Data and compliance checklist / Success metrics
  2. Week 1–2 · Milestone 02

    Data cold start

    Collect 4,600 real forms and build the first training and evaluation sets.

    Labeling guide / 1,200-sample evaluation set / Baseline metrics
  3. Week 3–4 · Milestone 03

    Parallel path evaluation

    Run cloud OCR, multimodal reading, and custom OCR on the same samples.

    Evaluation report / Cost-latency-accuracy quadrant
  4. Week 5–7 · Milestone 04

    Hybrid pipeline build

    Build confidence routing, semantic correction, and fallback, then load-test the service.

    Model weights / Inference service v1.0 / Load-test report
  5. Week 8–9 · Milestone 05

    Gray release

    Run AI and human entry in parallel at one warehouse for reconciliation.

    Daily reconciliation / Review interface v1
  6. Week 10 · Milestone 06

    Full launch

    Roll out to six warehouses and sign off against KPI targets.

    Runbook / Rollback plan
  7. Ongoing · Milestone 07

    Data loop and monthly iteration

    Online errors flow back into the sample store for incremental improvement.

    Monthly samples / Model review notes
/ 06 The Result

Turning uncertain recognition into measurable business value

Instead of vague claims, we use same-sample before-and-after metrics and customer feedback to show whether the system solved the real problem.

Field-level accuracy: BEFORE 68.4% · AFTER 96.1% · +27.7pp
Average processing time: BEFORE ≈ 90s manual · AFTER 1.8s · -98%
Daily peak throughput: BEFORE ≈ 1,800 pages · AFTER ≈ 12,000 pages · ×6.7
Reconciliation discrepancies: BEFORE 0.7% · AFTER 0.04% · -94%
End-to-end cost per page: BEFORE ¥0.42 manual · AFTER ¥0.06 · -85%
Manual review workload: BEFORE 2 full-time operators · AFTER 0.4 FTE review only · -82%
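
For readers who want to reproduce a same-sample comparison, the sketch below shows one plausible way to compute field-level accuracy. It assumes the metric means the share of labeled fields whose predicted value matches the label exactly; the brief does not define the metric, and the nested-dictionary data shape is hypothetical.

```python
from typing import Dict

Fields = Dict[str, str]


def field_level_accuracy(
    predictions: Dict[str, Fields],
    ground_truth: Dict[str, Fields],
) -> float:
    """Share of labeled fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for doc_id, truth_fields in ground_truth.items():
        # A document with no prediction counts every field as wrong.
        predicted_fields = predictions.get(doc_id, {})
        for name, expected in truth_fields.items():
            total += 1
            if predicted_fields.get(name) == expected:
                correct += 1
    if total == 0:
        raise ValueError("ground truth contains no fields")
    return correct / total
```

Running every candidate path through the same function on the same labeled set is what makes a before-and-after difference, expressed in percentage points, comparable.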

Month-end reconciliation used to be our biggest headache. Since this OCR system went live, orders from six warehouses are basically scanned, structured in seconds, and written into ERP. More importantly, the models and data stay in our private cloud.

IT Director · FMCG distributor in South China
Sign-Off

Sign-off milestones

  • UAT passed · 2025.06.20
  • Full launch · 2025.06.27
  • First monthly review · 2025.07.31
  • Annual renewal · 2026.05.08
/ 07 Where It Fits

The same pattern applies to a family of document problems

The hybrid OCR + multimodal correction + fallback pattern can be reused for forms, tickets, handwritten logs, and business documents where structure matters.

Restaurant supply chain · Reusable

Handwritten hotel and restaurant order forms

Stock replenishment forms can go directly into inventory systems.

FMCG distribution · Reusable

Paper return slips from retail outlets

Sales teams can scan slips into monthly ledgers.

Construction · Reusable

Site logs and signed delivery notes

Robust handling for outdoor stains, folds, and handwriting.

Healthcare distribution · Reusable

Clinic or pharmacy prescription and usage forms

Sensitive units and dosage fields can be checked against dictionaries.

Logistics · Reusable

Handwritten delivery and return notes

Signatures and remarks can be separated from operational fields.

Financial documents · Reusable

Checks, receipts, and reconciliation forms

Supports local deployment and end-to-end audit trails.

/ 08 About Us

Why Boilingwater Technology

We are a software engineering and AI implementation team. Over the past three years, we have moved more than 20 AI scenarios from promising demo to stable production operation.

Our Team
Make AI work in production
/ 01

Method

  • Scenario diagnosis → data cold start → path evaluation → hybrid pipeline → gray release → full launch → data loop
  • AI projects are managed as engineering projects with PM, milestones, and acceptance criteria.
  • No endless PoC demos; every system is delivered with measurable outcomes.
/ 02

Stack

  • Vision: DBNet++ / CRNN / TrOCR / GS-OCR-Hand
  • Multimodal: Qwen-VL / GPT-4o / Gemini Vision with flexible routing
  • Engineering: FastAPI / Triton / vLLM / Postgres / Redis / K8s for private deployment
/ 03

Engineering

  • High-throughput inference service tuning
  • Data-loop platform for labeling, evaluation, drift monitoring, and fine-tuning
  • Unified observability across business metrics, model metrics, and cost metrics
Let's Talk

If you have a concrete workflow that AI has not solved yet, let's evaluate the right approach.

The first strategy call is free. We will unpack the workflow, judge whether AI is worth using, identify the right technical route, and provide a practical initial plan and estimate within five business days.

Business email
[email protected]
Office
10F, South Tower, Kingkey Yujing Times, Longgang District, Shenzhen

