No description https://pages.gi7b.org/cgg/pdfexport/

HTML 100%

Find a file

Agent Kimi 12cf4d6d8e Auto-commit before migration to citfj		2026-05-04 14:10:23 +08:00
2306 Jan 2018 ENCS v4.pdf	Add PDF viewer app with sample BIR forms	2026-04-28 21:37:51 +08:00
2307 Jan 2018 ENCS v3.pdf	Add PDF viewer app with sample BIR forms	2026-04-28 21:37:51 +08:00
2316 Sep 2021 ENCS_Final_corrected.pdf	Add PDF viewer app with sample BIR forms	2026-04-28 21:37:51 +08:00
260429 pdfexport_architecture_spec.docx	Auto-commit before migration to citfj	2026-05-04 14:10:23 +08:00
FRD-detailed.md	Auto-commit before migration to citfj	2026-05-04 14:10:23 +08:00
FRD.md	Add PDF viewer app with sample BIR forms	2026-04-28 21:37:51 +08:00
index.html	Add PDF viewer app with sample BIR forms	2026-04-28 21:37:51 +08:00
README.md	Auto-commit before migration to citfj	2026-05-04 14:10:23 +08:00

README.md

BIR Form Automation System

Status: Phase 1 — PDF Field Mapping & Calibration (BIR Form 2307)
Live URL: https://pages.gi7b.org/cgg/pdfexport/
Target Platform: Forgejo Pages → Local Web App → ERPNext Philippines

The Problem

In the Philippines, businesses must file BIR forms like 2307 (Certificate of Creditable Tax Withheld at Source) every time they pay suppliers subject to Expanded Withholding Tax (EWT). ERPNext — a leading open-source ERP — lacks the precision to fill flat PDF government forms without massive custom development. Rather than bending ERPNext into submission, we are building a form-aware automation layer that can:

Accept any flat PDF form (starting with BIR 2307).
Use AI to detect, index, and label every field by its coordinates.
Store a machine-readable "form template" (dimensions, coordinates, fonts, rules).
Fill the form programmatically with correct fonts, alignment, spacing, and grouping.
Export a print-ready, layered PDF — background layer + fillable field layer.
Integrate with ERPNext (or any system) via API.

Team

Name	Role	Can Modify the App?
Clarise Duco	Product / Compliance Lead — BIR compliance, form requirements, field calibration	Yes — via Forgejo + Kimi/Claude
Journie Reyes	Engineering / Integration Lead — ERPNext, API, data mapping, architecture	Yes — via Forgejo + Kimi/Claude

How the Team Works (Collaborative Development Model)

This is a team-owned, team-evolved application. Because it is a static web app hosted on Forgejo Pages:

Open a browser — the app is always available at your Forgejo Pages URL
Use Kimi or Claude — paste source files, describe changes, get improved code
Commit to Forgejo — push changes; the live app updates automatically
Track everything — QA issues, FRD updates, root cause analyses all live as Forgejo Issues and Wiki pages

See FRD.md §3 for the full Collaborative Development & Continuous Improvement specification.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────────┐
│                         BIR FORM AUTOMATION STACK                            │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────────────┐    │
│  │   ERPNext /     │◄──►│   LOCAL API     │◄──►│   FORM FILL ENGINE      │    │
│  │   Any Data      │    │   (REST/tRPC)   │    │   (PDF-lib + Canvas)    │    │
│  │   Source        │    │                 │    │                         │    │
│  └─────────────────┘    └─────────────────┘    └─────────────────────────┘    │
│                              ▲                           │                   │
│                              │                           ▼                   │
│                              │                  ┌─────────────────┐           │
│                              │                  │  LAYERED PDF    │           │
│                              │                  │  OUTPUT         │           │
│                              │                  └─────────────────┘           │
│                              │                                               │
│  ┌───────────────────────────┴─────────────────────────────────────────┐     │
│  │                    TEMPLATE REPOSITORY (JSON/YAML)                   │     │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌──────────┐ │     │
│  │  │ Form 2307   │  │ Form 2551Q  │  │ Form 1601EQ │  │ Form ... │ │     │
│  │  │ Template    │  │ Template    │  │ Template    │  │ Template │ │     │
│  │  │ - fields[]  │  │ - fields[]  │  │ - fields[]  │  │ - fields │ │     │
│  │  │ - rules[]   │  │ - rules[]   │  │ - rules[]   │  │ - rules  │ │     │
│  │  │ - fonts{}   │  │ - fonts{}   │  │ - fonts{}   │  │ - fonts  │ │     │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └──────────┘ │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                              ▲                                               │
│                              │                                               │
│  ┌───────────────────────────┴─────────────────────────────────────────┐     │
│  │                    AI FIELD DETECTION SERVICE                         │     │
│  │  (Kimi Moonshot / Local LLM / OCR Hybrid)                             │     │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   │     │
│  │  │ PDF Upload  │  │  AI Indexer  │  │  Human Review│                   │     │
│  │  │             │  │  (draw boxes)  │  │  & Adjust   │                   │     │
│  │  └─────────────┘  └─────────────┘  └─────────────┘                   │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────┐     │
│  │                    FORGEJO PAGES HOSTING                            │     │
│  │  - Static web UI for template management                          │     │
│  │  - AI-assisted field labeling interface                           │     │
│  │  - Template preview & calibration                                 │     │
│  │  - Auto-deploys when team pushes changes                          │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                              ▲                                               │
│                              │                                               │
│  ┌───────────────────────────┴─────────────────────────────────────────┐     │
│  │                    TEAM DEV LOOP (Clarise + Journie + AI)             │     │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐            │     │
│  │  │  Report  │  │  AI Chat │  │   Edit   │  │   Push   │            │     │
│  │  │  Issue   │──►│  (Kimi/  │──►│  Source  │──►│  & Deploy│            │     │
│  │  │          │  │  Claude) │  │          │  │          │            │     │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘            │     │
│  └──────────────────────────────────────────────────────────────────┘     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Phase Roadmap

Phase	Goal	Status
Phase 0	Project scaffolding (README, FRD, repo structure)	✅ Completed
Phase 1	Upload BIR 2307 → AI detects fields → Human calibrates → Save template	Planned
Phase 2	Fill template with dummy data; calibrate font/size/kerning/alignment per field	Planned
Phase 3	Map template fields to ERPNext data model; build REST/tRPC API	Planned
Phase 4	Produce real 2307 PDFs from live supplier payment data	Planned
Phase 5	Port forms into ERPNext Philippines module; onboard 2551Q, 1601EQ, etc.	Planned

Key Technical Decisions

Decision	Rationale
Flat PDF → Layered PDF	Government forms are non-fillable scans. We overlay a transparent fillable layer on top of the original background so the output looks identical to the official form when printed.
Template-as-Code	Every form is stored as a JSON/YAML template containing field coordinates, fonts, rules, and data mappings. Templates are versioned and diffable.
AI Field Detection	Rather than manual coordinate entry, AI (Kimi Moonshot, local LLM, or hybrid OCR) suggests field bounding boxes by analyzing the PDF image. Humans review and adjust.
Forgejo Pages First	Host the labeling UI on Forgejo Pages for zero-cost, self-hosted deployment. Later migrate to ERPNext-embedded pages.
Local Web App Bridge	A lightweight local web app (React + Hono) sits between ERPNext and the form engine, handling authentication, data mapping, and batch generation.

Repository Structure

bir-form-automation/
├── README.md                 # This file
├── FRD.md                    # Functional Requirements Document
├── docs/
│   ├── architecture/         # System diagrams & ADRs
│   └── api/                  # API specifications
├── packages/
│   ├── web/                  # Forgejo Pages UI (field labeling, template mgmt)
│   ├── api/                  # Local API server (tRPC + Hono)
│   ├── pdf-engine/           # PDF manipulation layer (pdf-lib, canvas)
│   └── ai-service/           # AI field detection & OCR wrappers
├── templates/
│   └── bir/
│       └── 2307/
│           ├── 2307-template.json     # Field definitions & rules
│           ├── 2307-background.pdf    # Official blank form
│           └── 2307-sample-filled.pdf # Calibrated output sample
├── integrations/
│   └── erpnext/              # ERPNext connector & data mappers
└── tools/
    └── calibration/            # Font/size/kerning calibration scripts

Quick Start (Future)

# 1. Clone the repository
git clone <forgejo-repo-url>
cd bir-form-automation

# 2. Install dependencies
npm install

# 3. Start the local development server
npm run dev

# 4. Open the field labeling UI
open http://localhost:3000

# 5. Upload a BIR PDF and begin AI-assisted field mapping

Running Tests

The test system has three layers that run in sequence: fill → verify → audit. You can stop after any layer depending on what you need to check.

1. Configure an AI Provider (first time only)

The AI audit step requires a vision-capable AI provider. If you only want to run fill + value verification, skip this step.

Start the local API server: npm run server
Open the app in your browser: http://localhost:3000
Go to Settings → AI Providers → Add Provider

Fill in the form:

Field	Example values
Display name	`Kimi Vision`, `Claude Sonnet`, `Local Ollama`
Type	`moonshot` / `anthropic` / `openai` / `ollama` / `custom`
Endpoint URL	Pre-filled per type; edit if using a custom host
Model ID	`moonshot-v1-8k`, `claude-sonnet-4-6`, `gpt-4o`, `llava`
API Key	Enter your key — it writes to the server `.env` and is never shown again

Click Test Connection. Wait for the ✅ Connected badge.
Under Settings → AI Providers → Task Assignments, set:
- Field Detection provider → your vision provider
- AI Audit provider → same, or a different one to compare

You can add as many providers as you like. The API key for each lives in its own env var in the server .env file — never in the config files committed to the repo.

To use a provider without any API key (for offline work or CI), add a provider with type mock. It returns a realistic hardcoded response and requires nothing.

2. Open the test panel for a template

From the dashboard (/), open any saved template.
Click the Tests tab (top navigation, next to Calibration and Map).
You will see the test case library for this template.

If no test cases exist yet, click New Test Case and fill in values for each field, or click Fill with Dummy to auto-populate all fields with pattern-based dummy data.

3. Run a single test

In the test case library, click Run next to any test case.

Three tabs appear in the result panel:

Tab 1 — Generated PDF The filled PDF rendered inline. Use zoom and page navigation to inspect it visually. This is the same output that would be sent to BIR.

Tab 2 — Value Report A row-per-field table comparing what you injected vs what was extracted from the PDF:

Field	Expected	Extracted	Result
payee_tin	123-456-789-000	123-456-789-000	✅ MATCH
payee_name	ACME SUPPLIES INC.	ACME SUPPLIES IN	⚠ PARTIAL
grand_total_income	300,000.00	(empty)	❌ MISSING

MATCH — value in the PDF is identical to what was injected.
PARTIAL — value was truncated (check the overflow setting for this field).
MISSING — no text was found in this field's bounding box (coordinate or rendering issue).
MISMATCH — text is present but different from the injected value.
ORPHAN rows at the bottom of the table — text found in the PDF that does not belong to any field (coordinate error or rendering bug).

Tab 3 — AI Audit Click Run AI Audit to send the filled PDF to your configured vision provider. The AI reads the form as an independent reviewer — it does not know what values were injected, only what it can see on the page.

Results appear as colored overlays on the PDF:

🔴 Red border — CRITICAL violation (form would likely be rejected)
🟡 Yellow border — WARNING (may cause issues, needs review)
🔵 Blue border — INFO (cosmetic only)

Each violation shows a description and a suggested fix on hover. Click any violation to open the calibration panel for that field immediately.

4. Run the full test suite

To run all test cases for a template at once:

Tests tab → "Run All Tests" button

A progress bar shows each test case as it runs. When complete, a summary table appears:

Test Case	Fill	Value Check	AI Audit	Critical	Warnings
Base case	✅	✅ 22/22 match	PASS	0	0
Max length	✅	⚠ 2 partial	WARN	0	2
Minimal (required only)	✅	✅ 10/10 match	PASS	0	0
Special characters	❌	1 mismatch	FAIL	1	0

Click any row to expand the full field-level report for that test case.

5. Compare results across AI providers

To see how different AI models assess the same PDF:

In the AI Audit tab of any test result, find the Provider dropdown (top right of the tab).
Switch to a different provider and click Run AI Audit again. Results are stored separately for each provider.
For a full side-by-side comparison: click Compare Providers. All your configured vision providers run on the same PDF and their violation lists appear as columns in a table.

Violations flagged by every provider = high confidence, fix immediately.
Violations flagged by only one provider = review manually, may be a false positive.

6. Fix a violation

Every violation card — whether in the Value Report or the AI Audit — has a Fix This button.

Clicking it:

Opens the calibration panel for the affected field.
Highlights the specific parameter the AI suggested adjusting (font size, bbox width, overflow setting, etc.).
Keeps the test result in a sidebar so you can re-run with one click after making the change.

Common fixes by violation type:

Violation	Where to look	What to adjust
`TEXT_OVERFLOW`	Calibration → Overflow	Change to `shrink_font` or reduce `font_size_pt`
`PARTIAL` in value check	Calibration → Max Length	Increase `max_length` or change overflow to `wrap`
`GROUPBOX_MISALIGNED`	Calibration → GROUP_BOXES	Adjust `box_width` or use auto char_spacing recalculate
`MISSING` in value check	Template editor	Check bbox coordinates — field may be off the page
`DATE_FORMAT_WRONG`	Data mapping → Transform	Apply `date_format(MM/DD/YYYY)` transform
`TIN_FORMAT_WRONG`	Data mapping → Transform	Apply `tin_format` transform
`ORPHAN_TEXT`	Template editor	A field bbox is wrong; the text landed outside it

7. Run tests from the API (automated / CI)

All test operations are available as REST endpoints on the local API server:

# Fill a template with a specific test case
curl -X POST http://localhost:3001/api/test/fill \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "bir-2307-2018-encs",
    "test_case_id": "base-case",
    "payload": { ... }
  }'
# Returns: { fill_id, pdf_base64, fill_report }

# Run value verification on a completed fill
curl http://localhost:3001/api/test/{fill_id}/verify
# Returns: { fields_matched, fields_mismatched, fields_missing, orphan_text }

# Run AI audit on a completed fill
curl -X POST http://localhost:3001/api/test/{fill_id}/audit \
  -H "Content-Type: application/json" \
  -d '{ "provider_id": "kimi-vision" }'
# Returns: { audit_id, violations[], verdict, ai_model, duration_ms }

# Run the full test suite for a template
curl -X POST http://localhost:3001/api/test/run-suite \
  -H "Content-Type: application/json" \
  -d '{ "template_id": "bir-2307-2018-encs" }'
# Returns: { run_id, results[], summary: { passed, failed, warnings } }

For CI (Forgejo Actions), a workflow that blocks deployment on critical violations:

# .forgejo/workflows/test.yml
on: [push]
jobs:
  test-templates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm install && npm run server:ci &
      - run: npm run test:suite -- --template bir-2307-2018-encs --provider mock
      - run: |
          RESULT=$(curl -s http://localhost:3001/api/test/last-run/summary)
          CRITICAL=$(echo $RESULT | jq '.critical_count')
          if [ "$CRITICAL" -gt "0" ]; then
            echo "❌ $CRITICAL critical violations — blocking deploy"
            exit 1
          fi

Note: Use --provider mock in CI so no API key is needed. The mock provider returns a realistic response and validates the fill + value verification layers without calling an external API.

8. Understand the overall test verdict

A template version is considered test-passing when:

All test cases show zero MISSING or MISMATCH in the value report.
The AI audit returns zero CRITICAL violations across all test cases.
WARNING violations exist in the report but are either resolved or declared as expected_violations in the test case definition (known cosmetic issues that do not affect BIR acceptance).

A ✅ test-passing badge appears on the template card in the dashboard once these conditions are met. The system will block a template commit to the repo if any CRITICAL violation exists — the block can be overridden with a written justification that is included in the commit message.

Contributing & Team Workflow

How We Improve This Application

Any team member can improve the app — no vendor dependency, no waiting for external developers:

File an Issue (in Forgejo) — Bug? Feature request? Calibration problem? Open an issue with the appropriate label:
- qa/bug — something is broken or misaligned
- fr/feature-request — new capability needed
- rca — root cause analysis of a misfilled form
- frd-update — change to requirements
Discuss & Investigate — Use the issue thread to attach sample PDFs, screenshots, and data. Use Kimi or Claude to help investigate.
Implement — Open the source in an AI chat, describe the fix, apply the changes locally, test.
Commit & Deploy — Push to Forgejo. Forgejo Pages auto-deploys. The whole team sees the update immediately.
Update the FRD — If the change affects requirements, update FRD.md in the same commit.

Development Rules

All form templates live in /templates and follow the FormTemplate schema (see FRD.md §8.1).
Every field change must be reproducible: update the template JSON, not the PDF directly.
Calibration changes (font, size, kerning) require a before/after PDF sample in the commit.
Follow the phase roadmap — Phase 1 must produce a pixel-perfect BIR 2307 before expanding to other forms.
Self-improvement is the default — when the AI misdetects a field, adjust the prompt in packages/ai-service/prompts/; when calibration is off, update the defaults; when a new edge case appears, add a validation rule.

License

MIT — This is an open-source tool for Philippine tax compliance automation.