BIR Form Automation System
Status: Phase 1 — PDF Field Mapping & Calibration (BIR Form 2307)
Live URL: https://pages.gi7b.org/cgg/pdfexport/
Target Platform: Forgejo Pages → Local Web App → ERPNext Philippines
The Problem
In the Philippines, businesses must prepare BIR forms such as 2307 (Certificate of Creditable Tax Withheld at Source) every time they pay suppliers subject to Expanded Withholding Tax (EWT). ERPNext, a leading open-source ERP, cannot fill flat PDF government forms with the positional precision these forms require unless it receives extensive custom development. Rather than bending ERPNext into submission, we are building a form-aware automation layer that can:
- Accept any flat PDF form (starting with BIR 2307).
- Use AI to detect, index, and label every field by its coordinates.
- Store a machine-readable "form template" (dimensions, coordinates, fonts, rules).
- Fill the form programmatically with correct fonts, alignment, spacing, and grouping.
- Export a print-ready, layered PDF — background layer + fillable field layer.
- Integrate with ERPNext (or any system) via API.
Team
| Name | Role | Can Modify the App? |
|---|---|---|
| Clarise Duco | Product / Compliance Lead — BIR compliance, form requirements, field calibration | Yes — via Forgejo + Kimi/Claude |
| Journie Reyes | Engineering / Integration Lead — ERPNext, API, data mapping, architecture | Yes — via Forgejo + Kimi/Claude |
How the Team Works (Collaborative Development Model)
This is a team-owned, team-evolved application. Because it is a static web app hosted on Forgejo Pages:
- Open a browser — the app is always available at your Forgejo Pages URL
- Use Kimi or Claude — paste source files, describe changes, get improved code
- Commit to Forgejo — push changes; the live app updates automatically
- Track everything — QA issues, FRD updates, root cause analyses all live as Forgejo Issues and Wiki pages
See FRD.md §3 for the full Collaborative Development & Continuous Improvement specification.
Architecture Overview
```text
┌──────────────────────────────────────────────────────────────────────────────┐
│ BIR FORM AUTOMATION STACK │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ ERPNext / │◄──►│ LOCAL API │◄──►│ FORM FILL ENGINE │ │
│ │ Any Data │ │ (REST/tRPC) │ │ (PDF-lib + Canvas) │ │
│ │ Source │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │
│ ▲ │ │
│ │ ▼ │
│ │ ┌─────────────────┐ │
│ │ │ LAYERED PDF │ │
│ │ │ OUTPUT │ │
│ │ └─────────────────┘ │
│ │ │
│ ┌───────────────────────────┴─────────────────────────────────────────┐ │
│ │ TEMPLATE REPOSITORY (JSON/YAML) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ │
│ │ │ Form 2307 │ │ Form 2551Q │ │ Form 1601EQ │ │ Form ... │ │ │
│ │ │ Template │ │ Template │ │ Template │ │ Template │ │ │
│ │ │ - fields[] │ │ - fields[] │ │ - fields[] │ │ - fields │ │ │
│ │ │ - rules[] │ │ - rules[] │ │ - rules[] │ │ - rules │ │ │
│ │ │ - fonts{} │ │ - fonts{} │ │ - fonts{} │ │ - fonts │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ │
│ ┌───────────────────────────┴─────────────────────────────────────────┐ │
│ │ AI FIELD DETECTION SERVICE │ │
│ │ (Kimi Moonshot / Local LLM / OCR Hybrid) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ PDF Upload │ │ AI Indexer │ │ Human Review│ │ │
│ │ │ │ │ (draw boxes) │ │ & Adjust │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ FORGEJO PAGES HOSTING │ │
│ │ - Static web UI for template management │ │
│ │ - AI-assisted field labeling interface │ │
│ │ - Template preview & calibration │ │
│ │ - Auto-deploys when team pushes changes │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ │
│ ┌───────────────────────────┴─────────────────────────────────────────┐ │
│ │ TEAM DEV LOOP (Clarise + Journie + AI) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Report │ │ AI Chat │ │ Edit │ │ Push │ │ │
│ │ │ Issue │──►│ (Kimi/ │──►│ Source │──►│ & Deploy│ │ │
│ │ │ │ │ Claude) │ │ │ │ │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
```
Phase Roadmap
| Phase | Goal | Status |
|---|---|---|
| Phase 0 | Project scaffolding (README, FRD, repo structure) | ✅ Completed |
| Phase 1 | Upload BIR 2307 → AI detects fields → Human calibrates → Save template | Planned |
| Phase 2 | Fill template with dummy data; calibrate font/size/kerning/alignment per field | Planned |
| Phase 3 | Map template fields to ERPNext data model; build REST/tRPC API | Planned |
| Phase 4 | Produce real 2307 PDFs from live supplier payment data | Planned |
| Phase 5 | Port forms into ERPNext Philippines module; onboard 2551Q, 1601EQ, etc. | Planned |
Key Technical Decisions
| Decision | Rationale |
|---|---|
| Flat PDF → Layered PDF | Government forms are non-fillable scans. We overlay a transparent fillable layer on top of the original background so the output looks identical to the official form when printed. |
| Template-as-Code | Every form is stored as a JSON/YAML template containing field coordinates, fonts, rules, and data mappings. Templates are versioned and diffable. |
| AI Field Detection | Rather than manual coordinate entry, AI (Kimi Moonshot, local LLM, or hybrid OCR) suggests field bounding boxes by analyzing the PDF image. Humans review and adjust. |
| Forgejo Pages First | Host the labeling UI on Forgejo Pages for zero-cost, self-hosted deployment. Later migrate to ERPNext-embedded pages. |
| Local Web App Bridge | A lightweight local web app (React + Hono) sits between ERPNext and the form engine, handling authentication, data mapping, and batch generation. |
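AI field detection works on a rendered image of the page, while the fillable layer is drawn in PDF user space. The two systems differ: image pixels usually have a top-left origin with y growing downward, whereas PDF user space (which pdf-lib's drawing methods also use) is measured in points (1/72 inch) from the bottom-left corner. The sketch below shows the conversion a detector-to-template step would need. It assumes the page was rasterised at a known DPI with no cropping, rotation, or offset MediaBox; `PixelBox`, `PdfBox`, and `pixelBoxToPdf` are hypothetical names, not part of this repository.

```typescript
// A box detected on a rendered page image: pixels, origin top-left, y grows down.
interface PixelBox { x: number; y: number; width: number; height: number; }

// The same box in PDF user space: points (1/72 in), origin bottom-left, y grows up.
interface PdfBox { x: number; y: number; width: number; height: number; }

const POINTS_PER_INCH = 72;

/**
 * Convert an image-space bounding box to PDF user space.
 * Assumes the page was rendered at `dpi` with no cropping or rotation.
 */
function pixelBoxToPdf(box: PixelBox, pageHeightPt: number, dpi: number): PdfBox {
  const scale = POINTS_PER_INCH / dpi; // points per pixel
  const width = box.width * scale;
  const height = box.height * scale;
  const x = box.x * scale;
  // Flip the vertical axis: the box's bottom edge in pixels (y + height)
  // becomes its distance from the bottom of the page in points.
  const y = pageHeightPt - (box.y + box.height) * scale;
  return { x, y, width, height };
}
```

For example, a 300 × 40 px box at pixel (100, 200) on a 792 pt tall page rendered at 144 DPI maps to x = 50, y = 672, width = 150, height = 20 in points.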
Repository Structure
```text
bir-form-automation/
├── README.md # This file
├── FRD.md # Functional Requirements Document
├── docs/
│ ├── architecture/ # System diagrams & ADRs
│ └── api/ # API specifications
├── packages/
│ ├── web/ # Forgejo Pages UI (field labeling, template mgmt)
│ ├── api/ # Local API server (tRPC + Hono)
│ ├── pdf-engine/ # PDF manipulation layer (pdf-lib, canvas)
│ └── ai-service/ # AI field detection & OCR wrappers
├── templates/
│ └── bir/
│ └── 2307/
│ ├── 2307-template.json # Field definitions & rules
│ ├── 2307-background.pdf # Official blank form
│ └── 2307-sample-filled.pdf # Calibrated output sample
├── integrations/
│ └── erpnext/ # ERPNext connector & data mappers
└── tools/
└── calibration/ # Font/size/kerning calibration scripts
```
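To make the template-as-code idea concrete, here is a sketch of what `2307-template.json` might look like as a TypeScript type, plus the structural checks a loader could run before filling. The authoritative schema is `FormTemplate` in FRD.md §8.1; everything below (property names, nesting, the `"truncate"` overflow mode, and the sample values) is an assumption for illustration, not measured from the real BIR 2307.

```typescript
// Hypothetical template shape. Concepts (coordinates, fonts, rules, overflow,
// max_length, font_size_pt) come from this README; names and nesting are assumed.
type OverflowMode = "shrink_font" | "wrap" | "truncate"; // "truncate" is assumed

interface Rect { x: number; y: number; width: number; height: number; } // PDF points

interface FieldDef {
  id: string;           // e.g. "payee_tin"
  page: number;         // 0-based page index
  bbox: Rect;
  font: string;         // key into FormTemplate.fonts
  font_size_pt: number;
  max_length?: number;
  overflow: OverflowMode;
}

interface FormTemplate {
  template_id: string;
  page_size_pt: { width: number; height: number };
  fonts: Record<string, { family: string }>;
  fields: FieldDef[];
  rules: { field: string; rule: string }[];
}

// Structural checks: unique field ids, known fonts, every bbox inside the page.
function validateTemplate(t: FormTemplate): string[] {
  const errors: string[] = [];
  const seen: Record<string, boolean> = {};
  for (const f of t.fields) {
    if (seen[f.id]) errors.push(`duplicate field id: ${f.id}`);
    seen[f.id] = true;
    if (!t.fonts[f.font]) errors.push(`${f.id}: unknown font "${f.font}"`);
    const b = f.bbox;
    const offPage =
      b.x < 0 || b.y < 0 ||
      b.x + b.width > t.page_size_pt.width ||
      b.y + b.height > t.page_size_pt.height;
    if (offPage) errors.push(`${f.id}: bbox falls outside the page`);
  }
  return errors;
}

// Illustrative values only; not taken from the official form.
const sample: FormTemplate = {
  template_id: "bir-2307-2018-encs",
  page_size_pt: { width: 612, height: 936 },
  fonts: { body: { family: "Helvetica" } },
  fields: [
    {
      id: "payee_tin", page: 0,
      bbox: { x: 100, y: 800, width: 200, height: 14 },
      font: "body", font_size_pt: 10, max_length: 15, overflow: "shrink_font",
    },
  ],
  rules: [{ field: "payee_tin", rule: "required" }],
};
```

Because the template is plain JSON, a check like `validateTemplate` can run in CI on every commit, catching the "field may be off the page" class of error before any PDF is generated.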
Quick Start (Future)
```bash
# 1. Clone the repository
git clone <forgejo-repo-url>
cd bir-form-automation
# 2. Install dependencies
npm install
# 3. Start the local development server
npm run dev
# 4. Open the field labeling UI (`open` is macOS; use xdg-open on Linux)
open http://localhost:3000
# 5. Upload a BIR PDF and begin AI-assisted field mapping
```
Running Tests
The test system has three layers that run in sequence: fill → verify → audit. You can stop after any layer depending on what you need to check.
1. Configure an AI Provider (first time only)
The AI audit step requires a vision-capable AI provider. If you only want to run fill + value verification, skip this step.
- Start the local API server: `npm run server`
- Open the app in your browser: http://localhost:3000
- Go to Settings → AI Providers → Add Provider.
- Fill in the form:

  | Field | Example values |
  |---|---|
  | Display name | Kimi Vision, Claude Sonnet, Local Ollama |
  | Type | `moonshot` / `anthropic` / `openai` / `ollama` / `custom` |
  | Endpoint URL | Pre-filled per type; edit if using a custom host |
  | Model ID | `moonshot-v1-8k`, `claude-sonnet-4-6`, `gpt-4o`, `llava` |
  | API Key | Enter your key; it is written to the server `.env` and never shown again |

- Click Test Connection and wait for the ✅ Connected badge.
- Under Settings → AI Providers → Task Assignments, set:
  - Field Detection provider → your vision provider
  - AI Audit provider → same, or a different one to compare
You can add as many providers as you like. The API key for each lives in its own env var in the server .env file — never in the config files committed to the repo.
To use a provider without any API key (for offline work or CI), add a provider with type mock. It returns a realistic hardcoded response and requires nothing.
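The split between committed provider config and secret keys can be sketched as follows: the config records only the *name* of the environment variable holding each key, and the server resolves it at runtime from its `.env`. `ProviderConfig`, `api_key_env`, and `resolveApiKey` are hypothetical names used for illustration; only the provider types and the keyless `mock` type come from this README.

```typescript
// Hypothetical committed provider config: it stores the NAME of the env var
// that holds the key, never the key itself.
type ProviderType = "moonshot" | "anthropic" | "openai" | "ollama" | "custom" | "mock";

interface ProviderConfig {
  id: string;            // e.g. "kimi-vision"
  type: ProviderType;
  endpoint_url?: string;
  model_id?: string;
  api_key_env?: string;  // e.g. "KIMI_VISION_API_KEY" (hypothetical)
}

/**
 * Resolve a provider's API key from an environment map (such as process.env
 * after the server loads its .env). Returns null for the keyless mock provider.
 */
function resolveApiKey(
  p: ProviderConfig,
  env: Record<string, string | undefined>,
): string | null {
  if (p.type === "mock") return null; // mock needs no key
  if (!p.api_key_env) {
    throw new Error(`provider ${p.id}: no api_key_env configured`);
  }
  const key = env[p.api_key_env];
  if (!key) {
    throw new Error(`provider ${p.id}: ${p.api_key_env} is not set in the server .env`);
  }
  return key;
}
```

On the server this would be called as `resolveApiKey(config, process.env)`; failing loudly on a missing variable surfaces misconfiguration at Test Connection time rather than mid-audit.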
2. Open the test panel for a template
- From the dashboard (`/`), open any saved template.
- Click the Tests tab (top navigation, next to Calibration and Map).
- You will see the test case library for this template.
If no test cases exist yet, click New Test Case and fill in values for each field, or click Fill with Dummy to auto-populate all fields with pattern-based dummy data.
3. Run a single test
- In the test case library, click Run next to any test case.
- Three tabs appear in the result panel.

Tab 1: Generated PDF. The filled PDF is rendered inline; use zoom and page navigation to inspect it visually. This is the same output that would be sent to BIR.

Tab 2: Value Report. A row-per-field table comparing what you injected with what was extracted from the PDF:

| Field | Expected | Extracted | Result |
|---|---|---|---|
| payee_tin | 123-456-789-000 | 123-456-789-000 | ✅ MATCH |
| payee_name | ACME SUPPLIES INC. | ACME SUPPLIES IN | ⚠ PARTIAL |
| grand_total_income | 300,000.00 | (empty) | ❌ MISSING |

- MATCH: the value in the PDF is identical to what was injected.
- PARTIAL: the value was truncated (check the `overflow` setting for this field).
- MISSING: no text was found in this field's bounding box (coordinate or rendering issue).
- MISMATCH: text is present but differs from the injected value.
- ORPHAN: rows at the bottom of the table listing text found in the PDF that does not belong to any field (coordinate error or rendering bug).
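The value-report categories can be expressed as a small classifier. This is a sketch under two assumptions: whitespace is normalised before comparing, and a truncated value shows up as a non-empty prefix of the expected value (as with `ACME SUPPLIES IN` for `ACME SUPPLIES INC.`). Orphan detection is shown as a point-in-rectangle test on each extracted text item's origin, assuming extracted text comes with PDF-space coordinates. `classifyField`, `findOrphans`, and `TextItem` are hypothetical names.

```typescript
type FieldResult = "MATCH" | "PARTIAL" | "MISSING" | "MISMATCH";

interface Rect { x: number; y: number; width: number; height: number; }
interface TextItem { text: string; x: number; y: number; } // text origin, PDF points

// Collapse runs of whitespace so layout spacing does not cause false mismatches.
function normalise(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

/** Classify one field by comparing the injected value with the extracted text. */
function classifyField(expected: string, extracted: string | null): FieldResult {
  const e = normalise(expected);
  const x = extracted === null ? "" : normalise(extracted);
  if (x === "") return "MISSING";            // nothing inside the bbox
  if (x === e) return "MATCH";
  if (e.indexOf(x) === 0) return "PARTIAL";  // prefix of expected: truncated
  return "MISMATCH";
}

/** Text items whose origin lies outside every field bbox are orphans. */
function findOrphans(items: TextItem[], boxes: Rect[]): TextItem[] {
  const inside = (p: TextItem, r: Rect) =>
    p.x >= r.x && p.x <= r.x + r.width && p.y >= r.y && p.y <= r.y + r.height;
  return items.filter((it) => !boxes.some((b) => inside(it, b)));
}
```

Checking only the origin point is a simplification; a stricter version could require the whole glyph run to fit inside the box, which would also flag text that starts inside a field but spills past its edge.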
Tab 3: AI Audit. Click Run AI Audit to send the filled PDF to your configured vision provider. The AI reads the form as an independent reviewer: it does not know what values were injected, only what it can see on the page.
Results appear as colored overlays on the PDF:
- 🔴 Red border — CRITICAL violation (form would likely be rejected)
- 🟡 Yellow border — WARNING (may cause issues, needs review)
- 🔵 Blue border — INFO (cosmetic only)
Each violation shows a description and a suggested fix on hover. Click any violation to open the calibration panel for that field immediately.
4. Run the full test suite
To run all test cases for a template at once:
Tests tab → "Run All Tests" button
A progress bar shows each test case as it runs. When complete, a summary table appears:
| Test Case | Fill | Value Check | AI Audit | Critical | Warnings |
|---|---|---|---|---|---|
| Base case | ✅ | ✅ 22/22 match | PASS | 0 | 0 |
| Max length | ✅ | ⚠ 2 partial | WARN | 0 | 2 |
| Minimal (required only) | ✅ | ✅ 10/10 match | PASS | 0 | 0 |
| Special characters | ❌ | 1 mismatch | FAIL | 1 | 0 |
Click any row to expand the full field-level report for that test case.
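The summary table suggests how a per-case verdict might be derived: any critical problem (failed fill, MISSING or MISMATCH values, CRITICAL violations) yields FAIL, warnings or partial values yield WARN, and a clean run yields PASS. The sketch below encodes that reading of the table; the rule itself, `CaseOutcome`, and the meaning of the `warnings` summary count (number of WARN cases) are assumptions rather than documented behaviour.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Hypothetical per-test-case outcome, mirroring the summary table columns.
interface CaseOutcome {
  fillOk: boolean;
  missing: number;     // MISSING rows in the value report
  mismatched: number;  // MISMATCH rows
  partial: number;     // PARTIAL rows
  critical: number;    // CRITICAL AI-audit violations
  warnings: number;    // WARNING AI-audit violations
}

function verdictFor(o: CaseOutcome): Verdict {
  if (!o.fillOk || o.critical > 0 || o.missing > 0 || o.mismatched > 0) return "FAIL";
  if (o.warnings > 0 || o.partial > 0) return "WARN";
  return "PASS";
}

// Tally verdicts into the { passed, failed, warnings } shape of a suite summary.
function summarize(outcomes: CaseOutcome[]): { passed: number; failed: number; warnings: number } {
  const summary = { passed: 0, failed: 0, warnings: 0 };
  for (const o of outcomes) {
    const v = verdictFor(o);
    if (v === "PASS") summary.passed++;
    else if (v === "FAIL") summary.failed++;
    else summary.warnings++;
  }
  return summary;
}
```

Applied to the four rows above, this gives PASS, WARN, PASS, FAIL, i.e. a summary of 2 passed, 1 failed, 1 warning.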
5. Compare results across AI providers
To see how different AI models assess the same PDF:
- In the AI Audit tab of any test result, find the Provider dropdown (top right of the tab).
- Switch to a different provider and click Run AI Audit again. Results are stored separately for each provider.
- For a full side-by-side comparison: click Compare Providers. All your configured vision providers run on the same PDF and their violation lists appear as columns in a table.
Violations flagged by every provider = high confidence, fix immediately.
Violations flagged by only one provider = review manually, may be a false positive.
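The Compare Providers view amounts to grouping violations across providers and counting agreement. This sketch keys each violation by field and violation type (an assumption about what counts as "the same" violation) and marks a group `"high"` when every provider reported it, otherwise `"review"`. With three or more providers this also sends violations seen by some but not all providers to manual review. `Violation` and `compareProviders` are hypothetical names.

```typescript
interface Violation {
  field: string;   // e.g. "payee_name"
  type: string;    // e.g. "TEXT_OVERFLOW"
  severity: "CRITICAL" | "WARNING" | "INFO";
}

interface Consensus {
  field: string;
  type: string;
  providers: string[];
  confidence: "high" | "review";
}

/** Group violations by (field, type) and record which providers flagged each. */
function compareProviders(byProvider: Record<string, Violation[]>): Consensus[] {
  const providerIds = Object.keys(byProvider);
  const groups: Record<string, { field: string; type: string; providers: string[] }> = {};
  for (const p of providerIds) {
    for (const v of byProvider[p]) {
      const key = `${v.field}|${v.type}`;
      if (!groups[key]) groups[key] = { field: v.field, type: v.type, providers: [] };
      if (groups[key].providers.indexOf(p) === -1) groups[key].providers.push(p);
    }
  }
  return Object.keys(groups).map((key): Consensus => {
    const g = groups[key];
    return {
      field: g.field,
      type: g.type,
      providers: g.providers,
      // Flagged by every configured provider: high confidence, fix immediately.
      confidence: g.providers.length === providerIds.length ? "high" : "review",
    };
  });
}
```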
6. Fix a violation
Every violation card — whether in the Value Report or the AI Audit — has a Fix This button.
Clicking it:
- Opens the calibration panel for the affected field.
- Highlights the specific parameter the AI suggested adjusting (font size, bbox width, overflow setting, etc.).
- Keeps the test result in a sidebar so you can re-run with one click after making the change.
Common fixes by violation type:
| Violation | Where to look | What to adjust |
|---|---|---|
| `TEXT_OVERFLOW` | Calibration → Overflow | Change to `shrink_font` or reduce `font_size_pt` |
| PARTIAL in value check | Calibration → Max Length | Increase `max_length` or change overflow to `wrap` |
| `GROUPBOX_MISALIGNED` | Calibration → GROUP_BOXES | Adjust `box_width` or run the automatic `char_spacing` recalculation |
| MISSING in value check | Template editor | Check bbox coordinates; the field may be off the page |
| `DATE_FORMAT_WRONG` | Data mapping → Transform | Apply the `date_format(MM/DD/YYYY)` transform |
| `TIN_FORMAT_WRONG` | Data mapping → Transform | Apply the `tin_format` transform |
| `ORPHAN_TEXT` | Template editor | A field bbox is wrong; the text landed outside it |
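Three of the fixes above are mechanical enough to sketch. The functions below are illustrative stand-ins for the `tin_format` and `date_format(MM/DD/YYYY)` transforms and for group-box character placement; their names, inputs, and behaviour are assumptions. In particular, `tinFormat` accepts 9 or 12 digits (matching the `123-456-789-000` example), `dateFormat` assumes ISO `YYYY-MM-DD` input, and `groupBoxXs` assumes one character per box with glyph widths taken from the embedded font.

```typescript
/** Hypothetical tin_format: "123456789000" or "123 456 789 000" → "123-456-789-000". */
function tinFormat(raw: string): string {
  const digits = raw.replace(/\D/g, ""); // drop spaces, dashes, etc.
  if (digits.length !== 9 && digits.length !== 12) {
    throw new Error(`expected a 9- or 12-digit TIN, got ${digits.length} digits`);
  }
  return digits.match(/\d{3}/g)!.join("-"); // groups of three
}

/** Hypothetical date_format: ISO "YYYY-MM-DD" → pattern built from YYYY, MM, DD tokens. */
function dateFormat(iso: string, pattern: string): string {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!m) throw new Error(`not an ISO date: ${iso}`);
  const yyyy = m[1];
  const mm = m[2];
  const dd = m[3];
  // Replace YYYY first; the substituted digits cannot contain the other tokens.
  return pattern.replace("YYYY", yyyy).replace("MM", mm).replace("DD", dd);
}

/**
 * Hypothetical group-box placement: one character per box, each glyph centred
 * horizontally in its box. Returns the x position (PDF points) for each glyph.
 */
function groupBoxXs(boxX: number, boxWidth: number, glyphWidths: number[]): number[] {
  return glyphWidths.map((w, i) => boxX + i * boxWidth + (boxWidth - w) / 2);
}
```

For example, `dateFormat("2018-01-15", "MM/DD/YYYY")` yields `01/15/2018`, and centring glyphs 10 pt and 12 pt wide in 20 pt boxes starting at x = 100 places them at 105 and 124.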
7. Run tests from the API (automated / CI)
All test operations are available as REST endpoints on the local API server:
```bash
# Fill a template with a specific test case
curl -X POST http://localhost:3001/api/test/fill \
-H "Content-Type: application/json" \
-d '{
"template_id": "bir-2307-2018-encs",
"test_case_id": "base-case",
"payload": { ... }
}'
# Returns: { fill_id, pdf_base64, fill_report }
# Run value verification on a completed fill
curl http://localhost:3001/api/test/{fill_id}/verify
# Returns: { fields_matched, fields_mismatched, fields_missing, orphan_text }
# Run AI audit on a completed fill
curl -X POST http://localhost:3001/api/test/{fill_id}/audit \
-H "Content-Type: application/json" \
-d '{ "provider_id": "kimi-vision" }'
# Returns: { audit_id, violations[], verdict, ai_model, duration_ms }
# Run the full test suite for a template
curl -X POST http://localhost:3001/api/test/run-suite \
-H "Content-Type: application/json" \
-d '{ "template_id": "bir-2307-2018-encs" }'
# Returns: { run_id, results[], summary: { passed, failed, warnings } }
```
For CI (Forgejo Actions), a workflow that blocks deployment on critical violations:
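```yaml
# .forgejo/workflows/test.yml
on: [push]
jobs:
  test-templates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install
      # Start the API server in the background, then wait (max 60 s)
      # until it accepts connections before running the suite.
      - run: |
          npm run server:ci &
          timeout 60 bash -c 'until curl -s -o /dev/null http://localhost:3001/; do sleep 1; done'
      - run: npm run test:suite -- --template bir-2307-2018-encs --provider mock
      - run: |
          RESULT=$(curl -sf http://localhost:3001/api/test/last-run/summary)
          # jq -e fails the step if critical_count is missing (null)
          CRITICAL=$(echo "$RESULT" | jq -er '.critical_count')
          if [ "$CRITICAL" -gt 0 ]; then
            echo "❌ $CRITICAL critical violations, blocking deploy"
            exit 1
          fi
```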
Note: Use `--provider mock` in CI so no API key is needed. The mock provider returns a realistic response, so the fill and value verification layers are exercised without calling an external API.
8. Understand the overall test verdict
A template version is considered test-passing when:
- All test cases show zero MISSING or MISMATCH in the value report.
- The AI audit returns zero CRITICAL violations across all test cases.
- Any WARNING violations in the report are either resolved or declared as `expected_violations` in the test case definition (known cosmetic issues that do not affect BIR acceptance).
A ✅ test-passing badge appears on the template card in the dashboard once these conditions are met. The system will block a template commit to the repo if any CRITICAL violation exists — the block can be overridden with a written justification that is included in the commit message.
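The three test-passing conditions can be checked mechanically. The sketch below assumes a resolved warning simply no longer appears in the latest run, and that `expected_violations` is a list of violation type names; both the shape of `TestCaseRun` and the name `isTestPassing` are hypothetical. CRITICAL violations block the badge even if they are listed as expected.

```typescript
interface AuditViolation {
  field: string;
  type: string;   // e.g. "TEXT_OVERFLOW"
  severity: "CRITICAL" | "WARNING" | "INFO";
}

interface TestCaseRun {
  id: string;
  missing: number;                // MISSING rows in the value report
  mismatched: number;             // MISMATCH rows
  violations: AuditViolation[];   // from the latest AI audit
  expected_violations?: string[]; // declared known cosmetic issues (assumed shape)
}

/** Apply the test-passing rules across all test cases of a template version. */
function isTestPassing(runs: TestCaseRun[]): { passing: boolean; reasons: string[] } {
  const reasons: string[] = [];
  for (const r of runs) {
    if (r.missing > 0 || r.mismatched > 0) {
      reasons.push(`${r.id}: ${r.missing} missing, ${r.mismatched} mismatched`);
    }
    const expected = r.expected_violations || [];
    for (const v of r.violations) {
      if (v.severity === "CRITICAL") {
        reasons.push(`${r.id}: CRITICAL ${v.type} on ${v.field}`);
      } else if (v.severity === "WARNING" && expected.indexOf(v.type) === -1) {
        reasons.push(`${r.id}: undeclared WARNING ${v.type} on ${v.field}`);
      }
      // INFO violations are cosmetic and never block.
    }
  }
  return { passing: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the boolean gives the commit-block message something concrete to show, and gives the author a checklist for the written justification if they override the block.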
Contributing & Team Workflow
How We Improve This Application
Any team member can improve the app — no vendor dependency, no waiting for external developers:
- File an Issue (in Forgejo): Bug? Feature request? Calibration problem? Open an issue with the appropriate label:
  - `qa/bug`: something is broken or misaligned
  - `fr/feature-request`: a new capability is needed
  - `rca`: root cause analysis of a misfilled form
  - `frd-update`: a change to requirements
- Discuss & Investigate: Use the issue thread to attach sample PDFs, screenshots, and data. Use Kimi or Claude to help investigate.
- Implement: Open the source in an AI chat, describe the fix, apply the changes locally, and test.
- Commit & Deploy: Push to Forgejo. Forgejo Pages auto-deploys, and the whole team sees the update immediately.
- Update the FRD: If the change affects requirements, update `FRD.md` in the same commit.
Development Rules
- All form templates live in `/templates` and follow the `FormTemplate` schema (see FRD.md §8.1).
- Every field change must be reproducible: update the template JSON, not the PDF directly.
- Calibration changes (font, size, kerning) require a before/after PDF sample in the commit.
- Follow the phase roadmap: Phase 1 must produce a pixel-perfect BIR 2307 before expanding to other forms.
- Self-improvement is the default: when the AI misdetects a field, adjust the prompt in `packages/ai-service/prompts/`; when calibration is off, update the defaults; when a new edge case appears, add a validation rule.
License
MIT — This is an open-source tool for Philippine tax compliance automation.