[ INDEX ARCHIVE // SYSTEM CATALOG ]

PROJECTS & SOURCE CODE

Production systems, open-source libraries and architectural work — documented in detail. From high-throughput Go backends to fine-tuned language models: deterministic solutions under real-world load.

01  Multilingual NLP Toolkit   Python / spaCy / PyTorch          2024–2026
02  catalog-storage            Go / gRPC / Kafka / PostgreSQL    2025–2026
03  whatsapp-ms                PHP 8 / Symfony / Docker          2025
04  Cangoo Taxi System         Go / Real-Time / AWS              2018–2023
05  Fine-Tuned LLM Models      Hugging Face / LoRA / QLoRA       2024–2026

Multilingual NLP Toolkit

A collection of focused, dependency-light NLP components for morphologically rich languages — built for the cases where classic dictionary-based approaches fail.

At its core is a bidirectional, rule-based Turkish morphology engine. Turkish is agglutinative: a single word like geliyormuşsunuz ("apparently you (pl.) are coming") stacks aspect, evidentiality, person and number onto the stem gel-. The engine handles both directions, analysis and generation, with pure rules: no model downloads, no lookup server. It implements 2/4/8-way vowel harmony, consonant softening (kitap → kitabı), buffer consonants (y/n/s), stem mutations (git → gid-) and glossing that follows the Leipzig Glossing Rules.

from mnlp.turkish import TurkishMorphology

tm = TurkishMorphology()

tm.analyze("geliyormuşsunuz")
# {'stem': 'gel', 'features': {'ASPEKT': 'progressive',
#   'EVIDENTIALITY': 'indirect', 'PERSON': '2', 'NUMERUS': 'plural', ...}}

tm.generate("ev", {"NUMERUS": "plural", "POSSESSION": "1pl", "KASUS": "ablative"}, pos="NOUN")
# 'evlerimizden' ("from our houses")

tm.gloss("geliyormuşsunuz")
# 'GEL-PROG-EVID-P.2PL' (Leipzig glossing)
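To make the harmony and softening rules named above concrete, here is a minimal standalone sketch. It is not the toolkit's rule set: the function names are hypothetical, input is assumed to be lowercase, and the softening heuristic (polysyllabic stems only) ignores lexical exceptions and loanwords that a real engine has to handle.

```python
# Simplified sketch of Turkish suffix harmony; NOT the mnlp implementation.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("ouöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS

# Final voiceless consonants that voice before a vowel-initial suffix.
SOFTENING = {"p": "b", "ç": "c", "t": "d", "k": "ğ"}


def last_vowel(word: str) -> str:
    """Return the last vowel of a lowercase word."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def harmonize_a(word: str) -> str:
    """2-way harmony: suffix vowel is 'a' after back vowels, else 'e'."""
    return "a" if last_vowel(word) in BACK_VOWELS else "e"


def harmonize_i(word: str) -> str:
    """4-way harmony: choose ı/i/u/ü by backness and rounding."""
    v = last_vowel(word)
    if v in BACK_VOWELS:
        return "u" if v in ROUNDED_VOWELS else "ı"
    return "ü" if v in ROUNDED_VOWELS else "i"


def plural(stem: str) -> str:
    """Attach the plural suffix -lAr."""
    return stem + "l" + harmonize_a(stem) + "r"


def accusative(stem: str) -> str:
    """Attach the accusative suffix -(y)I with buffer consonant and softening."""
    vowel = harmonize_i(stem)
    if stem[-1] in VOWELS:
        return stem + "y" + vowel  # buffer consonant between two vowels
    syllables = sum(ch in VOWELS for ch in stem)
    if syllables > 1 and stem[-1] in SOFTENING:
        stem = stem[:-1] + SOFTENING[stem[-1]]  # kitap -> kitab-
    return stem + vowel


print(plural("ev"), accusative("kitap"), accusative("araba"))
```

The point of the sketch is that a suffix is stored once in abstract form (-lAr, -(y)I) and its surface vowel is computed from the stem, which is what allows one rule set to serve both analysis and generation.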

Beyond that, the toolkit ships three more production-proven modules. mnlp.grammar detects compound tenses, moods and voice for German and English via auxiliary-chain analysis on dependency parses (spaCy). mnlp.alignment provides a robust Python wrapper around fast_align for word alignments. mnlp.modulation provides FiLM conditioning modules (Perez et al., 2018) for PyTorch, including soft selection and Gaussian variance propagation.
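As an illustration of what auxiliary-chain analysis can look like for English, the sketch below classifies a verb group from lemmas and Penn Treebank tags. It assumes the chain (auxiliaries in surface order, then the main verb) has already been collected from the dependency parse. The `Tok` type and `classify` function are hypothetical and are not the mnlp.grammar API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tok:
    lemma: str
    tag: str  # Penn Treebank tag, e.g. VBZ, VBD, VBN, VBG, MD


def classify(chain: List[Tok]) -> Dict[str, str]:
    """Classify tense/aspect and voice of an English verb group.

    `chain` holds the auxiliaries in surface order followed by the main verb.
    """
    if not chain:
        raise ValueError("empty verb chain")

    # The first element carries finiteness: modal 'will', past, or present.
    first = chain[0]
    if first.tag == "MD" and first.lemma == "will":
        tense = "future"
    elif first.tag == "VBD":
        tense = "past"
    else:
        tense = "present"

    perfect = progressive = passive = False
    # Each auxiliary selects the form of the element that follows it.
    for cur, nxt in zip(chain, chain[1:]):
        if cur.lemma == "have" and nxt.tag == "VBN":
            perfect = True       # have + past participle
        elif cur.lemma == "be" and nxt.tag == "VBG":
            progressive = True   # be + present participle
        elif cur.lemma == "be" and nxt.tag == "VBN":
            passive = True       # be + past participle

    parts = [tense]
    if perfect:
        parts.append("perfect")
    if progressive:
        parts.append("progressive")
    return {"tense": " ".join(parts), "voice": "passive" if passive else "active"}


# "has been walking"
print(classify([Tok("have", "VBZ"), Tok("be", "VBN"), Tok("walk", "VBG")]))
```

Moods, other modals and German verb clusters need further rules; the sketch covers only the core English pattern.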

Dependency-light: core module runs on the pure standard library
Bidirectional: analysis and generation from a single rule set
46 unit tests: tests that need optional dependencies are skipped automatically
Battle-tested: born out of real production experiments

catalog-storage (GenericCatalog)

A high-throughput, schema-flexible resource catalog service in Go: arbitrary business entities behind a gRPC API — without schema sprawl, without the EAV antipattern.

Products, policies, vehicles, contracts: instead of maintaining a separate table landscape per type, the system combines a stable base schema (resources, resource_links) with domain-specific extension tables (pricing, vehicle, insurance, …). Queries declare via Field Scoping which field groups they need — the service joins and hydrates only those extensions.
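The Field Scoping idea can be sketched as a query builder that joins only the extension tables a caller asks for. The table names resources, pricing, vehicle and insurance come from the description above. The column names, the `resource_id` join key and the `build_query` function are assumptions made for illustration, and the actual service is written in Go.

```python
from typing import Dict, Iterable, List

# Hypothetical extension tables and columns; only the table names
# appear in the project description.
EXTENSIONS: Dict[str, List[str]] = {
    "pricing": ["amount", "currency"],
    "vehicle": ["vin", "model"],
    "insurance": ["policy_no", "coverage"],
}


def build_query(scopes: Iterable[str]) -> str:
    """Build a SELECT that joins only the requested field groups."""
    requested = sorted(set(scopes))
    # Whitelisting the scopes also keeps table names out of user control.
    unknown = [s for s in requested if s not in EXTENSIONS]
    if unknown:
        raise ValueError(f"unknown field scopes: {unknown}")

    columns = ["r.id"]
    joins = []
    for scope in requested:
        columns.extend(f"{scope}.{col}" for col in EXTENSIONS[scope])
        # Assumed join key: each extension row points at its base resource.
        joins.append(f"LEFT JOIN {scope} ON {scope}.resource_id = r.id")

    parts = [
        "SELECT " + ", ".join(columns),
        "FROM resources AS r",
        *joins,
        "WHERE r.id = %s",  # driver-style placeholder, value bound separately
    ]
    return "\n".join(parts)


print(build_query(["pricing"]))
```

A read that requests only `pricing` never touches the vehicle or insurance tables, which is how reads pay only for the fields they need.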

// WRITE-BEHIND INGESTION PIPELINE
gRPC Writes → Sharded In-Memory Buffer → Redpanda / Kafka → Ingest Worker (1 per partition) → Bulk COPY → Partitioned PostgreSQL

// READ PATH
gRPC Reads → Redis (State + Tiered Storage) → Field-scoped Hydration

Writes are acknowledged from a sharded in-memory buffer, drained asynchronously to Redpanda/Kafka and written to partition-aware PostgreSQL via Bulk COPY by partition-bound ingest workers. Hierarchies are maintained by a dedicated closure worker over materialized path tables; composable resource bundles with slug routing (112-policy) enable progressive field filtering across entire object trees.
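The write-behind pattern described above can be reduced to a small single-threaded sketch: a write is acknowledged as soon as it sits in its shard's buffer, and a separate drain step hands batches to a sink. The class and method names are hypothetical. The real service is concurrent Go code that drains to Redpanda/Kafka, so the sink here is just a callable standing in for a producer.

```python
import zlib
from collections import deque
from typing import Callable, Deque, List


class WriteBehindBuffer:
    """Sharded in-memory buffer that decouples acknowledgement from persistence."""

    def __init__(self, shards: int, sink: Callable[[int, List[bytes]], None],
                 batch_size: int = 500) -> None:
        self._shards: List[Deque[bytes]] = [deque() for _ in range(shards)]
        self._sink = sink              # stand-in for a Kafka producer
        self._batch_size = batch_size

    def shard_for(self, key: str) -> int:
        # A stable hash keeps all writes for one key in the same shard,
        # which preserves their relative order.
        return zlib.crc32(key.encode("utf-8")) % len(self._shards)

    def write(self, key: str, payload: bytes) -> int:
        """Buffer the payload and return immediately (the 'ack')."""
        shard = self.shard_for(key)
        self._shards[shard].append(payload)
        return shard

    def drain(self) -> int:
        """Flush all buffered payloads to the sink in batches."""
        flushed = 0
        for index, queue in enumerate(self._shards):
            while queue:
                size = min(self._batch_size, len(queue))
                batch = [queue.popleft() for _ in range(size)]
                self._sink(index, batch)
                flushed += len(batch)
        return flushed


received: List[bytes] = []
buf = WriteBehindBuffer(shards=4, sink=lambda shard, batch: received.extend(batch), batch_size=2)
for i in range(5):
    buf.write(f"resource-{i}", f"payload-{i}".encode())
print(buf.drain(), len(received))
```

The general trade-off of this pattern is that an acknowledged write exists only in memory until it has been drained, so the buffer size and drain interval bound what a crash can lose.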

Write-behind: acknowledge in microseconds, persistence decoupled
Partition-aware: 1 ingest worker per Kafka partition, COPY instead of INSERT
HA setup: Docker HA Compose, K8s deployments, PgBouncer pooling
Load-tested: k6 extreme-load scenarios included in the repo
Closure tables: materialized hierarchy paths, dedicated worker
Field Scoping: reads only pay for the fields they need
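For the hierarchy handling, a closure table stores one row per ancestor-descendant pair, so that "all descendants of X" becomes a plain lookup instead of a recursive query. The sketch below derives those rows from a parent map. The `(ancestor, descendant, depth)` row shape and the function name are assumptions; the repository's actual schema may differ.

```python
from typing import Dict, Optional, Set, Tuple

Row = Tuple[str, str, int]  # (ancestor, descendant, depth)


def closure_rows(parents: Dict[str, Optional[str]]) -> Set[Row]:
    """Compute closure-table rows from a child -> parent mapping."""
    rows: Set[Row] = set()
    for node in parents:
        ancestor: Optional[str] = node
        depth = 0
        while ancestor is not None:
            if depth > len(parents):
                raise ValueError(f"cycle detected at {node!r}")
            rows.add((ancestor, node, depth))  # depth 0 is the self-row
            ancestor = parents.get(ancestor)
            depth += 1
    return rows


tree = {"fleet": None, "car": "fleet", "wheel": "car"}
rows = closure_rows(tree)
# All descendants of 'fleet', excluding itself:
print(sorted(d for a, d, depth in rows if a == "fleet" and depth > 0))
```

A dedicated worker would maintain such rows incrementally when a link changes instead of recomputing the whole set as this sketch does.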

whatsapp-ms

A self-hosted WhatsApp Business microservice that wraps the Meta Cloud API behind a clean, deterministic REST interface — multi-account capable and deployable in minutes.

The service handles the complete conversation logic: sending and receiving messages, managing conversations, contacts and templates — across multiple WhatsApp Business accounts and teams. Incoming events from the Meta Cloud API are signature-verified at the inbound webhook, processed asynchronously via Symfony Messenger and forwarded to downstream systems as token-authenticated, signed webhooks.

// EVENT FLOW
Meta Cloud API → Inbound Webhook (Signature Check) → Symfony Messenger Queue → Conversation Engine → Signed Partner Webhook

// COMMAND FLOW
Client System → REST API → Template / Message Dispatch → Meta Cloud API
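The signature check in the event flow can be sketched as follows. Meta signs webhook payloads with HMAC-SHA256 over the raw request body using the app secret and sends the result in the `X-Hub-Signature-256` header, prefixed with `sha256=`. The function names are hypothetical, the service itself is PHP, and reusing the same scheme for outbound partner webhooks is an assumption, since the project description does not specify that format.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, body: bytes) -> str:
    """Return the header value for a payload: 'sha256=' + hex HMAC-SHA256."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check a received signature header against the raw request body."""
    if not header:
        return False
    expected = sign(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))


secret = b"app-secret"            # placeholder value
body = b'{"object":"whatsapp_business_account"}'
header = sign(secret, body)       # what the sender would put in the header
print(verify(secret, body, header), verify(secret, body + b" ", header))
```

The check has to run on the raw bytes of the request body. Re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise correct signature.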

Deployment is fully containerized (MariaDB, Traefik-ready, setup-docker.sh) and extensively documented: separate guides for the REST API, Meta webhook handling, partner integration and client onboarding live directly in the repository.

Multi-account: multiple business accounts & teams in one instance
Security: Meta signature verification + signed outbound hooks
Asynchronous: Symfony Messenger decouples webhook spikes
Documented: API, webhook, partner and Docker guides in the repo

Cangoo Taxi System

The complete technical ecosystem of a taxi platform — designed, implemented and carried through to market launch as co-founder and CTO of Go4System GmbH.

From the first git commit to stable live operations: the central challenge was fault-free real-time data processing — continuous position streams from the vehicle fleet, latency-critical dispatch assignment and transaction-safe ride handling under real-world load. The architecture relied on scalable Go microservices with event streaming, operated on AWS with fully automated CI/CD pipelines.

Beyond system architecture, the role included technical leadership of the development team: code standards, review processes, infrastructure decisions — and the daily translation of business requirements into resilient technology. Five years of full entrepreneurial responsibility, from idea to operations.

100% in-house: entire platform without third-party frameworks at the core
Real-time: position streams + dispatch at sub-second latency
Team leadership: building and leading the development team
Full lifecycle: concept → architecture → operations → market launch

Fine-Tuned LLM Models

Fine-tuned open-source language models for domain-specific tasks, trained, evaluated and published openly as part of the AI specialization at JKU Linz.

The methodological core is Parameter-Efficient Fine-Tuning (PEFT): using LoRA adapters and 4-bit quantized QLoRA, high-performance base models are tailored to specific domains with a minimal hardware footprint. Instead of expensive full fine-tuning, only low-rank adapter matrices are trained. The workflow covers data preparation, training runs, systematic evaluation and publishing the weights on the Hugging Face Hub.
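The low-rank idea behind LoRA fits in a few lines of plain Python. The frozen weight W (d × k) is left untouched, and the adapter adds (alpha / r) · B · A · x, where A is r × k and B is d × r. This is a didactic sketch with hypothetical helper names, not the training code behind the published models; real runs use tensor libraries and the PEFT tooling.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute W x + (alpha / r) * B (A x) with W frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # project down to rank r, then back up
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d: int, k: int, r: int) -> int:
    """Adapter parameters for one d x k weight: A is r x k, B is d x r."""
    return r * (d + k)


# B starts at zero, so the adapted layer initially equals the base layer.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # d=2, k=3
A = [[0.5, -0.5, 1.0]]                   # r=1
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [1.0, 1.0, 1.0], alpha=16, r=1))

d = k = 4096
print(trainable_params(d, k, r=8) / (d * k))  # fraction of the full matrix
```

For a 4096 × 4096 projection at rank 8 the adapter holds 65,536 parameters against 16,777,216 in the full matrix, which is 1/256 of the weights. QLoRA additionally stores the frozen base weights in 4-bit precision, which is where most of the memory saving comes from.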

Complementing this are Retrieval-Augmented Generation pipelines that couple fine-tuned models with semantic vector search: chunking strategies, embedding selection, re-ranking and grounding against protected knowledge bases — bridging the gap between model research and production AI infrastructure.
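The retrieval half of such a pipeline can be sketched with overlapping chunks and cosine-similarity ranking. The bag-of-words `embed` function is a toy stand-in for a real embedding model, all names are hypothetical, and re-ranking and grounding are left out.

```python
import math
from collections import Counter
from typing import List


def chunk(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not words:
        return []
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]


def embed(tokens: List[str]) -> Counter:
    """Toy embedding: term counts. A real pipeline would call a model here."""
    return Counter(t.lower() for t in tokens)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[List[str]], k: int) -> List[List[str]]:
    """Return the k chunks most similar to the query."""
    q = embed(query.split())
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


text = ("lora adapters train low rank matrices while qlora "
        "quantizes base weights to four bits").split()
chunks = chunk(text, size=6, overlap=2)
print(" ".join(retrieve("four bits", chunks, k=1)[0]))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some tokens twice.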

PEFT stack: LoRA adapters, QLoRA 4-bit quantization
Reproducible: systematic eval suites per training run
Public: models and cards on the Hugging Face Hub
RAG integration: embeddings, re-ranking, semantic search