TIO AUTOPLAT · Data Foundation L2

DataHub · Data Middle Platform

Not a traditional data warehouse, but a trustworthy data foundation built around and for AI. Data evolves from 'stored' to 'usable', and ultimately to 'AI does the work'.

47+ Core Features
6 Capability Domains
20+ Data Source Types
Batch-Stream Unified Processing Architecture
PRODUCT OVERVIEW

Scattered Data Assets, with Governance and AI Applications Working in Silos

Most enterprises face data silos across systems, inconsistent metric definitions, quality issues that are hard to trace, and data assets that cannot be fed directly into AI model training. DataHub integrates lakehouse storage, visual data processing, standardized governance, training dataset management and data services into a closed-loop platform, keeping data under control from ingestion to service release and natively supporting AI applications.

Data Silos: Unified batch and stream ingestion from heterogeneous sources, lakehouse storage and a unified catalog of data assets
Inconsistent Metrics: Built-in master data and metric center, with standards applied across the entire data chain to eliminate conflicting metric definitions
AI Data Preparation Trap: Built-in training dataset flow and direct knowledge base access; data governance results flow directly into AIHub
47+ Core Features
Six Domains: End-to-End Coverage
Batch-Stream Unified: Multi-Latency Adaptation
AI-Ready: Direct Training Data Supply

Six Capability Domains

Covering the complete chain from data ingestion, storage, processing and governance to data services and AI integration, with 47+ core features

01 Platform & Lakehouse
App-level data platform operations overview (table scale, lakehouse health, ingestion status)
Sub-capability Profile configuration and dependency closure validation
Multi-engine lakehouse foundation (StarRocks / MySQL / ClickHouse / DuckDB)
Iceberg snapshots, time travel, rollback and table-level maintenance
Hot/cold archiving strategy and lifecycle timeline
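To make the hot/cold archiving item concrete, here is a minimal sketch of age-based tiering. It assumes partitions are tiered purely by days since last modification; the `Partition` type, the tier names and the 30/180-day thresholds are hypothetical and do not describe DataHub's actual policy.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds in days since last modification (not DataHub defaults).
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 180


@dataclass
class Partition:
    name: str
    last_modified: date


def tier_for(partition: Partition, today: date) -> str:
    """Assign a storage tier based on how long ago the partition was last modified."""
    age = (today - partition.last_modified).days
    if age <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"  # candidate for archiving to cheaper storage


def plan_archiving(partitions: list[Partition], today: date) -> dict[str, list[str]]:
    """Group partition names by target tier, i.e. one snapshot of a lifecycle timeline."""
    plan: dict[str, list[str]] = {"hot": [], "warm": [], "cold": []}
    for p in partitions:
        plan[tier_for(p, today)].append(p.name)
    return plan
```

A real lifecycle policy would typically also weigh access frequency and per-table overrides; age alone keeps the sketch short.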
02 Data Ingestion
JDBC data source management and visual database/table exploration
Batch FULL/INCREMENTAL sync and CDC (Debezium)
Micro-batch lake ingestion from Kafka/RabbitMQ streams
Extension collectors for API, FTP, SFTP, S3 and HTTP Push
Lake ingestion of CSV/Excel/JSON/Parquet and multimodal files
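The difference between FULL and INCREMENTAL batch sync can be sketched with a watermark on a monotonically increasing column. This is a minimal illustration, assuming an `updated_at`-style column and in-memory rows; `SyncState` and `sync` are hypothetical names, not DataHub APIs.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass
class SyncState:
    # Hypothetical watermark: highest value of the incremental column loaded so far.
    watermark: Any = None


def sync(rows: list[Row], mode: str, state: SyncState, column: str = "updated_at") -> list[Row]:
    """Select the rows to load for a FULL or INCREMENTAL sync and advance the watermark."""
    if mode == "FULL" or (mode == "INCREMENTAL" and state.watermark is None):
        # FULL always reloads everything; the first INCREMENTAL run behaves the same.
        batch = list(rows)
    elif mode == "INCREMENTAL":
        # Only rows changed after the last watermark are picked up.
        batch = [r for r in rows if r[column] > state.watermark]
    else:
        raise ValueError(f"unknown sync mode: {mode}")
    if batch:
        newest = max(r[column] for r in batch)
        if state.watermark is None or newest > state.watermark:
            state.watermark = newest
    return batch
```

A watermark query cannot see rows that were deleted at the source, which is one reason log-based CDC such as Debezium is listed alongside batch sync.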
03 Data Processing & Orchestration
Visual DAG pipeline design (X6) and CRON scheduling
13 DAG node types (SOURCE/SINK/SQL_TRANSFORM/FILTER/AGGREGATE etc.)
Per-node schema inference, sample preview and trial runs
Flink cluster/job/CDC task management (extended delivery)
Streaming deployment, event routing and media gateway
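A DAG pipeline must run each node only after all of its upstream nodes have finished. The minimal sketch below derives such an order with Python's standard `graphlib.TopologicalSorter`; the pipeline definition and node ids are hypothetical, and only the node type names come from the list above.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: node id -> (node type, upstream node ids).
PIPELINE = {
    "orders_src": ("SOURCE", []),
    "customers_src": ("SOURCE", []),
    "paid_orders": ("FILTER", ["orders_src"]),
    "join_customers": ("SQL_TRANSFORM", ["paid_orders", "customers_src"]),
    "daily_totals": ("AGGREGATE", ["join_customers"]),
    "lake_sink": ("SINK", ["daily_totals"]),
}


def execution_order(pipeline):
    """Return node ids so that every node comes after all of its upstream nodes."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    graph = {node: set(deps) for node, (_, deps) in pipeline.items()}
    # Iterating static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())
```

Because a cycle raises `graphlib.CycleError`, the same call doubles as a check that a pipeline drawn on the canvas is really acyclic.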
04 Data Governance
Technical metadata registration, lake table sync and schema history profiling
Table/column-level lineage graph, path and impact analysis
Data standards, code tables, logical/physical modeling and materialization
Metric center, standards-compliance scanning and violation repair suggestions
Sensitivity classification, dynamic masking and access auditing
Master data entity merge/dedup, point-in-time rollback and external distribution
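Impact analysis over a lineage graph amounts to collecting everything reachable downstream of a changed column. This is a minimal breadth-first sketch, assuming lineage is stored as an adjacency map from each column to the columns it feeds; the table and column names are hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
LINEAGE = {
    "ods.orders.amount": ["dwd.orders.amount_usd"],
    "dwd.orders.amount_usd": ["dws.sales_daily.gmv", "ads.finance_report.revenue"],
    "dws.sales_daily.gmv": ["ads.dashboard.gmv_trend"],
}


def impact_of(column, lineage):
    """Return every downstream column affected by a change to `column` (breadth-first)."""
    affected = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in affected:  # skip columns already visited
                affected.add(child)
                queue.append(child)
    return affected
```

Walking the reversed map in the same way gives upstream (root-cause) analysis instead of downstream impact.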
05 Metrics Semantics & Open Services
Open inventory and data service (read-only SQL) configuration
Dynamic OpenAPI generation based on published resources
Webhook subscription push with retry on failure
SQL result export to CSV/XLSX (with masking) and download auditing
Access Hub cross-application data sharing scenarios
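Failure retry for webhook subscriptions is commonly implemented as exponential backoff. The sketch below assumes a caller-supplied `send` function that returns `True` on successful delivery; `push_with_retry`, the attempt limit and the 1s/2s/4s schedule are hypothetical and not DataHub's documented behavior.

```python
import time
from typing import Callable


def push_with_retry(
    send: Callable[[dict], bool],
    payload: dict,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Deliver a webhook payload, retrying failures with exponential backoff.

    Returns the number of the attempt that succeeded, or raises RuntimeError
    after max_attempts failed deliveries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            delivered = send(payload)
        except Exception:
            # Treat transport errors (timeouts, refused connections) as failed attempts.
            delivered = False
        if delivered:
            return attempt
        if attempt < max_attempts:
            # Hypothetical schedule: base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"webhook delivery failed after {max_attempts} attempts")
```

Injecting `send` and `sleep` keeps the retry policy testable without network calls or real waiting.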
06 AI Integration
KB sync (tables/master data → app knowledge base, configurable masking and field whitelist)
Knowledge jobs, knowledge strategies and chunking for RAG
Training dataset export (connecting to AIBase)
MCP tools (catalog summary, metadata samples, pipeline running status)
RAG/recall chain debugging (admin verification of retrieval quality)
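KB sync with a field whitelist and masking can be pictured as a projection applied to each row before it becomes a knowledge-base document. This minimal sketch assumes a fixed whitelist and a keep-last-four masking rule; `WHITELIST`, `MASKED`, `mask` and `to_kb_document` are hypothetical names, not DataHub configuration keys.

```python
from typing import Any

# Hypothetical sync rule: only whitelisted fields reach the knowledge base,
# and fields listed in MASKED are partially masked first.
WHITELIST = ["customer_name", "phone", "tier", "notes"]
MASKED = {"phone"}


def mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with '*'."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def to_kb_document(row: dict[str, Any]) -> str:
    """Render one table row as a text document for knowledge-base ingestion."""
    lines = []
    for name in WHITELIST:
        if name not in row:
            continue  # whitelisted but absent in this row
        value = str(row[name])
        if name in MASKED:
            value = mask(value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

Fields outside the whitelist, such as an ID-card number, never reach the text that is later chunked for RAG.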

End-to-End Data Governance Flow

From multi-source ingestion to AI consumption, every stage is under governance control

[Figure: DataHub end-to-end data governance flow chart]

AI-Assisted Capabilities

Focused on lowering the barrier to governance and improving operational efficiency, complementing AIBase's AI-assisted training

Intelligent Modeling & Data Discovery

Automatically recommends logical models and metric drafts from business descriptions, suggests quality rules and explains the root causes of failures, so business users can take part in modeling and quality governance.

Knowledge Assistant

Automatically recommends chunking and embedding strategies by document type and analyzes root causes when recall fails, speeding up knowledge-base go-live.

Operations Assistant

Interprets ingestion and pipeline run logs and suggests remediation steps, shortening failure recovery time.

Skill Discovery

Automatically identifies data services that can be packaged as standardized Skills, turning data capabilities into reusable assets.

Platform Capability Assurance

End-to-End Closed Loop

Multi-source ingestion, lakehouse storage, visual processing, standards-based governance, service release and knowledge-base building, all in one product.

Batch-Stream Unified

Supports batch collection, near-real-time CDC, Flink stream processing and video stream ingestion, covering data latency requirements across all scenarios.

Governance Built-In

Standards, quality, security and master data run through the entire data chain instead of being patched on after the fact.

AI-Ready

Native training dataset governance, knowledge-base building and recall debugging supply standardized data directly to model training and intelligent applications.

Open Ecosystem

Data service APIs, subscription push, event routing and open keys enable DataHub's data capabilities to be safely consumed by external systems and the Skill ecosystem.

USE CASES

Deployment Scenarios

Typical enterprise applications of data governance and AI enablement


Group-Level Data Governance

Data from multiple business units converges into the lake under unified metric definitions and master data standards; quality rules scan automatically with one-click repair, and the data asset catalog is visible across the whole group.


AI Training Data Preparation

A built-in training dataset flow supports cross-application data selection, preview, auditing and one-click training launch. DataHub manages the data and AIBase manages the models, with a clear boundary between them.

Build a Continuously Evolving Data Asset System

Natively coupled with AIHub, DataHub's governance results flow directly into the knowledge base; data and agents are no longer two separate systems.