Unstructured Data Management
Trinity UDM (Unstructured Data Management) helps you collect and analyze unstructured data across social media, forums, blogs, and open data platforms. It supports web crawling, keyword extraction, semantic analysis, and topic classification, all powered by localized language expertise. This gives you insights that many global solutions struggle to deliver.

Your Ideal Entry Point into Text Analysis
Built on the proven foundation of Trinity JCS (Job Control System) and Trinity ETL (Data Integration), Trinity UDM lets you unify both structured and unstructured data in a single workflow. From traditional databases to social media content, you can incorporate diverse sources into your analysis. Its flexible, modular architecture enables you to customize processing pipelines to fit a wide range of data extraction and analysis needs.

Maximize Productivity
Gain Insight into the Digital World
Extract Website Content with Ease
- Crawl from seed URLs and internal links to capture complete website data.
- Route crawls through one or more proxy servers.
- Configure the number of connections, proxy pool size, and wait times for flexible crawling control.
- Filter URL parameters automatically and retry failed connections so crawls continue without interruption.
- Launch crawls from multiple start points with blob/bytea storage support.
- Download linked files automatically with HTTPS certificate support, file type detection, and duplicate handling.
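
As a rough illustration of the URL parameter filtering and automatic retry described above, here is a minimal Python sketch. It is not Trinity UDM's implementation: the names `IGNORED_PARAMS`, `filter_url_params`, and `fetch_with_retry`, the choice of ignored parameters, and the retry policy are all assumptions made for the example.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters to strip before a URL is crawled.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def filter_url_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def fetch_with_retry(fetch, url, max_attempts=3, wait_seconds=0.0):
    """Call fetch(url), retrying on connection-level errors (OSError)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # wait before the next attempt
    raise last_error
```

Here `fetch` is any callable that takes a URL and returns content, so the retry logic stays independent of the HTTP client in use.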
Versatile Crawling Technologies
- Use jsoup-based web crawling (GET/POST) to capture content directly from websites.
- Apply XPath-based extraction to pull precise data from XML or HTML sources.
- Capture business listings and search results directly from Google Maps and Google Search.
- Extract text seamlessly from directory files for easier processing.
- Parse RSS feeds into structured outputs you can work with instantly.
- Handle JSON arrays and objects with straightforward parsing tools.
- Standardize your data with address normalization, covering full/half-width conversion, administrative upgrades, postal code validation, and numeric format recognition.
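
The full/half-width conversion mentioned under address normalization can be sketched in a few lines of Python. This is only an illustration of the general technique, not the product's own routine; the function name `to_half_width` is hypothetical, and the sketch covers only the full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space.

```python
def to_half_width(text):
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    converted = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            # Ideographic space maps to an ordinary space.
            converted.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width '!' through '~' sit at a fixed offset from ASCII.
            converted.append(chr(code - 0xFEE0))
        else:
            converted.append(ch)
    return "".join(converted)
```

For example, `to_half_width("台北市１００號")` returns `"台北市100號"`: the digits are converted while the Chinese characters pass through unchanged.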
Open Data Integration
- Connect seamlessly to government open data portals so you can extract metadata such as file names, download links, formats, encoding, and timestamps.
- Search efficiently with built-in support for both keyword and ID-based queries.
Social Media Support
- Capture Facebook page data including posts, comments, replies, likes, and reactions so you can analyze engagement at scale.
- Extract Twitter insights with keyword-based searches or pull posts directly from specific user accounts.
Key Features
Comprehensive Text Processing
Tokenization
Generate Chinese word segmentation outputs as JSON arrays with part-of-speech tags or delimited strings.
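
To make the two output shapes concrete, the following Python sketch formats an already segmented sentence either as a JSON array with part-of-speech tags or as a delimited string. The field names `word` and `pos`, the tag values, and the function `format_segments` are assumptions for illustration; the actual output schema of Trinity UDM may differ.

```python
import json


def format_segments(segments, as_json=True, delimiter=" "):
    """Format (word, pos) pairs as a JSON array or a delimited string."""
    if as_json:
        records = [{"word": word, "pos": pos} for word, pos in segments]
        # ensure_ascii=False keeps Chinese characters readable in the output.
        return json.dumps(records, ensure_ascii=False)
    return delimiter.join(word for word, _ in segments)


# Hypothetical segmentation result for illustration.
sample = [("資料", "n"), ("分析", "v")]
print(format_segments(sample))
print(format_segments(sample, as_json=False, delimiter="/"))
```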
Summarization
Extract concise summaries from text fields to help you quickly capture key insights.
Word Frequency Analysis
Run frequency statistics using CRF-based segmentation to identify patterns and trends.
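
Once text has been segmented, the frequency step itself is simple counting. The Python sketch below assumes the documents are already split into tokens (the CRF-based segmentation is not reproduced here), and the function name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(token_lists, top_n=10):
    """Count tokens across pre-segmented documents and return the most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(top_n)


# Hypothetical pre-segmented documents.
docs = [["資料", "分析", "資料"], ["資料", "平台"], ["分析", "工具"]]
print(word_frequencies(docs, top_n=2))
```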
New Word Discovery
Detect emerging keywords and neologisms automatically through algorithmic analysis.
Text Similarity Scoring
Evaluate how closely text matches a reference dictionary with score-based comparisons.
Language Detection
Identify up to 53 languages within mixed-language content.
Noise Filtering
Remove unwanted characters across 28 languages for cleaner, more reliable text.
Sentiment Analysis
Determine sentiment polarity (positive/negative) of Chinese text using a specified dictionary.
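
A dictionary-based polarity check can be sketched as follows. This is a simplified illustration rather than the product's algorithm: it assumes the text is already segmented, that the dictionary is supplied as two word sets, and that ties are reported as "neutral" (an assumption; the description above names only positive and negative).

```python
def sentiment_polarity(tokens, positive_words, negative_words):
    """Score tokens against a sentiment dictionary and return the polarity."""
    positive_hits = sum(1 for token in tokens if token in positive_words)
    negative_hits = sum(1 for token in tokens if token in negative_words)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed handling for ties or no dictionary matches


# Hypothetical dictionary entries.
POSITIVE = {"滿意", "推薦", "喜歡"}
NEGATIVE = {"失望", "糟糕", "討厭"}
print(sentiment_polarity(["服務", "很", "滿意"], POSITIVE, NEGATIVE))
```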
Text Classification
Classify content automatically with keyword extraction and labeling algorithms.
Vectorization
Compute LDA (Latent Dirichlet Allocation) vectors for clustering documents or modeling topics.
Dimensional Tagging
Transform text into dimensional keywords for analytical mapping.
Advanced JSON Processing
Parse, merge, update, delete, and apply user-defined rules to handle nested JSON structures.
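
As one example of what merging nested JSON involves, here is a minimal recursive merge in Python. The function `deep_merge` and its rule (values in the update win, nested objects are merged key by key) are assumptions chosen for illustration, not the product's documented behavior.

```python
import json


def deep_merge(base, update):
    """Return a new dict where nested objects from update are merged into base."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold objects: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the value from update replaces the existing one.
            merged[key] = value
    return merged


base = json.loads('{"post": {"id": 1, "stats": {"likes": 3}}}')
update = json.loads('{"post": {"stats": {"shares": 2}}}')
print(json.dumps(deep_merge(base, update)))
```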