Unstructured Data Management

Trinity UDM (Unstructured Data Management) helps enterprises collect and analyze unstructured data scattered across social media, forums, blogs, and open data platforms. From web crawling and keyword extraction to semantic analysis and topic classification, Trinity UDM leverages localized language capabilities to achieve results that global solutions often struggle to deliver.

Your Ideal Entry Point into Text Analysis

Built on the robust foundation of Trinity JCS (Job Control System) and Trinity ETL (Data Integration), Trinity UDM enables seamless integration of both structured and unstructured data. From traditional databases to social media content, all can be incorporated into analysis workflows. Its highly flexible modular architecture allows enterprises to customize processing pipelines to meet a wide variety of data extraction and analysis needs.
Productivity Advantages

Gain Insight into the Digital World

Website Content Extraction
  • Crawls from seed URLs through internal links to collect complete website content
  • Supports proxy-based crawling through one or more proxy servers
  • User-configurable parameters, such as number of connections, proxy pool size, and connection wait time, allow tuning for different crawling needs
  • Includes URL parameter filtering and automatic retries for failed connections
  • Supports multiple simultaneous start points and blob/bytea storage
  • Automatically downloads linked files, with HTTPS certificate handling, file type detection, and duplicate/reject handling
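The crawling behavior listed above (seed URLs, internal links, URL parameter filtering, retries) can be illustrated with a minimal sketch. It is not Trinity UDM's implementation: the function names, the filtered parameter list, and the caller-supplied `fetch` function are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def normalize(url, drop_params=("utm_source", "sessionid")):
    """Drop the fragment and filtered query parameters (hypothetical filter list)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in drop_params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                       urlencode(query), ""))


def crawl(seeds, fetch, max_pages=100, max_retries=2):
    """Breadth-first crawl from several seeds, staying on the seed hosts.

    `fetch(url)` is supplied by the caller; it returns HTML text or raises OSError.
    """
    allowed_hosts = {urlsplit(s).netloc for s in seeds}
    queue = deque(normalize(s) for s in seeds)
    seen = set(queue)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = None
        for _ in range(max_retries + 1):  # first attempt plus retries
            try:
                html = fetch(url)
                break
            except OSError:
                continue
        if html is None:
            continue  # give up on this URL after all retries failed
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = normalize(urljoin(url, href))
            if urlsplit(target).netloc in allowed_hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client or proxy setup; a real deployment would plug in a function that handles connections, proxies, and wait times.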
Versatile Crawling Technologies
  • jsoup-based web crawling (GET/POST)
  • XPath-based extraction for XML/HTML
  • Google Maps & Google Search result capture
  • Text extraction from directory files
  • RSS feed parsing and structured output
  • JSON array/object parsing
  • Address normalization: full-width/half-width conversion, administrative division upgrades, postal code validation, numeric format recognition, and more
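As an illustration of the full-width/half-width conversion step in address normalization, here is a minimal sketch. It relies only on the fixed offset between ASCII and the Unicode full-width forms; the function names are hypothetical and do not reflect Trinity UDM's API.

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance between "!" (U+0021) and "！" (U+FF01)


def to_halfwidth(text):
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:  # full-width "！" through "～"
            out.append(chr(code - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def to_fullwidth(text):
    """Convert printable ASCII and the ASCII space to their full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")
        elif 0x21 <= code <= 0x7E:  # printable ASCII "!" through "~"
            out.append(chr(code + FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

Characters outside these ranges, such as Chinese characters in the address itself, pass through unchanged.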
Open Data Integration
  • Connect to government open data portals to extract metadata (file names, download links, formats, encoding, timestamps)
  • Support for keyword and ID-based searches
Social Media Support
  • Facebook page scraping: posts, comments, replies, likes, reactions
  • Twitter: keyword search results and user-specific post extraction
Key Features

Comprehensive Text Processing

Tokenization
Chinese word segmentation output as JSON arrays with part-of-speech tags or delimited strings
Summarization
Extract condensed summaries from text fields
Word Frequency Analysis
Word frequency statistics computed over CRF-based (conditional random field) segmentation
New Word Discovery
Algorithmic keyword and neologism detection
Text Similarity Scoring
Evaluate text-to-dictionary similarity using score-based matching
Language Detection
Support for detecting 53 languages in mixed-language content
Noise Filtering
Eliminate noise characters across 28 languages
Sentiment Analysis
Calculate sentiment polarity (positive/negative) of Chinese text using a specified dictionary
Text Classification
Classify data via keyword extraction and labeling algorithms
Vectorization
Compute LDA (Latent Dirichlet Allocation) vectors for document clustering or topic modeling
Dimensional Tagging
Convert text into dimensional keywords for analytical mapping
Advanced JSON Processing
Support for nested parsing, merging, updating, deleting, and user-defined parsing rules
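To show how tokenization output and word frequency analysis fit together, the sketch below counts words from a JSON array of segmented tokens with part-of-speech tags. The `word` and `pos` field names and the tag values are assumptions for the example; the actual Trinity UDM output schema is not documented here.

```python
import json
from collections import Counter


def word_frequencies(tokens_json, keep_pos=None):
    """Count words in a JSON array of {"word": ..., "pos": ...} tokens.

    If `keep_pos` is given, only tokens whose tag is in that set are counted.
    """
    tokens = json.loads(tokens_json)
    counts = Counter()
    for token in tokens:
        if keep_pos is None or token["pos"] in keep_pos:
            counts[token["word"]] += 1
    return counts


# Hypothetical segmenter output for a short Chinese phrase.
sample = json.dumps([
    {"word": "資料", "pos": "n"},
    {"word": "分析", "pos": "v"},
    {"word": "資料", "pos": "n"},
    {"word": "的", "pos": "u"},
], ensure_ascii=False)

all_counts = word_frequencies(sample)
noun_counts = word_frequencies(sample, keep_pos={"n"})
```

Filtering by part-of-speech tag is one simple way to keep function words out of frequency statistics.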
All Products

Power in One Suite

Trinity ETL
Data Integration Management
Trinity SDM
Streaming Data Management
Trinity BDM
Big Data Management
Trinity JCS
Job Scheduling Automation
Trinity Metaman
Data Governance & Data Quality
Trinity DP
Data Protection & De-identification
Trinity AN
Address Normalization
Trinity UDM
Unstructured Data Management