Cleaning the historical data that an enterprise stores in its various business systems, in line with established data standards, is an unavoidable stage of data standardization and of building a high-quality data standard library. Data standardization is ongoing work rather than a one-off project, so it needs a routine management model. Enterprises therefore need to recognize that data cleaning is vital to a high-quality data standard library, and that data governance is complex, specialized, technical, long-term and continuous.
Data governance is critical to ensuring that data is accurate, appropriately shared and protected. An effective data governance plan returns value to the business and ultimately increases revenue and profit by improving decisions, cutting costs, reducing risk and promoting security compliance.
The Data Cleaning Platform (DCP) is the core standard component of SunwayWorld's information standardization and management integration platform (6P+2E+Mobile). It provides an open data cleaning function based on a many-to-many relational data model, and supports the extraction, word segmentation, semantic recognition, cleaning and integration of raw data so as to build master data information libraries for different subject models. The platform's interface is user-friendly, so enterprise managers can get started quickly and control the extraction, cleaning and realignment of existing data, including the transformation of mapping relations and the storage of comparison relations. This enables efficient human intervention and data validation, gives the enterprise more systematic and intelligent support for data cleaning, reduces the operational complexity of cleaning, and improves data quality.
Mapping relation establishment
Matching strategies can be defined during data cleaning: users can define matching rules, complex matching strategies, data features and so on. The system establishes an initial mapping relationship from the automatically scanned results and lists data in the standard library by similarity. Mapping relationships can then be established flexibly, or matches cancelled, through the online duplicate-checking function and manual intervention.
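The mapping step described above can be illustrated with a minimal sketch. It is not SunwayWorld's implementation: the function names, the `threshold` and `top_n` parameters, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def propose_mappings(raw_records, standard_library, threshold=0.8, top_n=3):
    """Build an initial raw -> standard mapping with ranked candidates.

    `threshold` and `top_n` are hypothetical tuning knobs, not product settings.
    """
    mappings = {}
    for raw in raw_records:
        # Score every standard entry, highest similarity first.
        scored = sorted(
            ((similarity(raw, std), std) for std in standard_library),
            reverse=True,
        )
        candidates = [(std, round(score, 2)) for score, std in scored[:top_n]]
        # Auto-match only when the best candidate clears the threshold.
        match = scored[0][1] if scored and scored[0][0] >= threshold else None
        mappings[raw] = {"candidates": candidates, "match": match}
    return mappings


def cancel_match(mappings, raw):
    """Manual intervention: drop a proposed match but keep its candidates."""
    mappings[raw]["match"] = None
```

A reviewer would inspect the `candidates` list for each raw record, accept the proposed `match`, pick another candidate, or call `cancel_match` to leave the record unmapped.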
Semantic recognition function
The suite supports semantic definition and recognition, backed by an enterprise-level semantic model library. It compares each original material record with the semantic models in the library, and recognizes and extracts the necessary information from the semantic context. Recognition works despite differences in input format, punctuation and terminology, and the semantic recognition of the Master Data Management System can be applied to both structured and unstructured data. Semantic recognition and parsing improve data processing efficiency.
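As a rough illustration of recognition that tolerates differences in format, punctuation and terminology, the sketch below extracts attributes from free-text material descriptions. The "semantic model" here is only a regular expression plus a synonym table for a hypothetical fastener category; the pattern, the attribute names and the synonym entries are assumptions, and the platform's actual semantic models are not described in the source.

```python
import re

# Hypothetical model for one category: a size pattern such as "M8 x 40" or "M8*40".
SIZE_PATTERN = re.compile(r"\bm(\d+)\s*[x*×]\s*(\d+)\b")

# Hypothetical terminology table: variant term -> canonical term.
MATERIAL_TERMS = {
    "stainless steel": "stainless steel",
    "ss": "stainless steel",
    "carbon steel": "carbon steel",
    "cs": "carbon steel",
}


def recognize(description):
    """Extract thread, length and material from a free-text description."""
    # Normalize case, punctuation and whitespace before matching.
    text = re.sub(r"[,;/]+", " ", description.lower())
    text = " ".join(text.split())

    result = {}
    size = SIZE_PATTERN.search(text)
    if size:
        result["thread"] = f"M{size.group(1)}"
        result["length"] = int(size.group(2))

    # Map the first recognized variant to its canonical material name.
    for term, canonical in MATERIAL_TERMS.items():
        if re.search(rf"\b{re.escape(term)}\b", text):
            result["material"] = canonical
            break
    return result
```

Under these assumptions, `recognize("Bolt, hex; M8 x 40 / stainless steel")` and `recognize("HEX BOLT M8*40 SS")` yield the same attribute dictionary, even though the two inputs differ in word order, punctuation and abbreviation.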
Batch data operation and management
The system detects non-compliant data according to the detection rules, and batch operations are completed through manual intervention on top of the automatic detection results. During data cleaning, a list of values and upper and lower limits can be added for the attributes of a standard template. The suite also supports assigning categories and responsible persons to original data in batches, defining custom cleaning rules from logical conditions and verification rules, and running those rules to clean data in batches.
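A minimal sketch of rule-based detection follows, assuming records are plain dictionaries and that range-checked attributes hold numbers. The `AttributeRule` class, its field names and the `assign_batch` helper are hypothetical and only illustrate the list-of-values and upper/lower-limit checks mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AttributeRule:
    """Hypothetical detection rule for one template attribute."""
    attribute: str
    allowed_values: Optional[Sequence[str]] = None  # list of values
    lower: Optional[float] = None                   # lower limit
    upper: Optional[float] = None                   # upper limit

    def violation(self, record):
        """Return a message if the record breaks this rule, else None."""
        value = record.get(self.attribute)
        if value is None:
            return f"{self.attribute}: missing"
        if self.allowed_values is not None and value not in self.allowed_values:
            return f"{self.attribute}: {value!r} not in list of values"
        if self.lower is not None and value < self.lower:
            return f"{self.attribute}: {value} below lower limit {self.lower}"
        if self.upper is not None and value > self.upper:
            return f"{self.attribute}: {value} above upper limit {self.upper}"
        return None


def detect_non_compliant(records, rules):
    """Return (index, problems) for every record that breaks a rule."""
    flagged = []
    for index, record in enumerate(records):
        problems = []
        for rule in rules:
            message = rule.violation(record)
            if message is not None:
                problems.append(message)
        if problems:
            flagged.append((index, problems))
    return flagged


def assign_batch(records, indices, category, owner):
    """Batch operation: set category and responsible person on chosen records."""
    for index in indices:
        records[index]["category"] = category
        records[index]["owner"] = owner
```

In this sketch, the flagged indices returned by `detect_non_compliant` can be passed straight to `assign_batch`, which mirrors the flow of automatic detection followed by a manual batch decision.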
Data import and extraction management
Master data can be extracted from business systems through configuration, or imported directly from ODBC data sources, XML data sources, Excel spreadsheets, text files and other sources. The platform performs online duplicate checking, lists closely similar data together with the calculated similarity values, generates standardized master data, and verifies it against the defined coding rules.
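The import-and-verify flow can be sketched as follows, using only the Python standard library. The sketch reads delimited text and XML into a common record shape and then checks each record's code against a coding rule. The field name `code`, the record tag `material` and the `MAT-` plus six digits rule are invented for the example; ODBC and Excel sources would need additional drivers or libraries and are left out.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical coding rule: "MAT-" followed by exactly six digits.
CODING_RULE = re.compile(r"MAT-\d{6}")


def read_csv_text(text):
    """Parse delimited text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_xml_text(text, record_tag="material"):
    """Parse XML where each record element holds one child per field."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter(record_tag)
    ]


def verify_codes(records, code_field="code"):
    """Split records into those that satisfy the coding rule and those that do not."""
    valid, invalid = [], []
    for record in records:
        if CODING_RULE.fullmatch(record.get(code_field, "")):
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid
```

Converting every source into the same list-of-dictionaries shape is a design choice made for this sketch, so that duplicate checking and code verification can run once over the merged records regardless of where they came from.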