Data cleaning is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated. Data underpins every decision an organization makes. It is stored in databases, and when the time comes to make an informed decision by querying multiple databases and analyzing the results, improper data can have dire effects. Consider an example where one unit of a store queries the database and copies the current inventory into its own local database to coordinate with the rest of the store. If the incoming data is dirty and understates the inventory of an item, the unit will order more of that item unnecessarily. This could tie up funds that are needed elsewhere and also cause confusion when multiple units query the database looking for the same item. Data cleaning provides the remedy: we would clean the dirty inventory data and set up an automatic notification system that triggers report generation whenever inventory levels fall below what the database records.
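The inventory scenario above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: it assumes each record is a dict with hypothetical `sku` and `quantity` fields, treats two rows with the same normalized SKU as duplicates (keeping the first), and uses a hypothetical `items_needing_report` helper to stand in for the notification system that compares local counts against the database.

```python
from typing import Optional


def parse_quantity(raw) -> Optional[int]:
    """Return a non-negative integer quantity, or None if the value is unusable."""
    try:
        qty = int(str(raw).strip())
    except (TypeError, ValueError):
        return None  # incorrect value, e.g. "seven"
    return qty if qty >= 0 else None  # negative stock is treated as incorrect


def clean_inventory(records: list[dict]) -> list[dict]:
    """Amend formatting, drop incomplete or incorrect rows, and remove duplicates.

    Assumption: records are dicts with hypothetical "sku" and "quantity" keys,
    and rows sharing a normalized SKU are duplicates (the first one is kept).
    """
    cleaned = []
    seen = set()
    for rec in records:
        sku = str(rec.get("sku") or "").strip().upper()  # amend improper formatting
        qty = parse_quantity(rec.get("quantity"))
        if not sku or qty is None:  # remove incomplete or incorrect rows
            continue
        if sku in seen:  # remove duplicated rows
            continue
        seen.add(sku)
        cleaned.append({"sku": sku, "quantity": qty})
    return cleaned


def items_needing_report(local_counts: dict[str, int], db_counts: dict[str, int]) -> list[str]:
    """Hypothetical notification check: SKUs whose local inventory is below the database value."""
    return sorted(sku for sku, qty in local_counts.items() if qty < db_counts.get(sku, 0))


if __name__ == "__main__":
    raw = [
        {"sku": " ab-1 ", "quantity": "12"},
        {"sku": "AB-1", "quantity": "12"},      # duplicate once formatting is fixed
        {"sku": "CD-2", "quantity": "seven"},   # incorrect quantity
        {"sku": None, "quantity": 4},           # incomplete record
        {"sku": "ef-3", "quantity": 5},
    ]
    print(clean_inventory(raw))
    print(items_needing_report({"AB-1": 12, "EF-3": 2}, {"AB-1": 12, "EF-3": 5}))
```

Here the five raw rows reduce to two clean ones (`AB-1` with 12 and `EF-3` with 5), and `EF-3` is flagged for a report because its local count is lower than the database value. Keeping cleaning and notification as separate functions means the report check only ever sees data that has already been cleaned.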
Data Validation Techniques
Data validation techniques examine the quality of data values against the standards, rules, or conditions established during the data specification phase. In short, validation is about error checking, and the errors fall into a few types.

Syntactic errors arise from spelling mistakes, missing punctuation, illegal use of symbols, and the like. They can introduce inconsistency into the data, and they are easily caught with a validation technique called a syntax check.

Semantic errors occur when a data value does not make sense for its field, for example an infant whose age is recorded as 25 years. These are caught with semantic validation.

Constraint violations often result from ignoring an input mask or constraint, for example entering text into a field that accepts only dates. Constraint checks can be specified on the values of a given field of an entity or relationship.

Finally, there are functional dependencies in the data: when a specified condition holds for some data values, certain other values should follow. Such a dependency can be written as a rule and verified with the conditional validation technique.

Beyond detecting errors, data validation should indicate the source of each error and possible ways to fix it. Applied together, the techniques above are effective ways to catch and resolve these issues.
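The four techniques can be sketched as small Python functions that each report the offending field (the source of the error), the technique that caught it, and a suggested fix. Everything here is a hypothetical illustration: the record fields (`name`, `category`, `age_years`, `visit_date`, `guardian`), the allowed name characters, the one-year infant age limit, and the rule that an infant record must name a guardian are assumptions chosen to mirror the examples in the text.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class ValidationError:
    field: str      # source of the error
    technique: str  # which validation technique caught it
    message: str
    fix: str        # suggested way to fix it


# Assumption: names may contain letters, spaces, periods, apostrophes, and hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]*")
INFANT_MAX_AGE_YEARS = 1  # hypothetical limit for the "infant" category


def syntax_check(rec: dict) -> list[ValidationError]:
    """Syntax check: reject illegal symbols in the name field."""
    name = rec.get("name") or ""
    if not NAME_PATTERN.fullmatch(name):
        return [ValidationError("name", "syntax", f"illegal characters in {name!r}",
                                "remove symbols and retype the name")]
    return []


def semantic_check(rec: dict) -> list[ValidationError]:
    """Semantic check: the age must make sense for the category."""
    age = rec.get("age_years", 0)
    if rec.get("category") == "infant" and age > INFANT_MAX_AGE_YEARS:
        return [ValidationError("age_years", "semantic", f"an infant cannot be {age} years old",
                                "correct the age or the category")]
    return []


def constraint_check(rec: dict) -> list[ValidationError]:
    """Constraint check: the visit_date field accepts only ISO dates."""
    try:
        date.fromisoformat(rec.get("visit_date"))
    except (TypeError, ValueError):
        return [ValidationError("visit_date", "constraint", "value is not a date",
                                "enter the date as YYYY-MM-DD")]
    return []


def conditional_check(rec: dict) -> list[ValidationError]:
    """Conditional check for a hypothetical rule: category 'infant' implies a guardian."""
    if rec.get("category") == "infant" and not rec.get("guardian"):
        return [ValidationError("guardian", "conditional", "infant record has no guardian",
                                "add the guardian's name")]
    return []


CHECKS = (syntax_check, semantic_check, constraint_check, conditional_check)


def validate(rec: dict) -> list[ValidationError]:
    """Run every check and collect all errors rather than stopping at the first."""
    errors = []
    for check in CHECKS:
        errors.extend(check(rec))
    return errors


if __name__ == "__main__":
    bad = {"name": "B@by#1", "category": "infant", "age_years": 25, "visit_date": "next Tuesday"}
    for err in validate(bad):
        print(f"{err.technique:<11} {err.field:<10} {err.message}; fix: {err.fix}")
```

For the sample record, all four checks fire, one per technique. Collecting every error instead of stopping at the first lets a single report list each problem alongside its source and a possible fix.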