ATALT stands for:
A Transformation And Loading Tool
or
Arnulf's Transformation And Loading Tool
Together with Reporting Tools with Tcl, I (apw) and some colleagues of mine developed a tool to load data from different data sources and in different formats, applying some transformations, into the reporting database. At that time (1995) this even included conversion from EBCDIC to ASCII! Nowadays this would be called the TL part of ETL.
Based on the experience with that tool I want to implement a new version from scratch, keeping its ideas but using newer technology.
The general idea is (a minimal end-to-end sketch follows the list):
1. store the information about the input fields in a database (as meta info)
2. store the information about the target tables and fields in a database (as meta info)
3. store the information about the validations or the validations themselves in a database (as meta info)
4. extract the fields from the input using the meta info
5. validate the fields using the meta info
6. generate the relevant insert/update/delete SQL statements
7. store the data in the relevant database table using the generated SQL statements
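To make the flow concrete, here is a small, self-contained Tcl sketch of steps 4 to 7. Everything in it is invented for illustration: the meta info is hardcoded as a dict and the SQL statement is only printed, whereas in the real tool the meta info would come from the database and the statement would be executed against it (and proper quoting per type is omitted here):

    # Minimal sketch of steps 4 to 7; the meta info dict, field names
    # and table name are invented examples, not the final design.
    set metaInfo {
        fields {name {type text} city {type text} amount {type integer}}
        target {table sales}
    }

    # 4. extract: split a tab separated line using the header line
    set header [split "name\tcity\tamount" \t]
    set line   [split "Miller\tBerlin\t100" \t]
    set row [dict create]
    foreach key $header value $line {
        dict set row $key $value
    }

    # 5. validate: check each value against the type from the meta info
    dict for {key value} $row {
        set type [dict get $metaInfo fields $key type]
        if {$type ne "text" && ![string is $type -strict $value]} {
            error "field $key: '$value' is not a valid $type"
        }
    }

    # 6./7. generate the SQL statement (here only printed, not executed;
    # proper quoting per field type is omitted in this sketch)
    set cols [join [dict keys $row] ", "]
    set vals [join [lmap v [dict values $row] {format '%s' $v}] ", "]
    puts "INSERT INTO [dict get $metaInfo target table] ($cols) VALUES ($vals);"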
There are several possible input formats for step 1 above (the most important ones are listed here):
- a header line and the data rows/lines as text separated by tabs
- a fixed width format: every field has a fixed number of characters and is padded with blanks if shorter (see the parsing sketch after this list)
- key/value pairs, with a known number of pairs per row
- XML format with a known tag as the "brace" around the fields of a row; the tag names are like the keys in key/value pairs
- a JSON-like format
- one Tcl dict per row
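As an illustration of the fixed width case, a parser could look like the following Tcl sketch; the field names and widths are made-up example values:

    # Fixed width example: name is 10 characters, city 8, amount 5,
    # each padded with blanks; names and widths are invented values.
    set widths {name 10 city 8 amount 5}
    set line "Miller    Berlin    100"
    set pos 0
    set row [dict create]
    foreach {key width} $widths {
        set value [string range $line $pos [expr {$pos + $width - 1}]]
        dict set row $key [string trim $value]
        incr pos $width
    }
    puts $row   ;# -> name Miller city Berlin amount 100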
For steps 1 and 2 above there also needs to be information about the type of the contents.
The meta info database needs to contain the sequence or the names of the headers used to recognize a field/column, the type and possibly the length of the input fields, as well as the (schema/)table/field name and type in the target database table and the fields needed for a unique key in the target tables.
Possibly it is also necessary to know the language of the input headers, if language-dependent header names are possible.
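As a rough illustration, the meta info for a single field could carry entries like the following. The structure and all names here are invented, and in the real tool this would live in database tables rather than in a Tcl dict:

    # Invented example of the meta info for one field: how to find it
    # in the input, where it goes in the target database, and whether
    # it is part of the unique key of the target table (key 0/1).
    set fieldInfo {
        input  {header Amount seq 3 type integer length 5 language en}
        target {schema sales table orders field amount type integer key 0}
    }
    puts [dict get $fieldInfo input type]    ;# -> integer
    puts [dict get $fieldInfo target table]  ;# -> orders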
The validation rules can be (two small examples follow the list):
- a list of valid values in a database table
- a validation rule coded in Tcl (normally a proc or class method) which is sourced and whose name is stored as one of the validation rules for that input field
- a mapping script coded in Tcl (normally a proc or class method) which is used to map the format of the input to the format for insertion into the database (e.g. date formats)
- a validation rule checking whether the person who provided the input is allowed to insert/delete/update a database row (depending on access rules)
- a validation rule checking whether the person who provided the input is allowed to update a field within a database row (depending on access rules for fields)
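Two small Tcl examples of what such rules could look like; the proc names, the value list and the date formats are invented for illustration (in the real tool the valid values would come from a database table):

    # Example of a validation rule: check a value against a list of
    # valid values (here hardcoded; really from a database table).
    proc validCity {value} {
        set validValues {Berlin Hamburg Munich}
        return [expr {$value in $validValues}]
    }

    # Example of a mapping script: map a German date (dd.mm.yyyy) to
    # the ISO format (yyyy-mm-dd) used when inserting into the database.
    proc mapDate {value} {
        set seconds [clock scan $value -format %d.%m.%Y]
        return [clock format $seconds -format %Y-%m-%d]
    }

    puts [validCity Berlin]      ;# -> 1
    puts [validCity Paris]       ;# -> 0
    puts [mapDate 24.12.1995]    ;# -> 1995-12-24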
Input can be one of the following:
- a file with several lines provided as input
- a single row provided from a GUI as input
- one or several rows provided via an interface, for example a SOAP interface
Access rules are either stored in the meta info database too or fetched from another application, and are based on:
- an individual (a person) identified by a user id, by a combination of name/first name/department etc., or by an email address
- access right for the individual for the application
- a person can have several functions/roles within an application
- access right for a person with a function
- access right for a function for a field in a target table
- type of access right (a person may have the right to read a field but not to modify it)
- access rights for joins of some tables for reporting; for example, sales people may see all fields except the internal cost rate for building a manufactured part (a small sketch of such a check follows this list)
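A small sketch of how such a field level check could work; the rules dict stands in for the meta info database, and the role and field names are invented placeholders:

    # Sketch of a field level access check; the rules dict stands in
    # for the meta info database and all names are invented examples.
    set accessRules {
        sales {cost_rate read price write name write}
    }

    proc allowedToWrite {rules role field} {
        if {![dict exists $rules $role $field]} {
            return 0
        }
        expr {[dict get $rules $role $field] eq "write"}
    }

    puts [allowedToWrite $accessRules sales price]      ;# -> 1
    puts [allowedToWrite $accessRules sales cost_rate]  ;# -> 0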
A more detailed description of the intended functionality of the tool will follow here (I hope soon).