Metadata Management - FAQ - 180420-0740-30
Q: What is TMS?
TMS (Temenos Metastore Services) converts the JSON format of a TDE Designer job into the JSON format understood by ATLAS, and stores the metadata and lineage information in ATLAS.
To store the metadata and lineage information for any TDE job, TMS needs two mandatory parameters: the Project ID and the Job ID.
In addition to the above, TMS can also be used to create entities and types and to perform CRUD operations on third-party modules.
In other words, ATLAS does not understand the job JSON. ATLAS's own JSON format usually consists of:
1. Entities
2. Relationships
3. Columns
4. Data Sets
5. Process, etc.
Process: a transformation, such as a filter, that is applied to one component; the next component is the result of that transformation.
ATLAS has one limitation: applying two consecutive transformations to a single component is not supported.
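The parts listed above can be sketched roughly as follows. This is an illustrative sketch only: the field names are assumptions for illustration, not the exact ATLAS type-system schema.

```python
# Rough sketch of the JSON shape ATLAS works with (field names are
# illustrative assumptions, not the exact ATLAS type-system schema).
atlas_payload = {
    "entities": [
        {"typeName": "rdbms_table",
         "attributes": {"name": "customers", "db": "htrunk"}},
    ],
    "relationships": [
        {"typeName": "rdbms_table_columns",
         "end1": "customers", "end2": "customer_id"},
    ],
    "columns": [
        {"name": "customer_id", "type": "int"},
    ],
    "dataSets": [],
    # A "process" models one transformation (e.g. a filter); its output
    # component is the result of applying it to the input component.
    "process": [
        {"name": "filter_active_customers",
         "inputs": ["customers"], "outputs": ["active_customers"]},
    ],
}
```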
TMS uses KAFKA to create entities in ATLAS. The developers faced performance issues when creating entities in ATLAS through REST APIs, hence the use of KAFKA.
The other parts of the JSON, such as “relationships” and “columns”, are created using REST APIs only.
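For the Kafka path, Apache ATLAS consumes hook notifications from its ATLAS_HOOK topic. The snippet below sketches only the shape of such a message; real hook notifications carry additional fields and are sent with an actual Kafka producer, which is omitted here.

```python
import json

# Kafka topic ATLAS consumes hook notifications from.
ATLAS_HOOK_TOPIC = "ATLAS_HOOK"

def entity_create_message(entities):
    """Build an ENTITY_CREATE_V2-style notification payload.

    A sketch of the message shape only: real ATLAS hook messages carry
    more fields (user, timestamps, etc.), and a Kafka client would be
    used to publish the result to ATLAS_HOOK_TOPIC.
    """
    return json.dumps({
        "type": "ENTITY_CREATE_V2",
        "entities": {"entities": entities},
    })

msg = entity_create_message(
    [{"typeName": "rdbms_db", "attributes": {"name": "htrunk"}}]
)
```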
As soon as a job is created and we run TMS, it executes the steps below:
Step 1: Convert the Job JSON into an intermediate format, in which the DBs and Schemas present in the Job JSON are converted into “Entities” and
“Connections”.
- Entities: For example, if one of the components of the job is “Read MySQL”, then the corresponding DB, table, and column details are all stored as
component.properties under “Entities” in the JSON, and “RDBMS” is given as the unique identifier.
- Connections: These help establish the lineage between the components present in the job. They are stored in key-value format, with the key
being the source component and the value being the destination component.
This step is necessary for performance. Consider a scenario where a large number of entities are used in the job. If the job JSON were passed
directly into TMS, TMS would have to check all the entities in order to establish the RDBMS entities, which would cause performance issues.
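The intermediate format described above (entities plus source-to-destination connections) can be sketched as follows. The job JSON field names used here are illustrative assumptions, not the actual TDE Designer job schema.

```python
def to_intermediate(job_json):
    """Sketch of the first TMS conversion: job JSON -> intermediate format.

    The input field names ("components", "links", ...) are illustrative
    assumptions about the Designer job JSON, not the real schema.
    """
    entities = []
    for comp in job_json["components"]:
        if comp["type"] == "Read MySQL":
            # DB/table/column details become component.properties under
            # "Entities", keyed by the unique identifier "RDBMS".
            entities.append({"RDBMS": comp["properties"]})
    # Connections are stored key -> value as source -> destination component.
    connections = {link["from"]: link["to"] for link in job_json["links"]}
    return {"entities": entities, "connections": connections}

job = {
    "components": [
        {"name": "src", "type": "Read MySQL",
         "properties": {"db": "htrunk", "table": "customers", "columns": ["id"]}},
        {"name": "flt", "type": "Filter", "properties": {"expr": "id > 0"}},
    ],
    "links": [{"from": "src", "to": "flt"}],
}
intermediate = to_intermediate(job)
```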
Step 2: Convert the intermediate JSON into the JSON that ATLAS understands.
In this step, TMS reads the intermediate JSON and performs the below tasks:
Q: What steps need to be executed to deploy TMS on the 10.20.0.136 server?
- The basic TMS configuration should be stored in the form of a JSON file, such as
- Currently, the ATLAS version on 10.20.0.136 is 1.1.0, which has many limitations; for example, it will not display more than 10 connections at a time.
- Currently, the TDE datalake team is still finalizing the HTrunk engine code. In other words, there are enhancements already implemented in V7
that are not present in R20, and vice versa. So, until the issue gets fixed (in other words, until V7 and R20 are in sync), we
will not be able to run a job directly and expect TMS to capture the metadata and lineage and store them in ATLAS.
- As an alternative, we can pass the Project ID and Job ID to TMS, and TMS will then capture the metadata.
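A minimal sketch of that alternative invocation is shown below. The endpoint path and parameter names are hypothetical placeholders; only the two mandatory parameters (Project ID and Job ID) come from this FAQ.

```python
from urllib.parse import urlencode

def tms_capture_url(base_url, project_id, job_id):
    """Build a TMS call that captures metadata for a given job.

    The "/capture" path and parameter names are hypothetical
    placeholders; only the two mandatory inputs (Project ID and
    Job ID) are stated in the FAQ.
    """
    query = urlencode({"projectId": project_id, "jobId": job_id})
    return f"{base_url}/capture?{query}"

url = tms_capture_url("http://10.20.0.136:8080/tms", 101, 2045)
```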
Q: Why does the TDE Portal user require ROLE_ADMIN privileges to execute REST API and Kafka calls through the Swagger UI against TMS?
Apache ATLAS provides three levels of authorization for access control. These controls are defined in atlas-simple-authz-policy.json:
1. ROLE_ADMIN: The admin privilege, through which we can add/update/delete both entities and types. In addition, we can perform
Import and Export operations. As far as the TDE Product is concerned, this feature is still in development; a workaround is to
update the role of the TDE portal user to “ROLE_ADMIN” in the mysql table LOGGER.
2. DATA_STEWARD: A restricted privilege through which the user can perform the below operations:
a. Read/Create/Update entities
3. DATA_SCIENTIST: A completely restricted privilege. Once granted this privilege, the user can only read entities and cannot
perform any update/delete operations.
Source: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-atlas/content/configuring_atlas_authorization.html/
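A trimmed sketch of how those roles might look in atlas-simple-authz-policy.json is shown below. The exact privilege names and file layout should be verified against the linked Cloudera documentation; this fragment is illustrative, not a drop-in config.

```json
{
  "roles": {
    "ROLE_ADMIN":     { "adminPermissions":  [ { "privileges": [ ".*" ] } ] },
    "DATA_STEWARD":   { "entityPermissions": [ { "privileges": [ "entity-read", "entity-create", "entity-update" ] } ] },
    "DATA_SCIENTIST": { "entityPermissions": [ { "privileges": [ "entity-read" ] } ] }
  },
  "userRoles": { "tde-portal-user": [ "ROLE_ADMIN" ] }
}
```

Here "tde-portal-user" is a placeholder for the actual TDE Portal user name.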
Q: Where does TMS pull the mysql connection string from?
TMS pulls the connection string by reading the “context.xml” file in the Tomcat configuration directory.
While building the code, if the resource name is given as “jdbc/LoggingDB”, then TMS will read the mysql connection string from the below Resource entry
in the context.xml file.
<Resource auth="Container"
          driverClassName="com.mysql.jdbc.Driver"
          factory="com.tde.tomcatenc.EncryptedDataSourceFactory"
          initialSize="34" maxActive="377" maxIdle="233" maxWait="10000"
          minEvictableIdleTimeMillis="55000" minIdle="89"
          name="jdbc/LoggingDB"
          password="IMMgtyA/F0xelkVbTO+zQg=="
          removeAbandoned="true" removeAbandonedTimeout="55"
          testOnBorrow="true" timeBetweenEvictionRunsMillis="34000"
          type="javax.sql.DataSource"
          url="jdbc:mysql://mysql-inside.tde-mysql:3306/htrunk?zeroDateTimeBehavior=convertToNull&amp;autoReconnect=true&amp;allowMultiQueries=true"
          username="root"
          validationInterval="34000" validationQuery="SELECT 1"/>