Metadata Management - FAQ

Q: What is TMS?

TMS (Temenos Metastore Services): Converts the JSON format of a TDE Designer job into the JSON format understood by ATLAS, and stores the metadata and lineage information in ATLAS.

For storing the metadata and lineage information related to any TDE job, TMS needs two mandatory parameters: the Project ID and the Job ID.

Both parameters can be retrieved from the MySQL JOBS table.

In addition to the above, TMS can also be used to create entities and types and to perform CRUD operations on third-party modules.

Q: What is the JSON format that ATLAS understands?

ATLAS does not understand the job JSON directly. The JSON format it accepts usually consists of:

1. Entities
2. Relationships
3. Columns
4. Data Sets
5. Processes, etc.

Q: What is a Data Set and what is a process?

Every entity is either a “Dataset” or a “Process”.

Dataset: The actual data set, stored in a table.

Process: A transformation, such as a filter, applied to one component; the next component is the result of that transformation.
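
To make the distinction concrete, here is a minimal sketch (as Python dicts) of how a dataset and a process might look in ATLAS-style entity JSON; the typeNames and attribute values are illustrative rather than TDE's exact ones. Note how the Process carries inputs and outputs, which is what gives ATLAS its lineage.

    # Minimal sketch of ATLAS-style entity JSON, written as Python dicts.
    # The typeNames and attribute values are illustrative, not TDE's exact ones.
    source_table = {
        "typeName": "rdbms_table",        # a Dataset subtype
        "guid": "-1",                     # negative guid = "create this entity"
        "attributes": {
            "name": "CUSTOMER",
            "qualifiedName": "htrunk.CUSTOMER@mysql",
        },
    }

    filter_step = {
        "typeName": "Process",            # transformations are Process entities
        "guid": "-2",
        "attributes": {
            "name": "filter_active_customers",
            "qualifiedName": "job42.filter_active_customers",
            # lineage comes from the inputs/outputs of the Process
            "inputs":  [{"guid": "-1", "typeName": "rdbms_table"}],
            "outputs": [{"guid": "-3", "typeName": "rdbms_table"}],
        },
    }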

Q: Does ATLAS support consecutive transformations on one component?

ATLAS has one limitation: it does not support applying two consecutive transformations to a single component.

The developers have fixed this by adding the tde_in_memory schema type.
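
One way to picture the workaround (the attribute names below are assumptions): the intermediate result of the first transformation is represented as an in-memory dataset, so that each Process still sits between two datasets.

    # Sketch of the workaround: two back-to-back transformations become
    #   table_in -> Process(filter) -> tde_in_memory -> Process(aggregate) -> table_out
    in_memory = {
        "typeName": "tde_in_memory",      # intermediate Dataset between the Processes
        "guid": "-10",
        "attributes": {
            "name": "filter_output",
            "qualifiedName": "job42.filter_output",
        },
    }
    # The first Process lists this dataset in its "outputs" and the second in its
    # "inputs", so ATLAS never sees a Process feeding directly into another Process.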

Q: How do we communicate with ATLAS?

Communication with ATLAS happens through either REST APIs or KAFKA.

Q: What is the use of KAFKA in communicating with ATLAS?

TMS uses KAFKA to create entities in ATLAS. The developers were facing performance issues when creating entities in ATLAS through REST APIs, hence the use of KAFKA.

The other parts of the JSON, such as “relationships” and “columns”, are created using REST APIs only.
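
A minimal sketch of both paths, assuming the kafka-python and requests libraries; ATLAS_HOOK is ATLAS's standard notification topic, but the exact message envelope, broker address, credentials, and relationship type shown here are assumptions:

    import json
    import requests
    from kafka import KafkaProducer   # pip install kafka-python

    entity = {"typeName": "rdbms_table", "guid": "-1",
              "attributes": {"name": "CUSTOMER",
                             "qualifiedName": "htrunk.CUSTOMER@mysql"}}

    # Entities go through KAFKA. ATLAS_HOOK is ATLAS's standard notification
    # topic; the exact message envelope TMS uses is an assumption here.
    producer = KafkaProducer(
        bootstrap_servers="10.20.0.136:9092",      # broker host/port is an assumption
        value_serializer=lambda m: json.dumps(m).encode("utf-8"),
    )
    producer.send("ATLAS_HOOK", {
        "version": {"version": "1.0.0"},
        "msgType": "ENTITY_CREATE_V2",
        "message": {"entities": {"entities": [entity]}},
    })
    producer.flush()

    # Relationships, columns, etc. go through the REST API instead.
    requests.post(
        "http://10.20.0.136:21000/api/atlas/v2/relationship",
        auth=("admin", "admin"),                   # credentials are placeholders
        json={
            "typeName": "rdbms_table_columns",     # illustrative relationship type
            "end1": {"typeName": "rdbms_table",  "guid": "-1"},
            "end2": {"typeName": "rdbms_column", "guid": "-4"},
        },
    )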

Q: How does TMS actually work?

As soon as a job is created and TMS is run, it executes the following steps:

Step 1: Convert job JSON into intermediate JSON

This step converts the job JSON into an intermediate format, in which the DBs and schemas present in the job JSON are converted into “Entities” and “Connections”.

- Entities: For example, if one of the components of the job is “Read MySQL”, then the corresponding DB, table, and column details are all stored as component.properties under “Entities” in the JSON, and “RDBMS” is given as the unique identifier.

- Connections: These establish the lineage between the components present in the job. They are stored in key-value format, with the key being the source component and the value being the destination component.

This step is necessary for performance. Consider a scenario where a large number of entities are used in the job: if the job JSON were passed directly into TMS, TMS would have to check all of the entities in order to establish the RDBMS entities, which would cause performance issues.
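
As a concrete picture, a minimal sketch of what this intermediate JSON might look like; the field names are assumptions, and only the Entities/Connections split is taken from the description above:

    # Sketch of the intermediate format. Only the Entities/Connections split is
    # taken from the description above; the exact field names are assumptions.
    intermediate = {
        "Entities": {
            "RDBMS": {                              # unique identifier
                "component.properties": {
                    "db": "htrunk",
                    "table": "CUSTOMER",
                    "columns": ["ID", "NAME", "STATUS"],
                },
            },
        },
        "Connections": {
            # key = source component, value = destination component
            "Read MySQL": "Filter",
            "Filter": "Write MySQL",
        },
    }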

Step 2: Finally, convert the intermediate JSON to the JSON that ATLAS understands.

This is the actual logic.

In this step, TMS reads the intermediate JSON and performs the following tasks (a sketch follows the list):

1. Create the table as an RDBMS_Table entity in ATLAS.
2. Create the columns as RDBMS_Columns.
3. Create the DB as an RDBMS_DB.
4. Link all of the above entities in ATLAS.
5. Create the TDE Process (for transformations) and the in-memory schema (for successive transformations).
6. Establish the lineage based on the “Connections” component in the intermediate JSON.
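
As a rough illustration of tasks 1-4, here is a sketch that uses ATLAS's standard rdbms_* types and its v2 bulk REST endpoint; whether TMS uses these exact types, qualified-name formats, and credentials is an assumption:

    import requests

    ATLAS = "http://10.20.0.136:21000"

    # Negative guids let a single bulk request create the DB, table and column
    # together, with each entity referencing the others before real guids exist.
    entities = [
        {"typeName": "rdbms_db", "guid": "-1",
         "attributes": {"name": "htrunk", "qualifiedName": "htrunk@mysql"}},
        {"typeName": "rdbms_table", "guid": "-2",
         "attributes": {"name": "CUSTOMER",
                        "qualifiedName": "htrunk.CUSTOMER@mysql",
                        "db": {"guid": "-1", "typeName": "rdbms_db"}}},
        {"typeName": "rdbms_column", "guid": "-3",
         "attributes": {"name": "ID",
                        "qualifiedName": "htrunk.CUSTOMER.ID@mysql",
                        "table": {"guid": "-2", "typeName": "rdbms_table"}}},
    ]

    resp = requests.post(f"{ATLAS}/api/atlas/v2/entity/bulk",
                         auth=("admin", "admin"),   # placeholder credentials
                         json={"entities": entities})
    resp.raise_for_status()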

Q: What steps need to be executed to deploy TMS on the 10.20.0.136 server?

- The basic TMS configuration should be stored in the form of a JSON file, containing, for example:

  - KAFKA broker information
  - ATLAS URL (http://10.20.0.136:21000)

  (a sketch of such a configuration file appears after this list)

- This file should be encrypted and stored on the system.

- Place the TemenosMetaService.WAR file under the Tomcat/webapps folder.

- Currently, the ATLAS version on 10.20.0.136 is 1.1.0, which has many limitations. For example, it will not display more than 10 connections at a time.

- So, we need to upgrade the ATLAS version on 10.20.0.136 to ATLAS 2.0.

- Currently, the TDE datalake team is still in the process of finalizing the HTrunk engine code. In other words, there are enhancements already implemented in V7 that are not present in R20, and vice versa. So, until V7 and R20 are in sync, we will not be able to run a job directly and expect TMS to capture the metadata and lineage and store it in ATLAS.

- As an alternative, we can pass the Project ID and Job ID to TMS, and TMS will then capture the metadata.

- TMS will not generate the required lineage, entities, etc.:

  - if the job fails, or
  - if the entities/lineage, etc. already exist.
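
Following up on the configuration file mentioned at the start of this answer, a minimal sketch; all field names are assumptions, and the encryption step is product-specific and therefore omitted:

    import json

    # All field names here are assumptions; only the two contents named above
    # (the KAFKA broker information and the ATLAS URL) come from this document.
    tms_config = {
        "kafka": {"bootstrap.servers": "10.20.0.136:9092"},   # broker is an assumption
        "atlas": {"url": "http://10.20.0.136:21000"},
    }

    # In the real deployment this file is encrypted before being stored;
    # the encryption step is product-specific and omitted here.
    with open("tms-config.json", "w") as f:
        json.dump(tms_config, f, indent=2)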

Q: Why does the TDE Portal user require ROLE_ADMIN privileges for executing REST API and KAFKA calls through the Swagger UI against TMS?

Apache ATLAS provides three levels of authorization for access control. These controls are defined in atlas-simple-authz-policy.json:

1. ROLE_ADMIN: The admin privilege, through which we can add/update/delete both entities and types. In addition, we can perform import and export operations. As far as the TDE product is concerned, this feature is still in development; a workaround is to update the role of the TDE Portal user to “ROLE_ADMIN” in the MySQL LOGGER table.

2. DATA_STEWARD: A restricted privilege through which the user can perform the below operations:

a. Read/Create/Update entities

b. Add/Update/Remove entity classification

3. DATA_SCIENTIST: A completely restricted privilege. A user granted this privilege can only read entities; they cannot perform any update/delete operations.

Source: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-atlas/content/configuring_atlas_authorization.html/
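
For orientation, a heavily abbreviated sketch of how atlas-simple-authz-policy.json maps users to these roles, shown here as a Python dict; the full file also scopes permissions by entity type and classification (see the source link above), and the user names are placeholders:

    # Abbreviated shape of atlas-simple-authz-policy.json (see the source link
    # above for the full schema); user names are placeholders.
    authz_policy = {
        "roles": {
            "ROLE_ADMIN": {"adminPermissions": [{"privileges": [".*"]}]},
            "DATA_STEWARD": {"entityPermissions": [{"privileges": [
                "entity-read", "entity-create", "entity-update",
                "entity-add-classification", "entity-update-classification",
                "entity-remove-classification"]}]},
            "DATA_SCIENTIST": {"entityPermissions": [{"privileges": [
                "entity-read"]}]},
        },
        "userRoles": {
            "tde_portal_user": ["ROLE_ADMIN"],    # the workaround described above
        },
    }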

Q: Where does TMS pull the MySQL connection string from?

TMS pulls the connection string by reading the “context.xml” file in the Tomcat configuration directory.

While building the code, if the resource name is given as “jdbc/LoggingDB”, then TMS will read the MySQL connection string from the corresponding Resource entry in the context.xml file, shown below.
<Resource auth="Container" driverClassName="com.mysql.jdbc.Driver"
    factory="com.tde.tomcatenc.EncryptedDataSourceFactory"
    initialSize="34" maxActive="377" maxIdle="233" maxWait="10000"
    minEvictableIdleTimeMillis="55000" minIdle="89"
    name="jdbc/LoggingDB" password="IMMgtyA/F0xelkVbTO+zQg=="
    removeAbandoned="true" removeAbandonedTimeout="55"
    testOnBorrow="true" timeBetweenEvictionRunsMillis="34000"
    type="javax.sql.DataSource"
    url="jdbc:mysql://mysql-inside.tde-mysql:3306/htrunk?zeroDateTimeBehavior=convertToNull&amp;autoReconnect=true&amp;allowMultiQueries=true"
    username="root" validationInterval="34000" validationQuery="SELECT 1"/>
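
Since TMS runs inside Tomcat, it would normally resolve jdbc/LoggingDB through a JNDI lookup; purely as an illustration of where the value lives, here is a sketch that pulls the url attribute out of context.xml (the file path is an assumption):

    import xml.etree.ElementTree as ET

    # Illustration only: find the jdbc/LoggingDB Resource in context.xml and
    # read its connection string. TMS itself resolves this via a JNDI lookup
    # inside Tomcat rather than parsing the file.
    tree = ET.parse("/opt/tomcat/conf/context.xml")    # path is an assumption
    for res in tree.getroot().iter("Resource"):
        if res.get("name") == "jdbc/LoggingDB":
            print(res.get("url"))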
