Metadata-Version: 2.1
Name: data-extraction-c.lynch278
Version: 0.0.5
Summary: A small example package
Home-page: https://github.com/pypa/sampleproject
Author: Example Author
Author-email: author@example.com
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/pypa/sampleproject/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# DataExtraction
## Overview
This program extracts data from a Collibra instance into a SQL environment. In our case, it extracts from Gore's Collibra instance into EXL_MDSDev. The program extracts the following objects:
  - Assets
  - Attributes
  - Attribute Types
  - Domains
  - Communities
  - Relations
  - Relation Types
  - Responsibilities

Each object type is stored in its own SQL table.

## Instructions for Use

### Setup
The source executable is kept in -- Artifact URL --. This folder contains a `__main__.exe` file and all of its dependencies. You will need to add a `prod_config.yml` file to the root of this folder. A copy of this config file can be found in the project's source code, and it should be structured like so:
```yaml
API_CONFIG:
  limit: 1000000
AUTH:
  username: <Valid admin username in environment>
  password: <Valid admin password in environment>
  auth-header: <Basic auth token, generated using Postman>
ENVIRONMENT:
  gore: wlgore-<Environment instance (dev, test, prod)>.collibra.com
```
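
The `auth-header` value is described as a basic auth token generated with Postman. HTTP Basic authentication is simply `Basic ` followed by the Base64 encoding of `username:password`, so the same value can be produced without Postman. A minimal sketch, assuming the config stores the full value including the `Basic ` prefix (check what the program actually sends before relying on this):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization value from a username and password.

    Assumption: prod_config.yml stores the full value including the "Basic "
    prefix, as Postman shows it. Return only `token` if the program expects
    the bare Base64 string instead.
    """
    # Basic auth is Base64("username:password"), per RFC 7617.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"
```

For example, `basic_auth_header("Aladdin", "open sesame")` returns `"Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="`, the sample value from RFC 7617. Because anyone can decode this token back to the password, `prod_config.yml` should be protected like the password itself.
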
### Running
Open a command prompt in the root of the project folder, type `__main__.exe`, and press Enter. The program starts and logs its progress as it runs. During the run, the program extracts all data and overwrites the raw SQL tables.
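
The extraction step is not documented beyond the list of object types. Assuming the program uses Collibra's REST Core API v2, where list endpoints such as `/rest/2.0/assets` accept `offset` and `limit` query parameters and return a JSON body with a `results` array, pulling one object type could look like the sketch below. The `ENDPOINTS` mapping and the `fetch_json` callable are hypothetical: `fetch_json` stands in for whatever authenticated HTTP GET the program performs.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

# Assumed Collibra REST Core API v2 list endpoints, one per extracted object type.
ENDPOINTS: Dict[str, str] = {
    "assets": "/rest/2.0/assets",
    "attributes": "/rest/2.0/attributes",
    "attribute_types": "/rest/2.0/attributeTypes",
    "domains": "/rest/2.0/domains",
    "communities": "/rest/2.0/communities",
    "relations": "/rest/2.0/relations",
    "relation_types": "/rest/2.0/relationTypes",
    "responsibilities": "/rest/2.0/responsibilities",
}


def extract_all(
    host: str,
    object_type: str,
    limit: int,
    fetch_json: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Page through one list endpoint and return every result.

    `fetch_json` is a hypothetical callable that performs an authenticated
    GET on the URL and returns the decoded JSON body.
    """
    results: List[Dict[str, Any]] = []
    offset = 0
    while True:
        query = urlencode({"offset": offset, "limit": limit})
        url = f"https://{host}{ENDPOINTS[object_type]}?{query}"
        batch = fetch_json(url).get("results", [])
        results.extend(batch)
        # A short (or empty) page means there is nothing left to fetch.
        if len(batch) < limit:
            break
        offset += limit
    return results
```

With the configured `limit: 1000000`, most object types would fit in a single page, so the loop usually makes one request; it only matters if a type ever grows past the limit.
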
## SQL Tables
The following SQL tables are created or overwritten during each run of this program:
  - collibra_assets_raw
  - collibra_attributes_raw
  - collibra_attribute_types_raw
  - collibra_communities_raw
  - collibra_domains_raw
  - collibra_relations_raw
  - collibra_relation_types_raw
  - collibra_responsibilities_raw
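
Each run replaces these raw tables rather than appending to them. A minimal sketch of that drop-and-reload pattern, using Python's built-in `sqlite3` purely as a stand-in for the real SQL database; the driver, SQL dialect, and the two-column `(id, payload)` schema are hypothetical, since the actual column layout is not documented here:

```python
import json
import sqlite3
from typing import Any, Dict, List

# Raw table per extracted object type, as listed above.
RAW_TABLES: Dict[str, str] = {
    "assets": "collibra_assets_raw",
    "attributes": "collibra_attributes_raw",
    "attribute_types": "collibra_attribute_types_raw",
    "communities": "collibra_communities_raw",
    "domains": "collibra_domains_raw",
    "relations": "collibra_relations_raw",
    "relation_types": "collibra_relation_types_raw",
    "responsibilities": "collibra_responsibilities_raw",
}


def overwrite_raw_table(
    conn: sqlite3.Connection, object_type: str, rows: List[Dict[str, Any]]
) -> int:
    """Replace the contents of one raw table with freshly extracted rows.

    Hypothetical schema: one row per object, storing its id and the full
    JSON payload. Returns the number of rows written.
    """
    # Table names come only from the fixed mapping above, never from input,
    # so interpolating them into the SQL text is safe.
    table = RAW_TABLES[object_type]
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id TEXT, payload TEXT)")
    with conn:  # commits the inserts, or rolls them back on error
        conn.executemany(
            f"INSERT INTO {table} (id, payload) VALUES (?, ?)",
            [(str(row.get("id")), json.dumps(row)) for row in rows],
        )
    return len(rows)
```
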

## SQL Stored Procedures
The following stored procedures are run on the raw tables to transform the data and add batch IDs:
  - collibra.load_collibra_assets
  - collibra.load_collibra_attributes
  - collibra.load_collibra_attribute_types
  - collibra.load_collibra_communities
  - collibra.load_collibra_domains
  - collibra.load_collibra_relations
  - collibra.load_collibra_relation_types
  - collibra.load_collibra_responsibilities

All of these procedures can be run at once with the `collibra._load_entire_batch` procedure.
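
The individual load procedures can also be run one at a time, for example when only some tables need reloading. The README does not name the database engine; assuming SQL Server (the `EXEC` syntax below is T-SQL) and a Python DB-API driver such as pyodbc, a minimal sketch follows. `run_load_procedures` is a hypothetical helper, and the order in which names are passed is up to the caller, since no dependency order between the procedures is documented.

```python
import re
from typing import Any, Iterable, List

# Accept only schema-qualified identifiers such as collibra.load_collibra_assets,
# because the name is interpolated into the SQL text.
_PROC_NAME = re.compile(r"[A-Za-z_]\w*\.[A-Za-z_]\w*")


def run_load_procedures(cursor: Any, procedures: Iterable[str]) -> List[str]:
    """Run each stored procedure in order with T-SQL EXEC.

    `cursor` is any DB-API 2.0 cursor (for example from pyodbc); committing
    the transaction is left to the caller. Returns the statements executed.
    """
    names = list(procedures)
    # Validate every name before executing anything.
    for name in names:
        if not _PROC_NAME.fullmatch(name):
            raise ValueError(f"unexpected procedure name: {name!r}")
    executed: List[str] = []
    for name in names:
        statement = f"EXEC {name}"
        cursor.execute(statement)
        executed.append(statement)
    return executed
```

Usage would be along the lines of `run_load_procedures(conn.cursor(), ["collibra.load_collibra_assets", "collibra.load_collibra_domains"])` followed by `conn.commit()`.
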

## Migrating from dev/test to prod
To change the environment the data extraction runs against, edit the `prod_config.yml` file in the src folder. Change the username and password to those of a user in the new environment, and change the `gore` value under `ENVIRONMENT` to the prod instance's URL. Because a basic auth token encodes a username and password, the `auth-header` value will also need to be regenerated for the new user.

An example of a correctly configured `prod_config.yml` for the prod environment:
```yaml
API_CONFIG:
  limit: 1000000
AUTH:
  username: <Valid admin username in PROD environment>
  password: <Valid admin password in PROD environment>
  auth-header: <Basic auth token, generated using Postman>
ENVIRONMENT:
  gore: wlgore-prod.collibra.com
```
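
Before pointing a deployment at prod, a quick check that the edited config really targets the prod instance can catch a leftover dev or test hostname or an empty credential. A minimal sketch operating on the already-parsed config as a dict; `config_problems` is a hypothetical helper, not something the program is known to include:

```python
from typing import Any, Dict, List

REQUIRED_AUTH_KEYS = ("username", "password", "auth-header")


def config_problems(config: Dict[str, Any], expected_env: str = "prod") -> List[str]:
    """Return a list of human-readable problems; empty means the config looks fine."""
    problems: List[str] = []
    auth = config.get("AUTH") or {}
    for key in REQUIRED_AUTH_KEYS:
        if not auth.get(key):
            problems.append(f"AUTH.{key} is missing or empty")
    # The host must match the wlgore-<env>.collibra.com pattern shown above.
    host = (config.get("ENVIRONMENT") or {}).get("gore", "")
    expected_host = f"wlgore-{expected_env}.collibra.com"
    if host != expected_host:
        problems.append(f"ENVIRONMENT.gore is {host!r}, expected {expected_host!r}")
    limit = (config.get("API_CONFIG") or {}).get("limit")
    if not isinstance(limit, int) or limit <= 0:
        problems.append("API_CONFIG.limit must be a positive integer")
    return problems
```

With PyYAML installed, the dict would typically come from `yaml.safe_load` on the opened `prod_config.yml`.
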


