
Sensitive Data Discovery with Delphix

This section describes a solution template you can use to identify sensitive data columns from any data source using the Delphix Continuous Compliance Engine. For a high-level overview, refer to the Azure Architecture Center documentation.

Prerequisites

  1. Azure subscription - If you don’t have an Azure subscription, create a free Azure account before you begin.
  2. Delphix Continuous Compliance Engine - You must purchase a Delphix Continuous Compliance Engine. If you don’t have a Delphix Continuous Compliance Engine, create a Delphix Continuous Compliance Engine in Azure and register your engine. Refer to Azure Installation for more details on installing and configuring a Delphix Continuous Compliance Engine in Azure.
  3. Logic app for fileformat upload - Make sure that the Azure subscription supports the creation of consumption plan-based logic app resources. The following are the steps for generating a logic app for fileformat upload:

    • Click the Deploy to Azure button to deploy this logic app in your Azure account. This step redirects to the custom deployment form for the logic app.
    • Select the resource group, region, and workflow name, then click Review + create.
    • Once the logic app is generated and deployed, retrieve its URL from the When an HTTP request is received step. Click the Designer section to view the steps of the logic app.
    • [Required] Copy the URL for the logic app from the value of the HTTP POST URL field. This URL is used as an input parameter for the Sensitive Data Discovery with Delphix pipeline (see the example after this list).
  4. Azure Key Vault - The username and password used to log in to the Delphix Continuous Compliance Engine must be stored in Azure Key Vault. Adequate permissions must be granted to access both the username and password secrets.

  5. Source Dataset - A source dataset containing unmasked data that the ADF pipelines can connect to.
  6. Storage Account - A storage account with containers, a fileshare, and permissions to create the directories.
  7. FileShare - A file share to host the extracted source CSV files and the masked target CSV files that are required for ruleset creation. Ruleset creation is a prerequisite for Delphix’s masking job.
  8. Linked Service(s) - Linked services to connect to the source dataset, the file share, and the intermediate containers.
  9. Container and FileShare Root Folders - A dedicated container and a folder in the file share for processing. These values are needed for the ContainerRootFolder and FileShareRootFolder parameters, respectively.
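
The following is a minimal sketch of the upload call that the pipeline ultimately makes against the logic app from prerequisite 3. The URL and the JSON payload shape are illustrative assumptions, not the confirmed contract; inspect the When an HTTP request is received step in the Designer for the actual request schema your deployment expects.

    # Hedged sketch: POST a fileformat file to the logic app's HTTP POST URL.
    # The payload shape below is an assumption, not the confirmed contract.
    import requests

    LOGIC_APP_URL = "https://prod-00.eastus.logic.azure.com/workflows/..."  # use your HTTP POST URL

    def upload_fileformat(file_path: str) -> None:
        with open(file_path, "r") as f:
            content = f.read()
        body = {"fileName": file_path.rsplit("/", 1)[-1], "fileContent": content}  # hypothetical schema
        resp = requests.post(LOGIC_APP_URL, json=body, timeout=30)
        resp.raise_for_status()

    upload_fileformat("fileformat/HR.EMPLOYEES.csv")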

Input Parameters

This template requires the following input parameters:

  1. VaultURLForUsername - Key Vault URL to fetch the Delphix Continuous Compliance Engine username. The URL is structured as {VaultBaseURL}/secrets/{secret-name}?api-version={keyVaultVersion} (see the example after this list).
  2. VaultURLForPassword - Key Vault URL to fetch the Delphix Continuous Compliance Engine password. The URL is structured as {VaultBaseURL}/secrets/{secret-name}?api-version={keyVaultVersion}.
  3. MaskingEngineIP - IP of the Delphix Continuous Compliance Engine.
  4. LogicappURL - The URL to access the logic app deployed above.
  5. QueryToGetTableSchema - Query to fetch the schema name and table name from the source database. This query fetches the schema name and table name for all tables that are required in the target data source. The example query below can be modified to suit the source database.

    Example

    SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '{source schema}'

  6. FileShareUserName - Key Vault URL to fetch the username for the file share. It is needed to mount the file share on the Delphix Engine (this is usually your storage account name).

  7. FileSharePassword - Key Vault URL to fetch the password for the file share. It is needed to mount the file share on the Delphix Engine.
  8. FileShareMountPointName - Name of the mount point that would appear on the Delphix Engine.
  9. FileShareUrl - Connection URL/host address for your storage account.
  10. FileShareName - Name of the file share that you want to mount.
  11. ApplicationName - Application name for setting up profiling.
  12. SourceEnvironmentName - Name of the source environment.
  13. TargetEnvironmentName - Name of the target environment.
  14. SourceConnectorName - Name of the source connector.
  15. TargetConnectorName - Name of the target connector.
  16. SourceRulesetName - Name of the source ruleset.
  17. TargetRulesetName - Name of the target ruleset.
  18. ContainerRootFolder - Name of the container where the masked data and file formats would be stored.
  19. FileShareRootFolder - Name of the root folder in the file share that would be used to store unmasked and masked data.
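
As a quick illustration of parameters 1 and 2, the sketch below constructs a Key Vault secret URL following the documented pattern and resolves it with a bearer token. The vault URL, secret name, and API version are placeholders; substitute your own values.

    # Hedged sketch: resolve a Key Vault secret URL of the form
    # {VaultBaseURL}/secrets/{secret-name}?api-version={keyVaultVersion}.
    import requests
    from azure.identity import DefaultAzureCredential

    vault_base_url = "https://myvault.vault.azure.net"   # placeholder vault
    secret_name = "delphix-engine-username"              # placeholder secret name
    api_version = "7.4"                                  # a current Key Vault API version

    vault_url_for_username = f"{vault_base_url}/secrets/{secret_name}?api-version={api_version}"

    token = DefaultAzureCredential().get_token("https://vault.azure.net/.default").token
    resp = requests.get(vault_url_for_username, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()
    print(resp.json()["value"])                          # the secret's value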

For more information, refer to the Delphix Engine Terminologies.

Note

  • For different source environments or engines, it is recommended to use different root folders to keep the uploaded file formats consistent.
  • Host address of your storage account usually has the following pattern: <Storage Account name>.file.core.windows.net.
  • To find the username or password of the file share, refer to step 7 (FileShare) in the prerequisites above. To view the script, click the Show Script button; the username and password can be found in the script.

About this Template

This template extracts the schema information for all tables from the source dataset, converts the schema into the format required by the Delphix Continuous Compliance Engine, and uploads the schema information to the Delphix Continuous Compliance Engine. The engine uses this schema information to identify and flag sensitive data columns, a process known as profiling. A sketch of the conversion step follows the note below.

Note

The source dataset must be a supported database in Azure.
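
For illustration, here is a minimal sketch of the conversion step, assuming each schemafile CSV begins with a comma-separated header row and that the resulting fileformat lists one column name per line; verify the exact fileformat layout against the Continuous Compliance documentation. Paths are placeholders.

    # Hedged sketch: convert a per-table header CSV into a fileformat file.
    from pathlib import Path

    def convert_schema_to_fileformat(schema_file: Path, fileformat_dir: Path) -> Path:
        header = schema_file.read_text().splitlines()[0]       # e.g. "ID,FIRST_NAME,SSN"
        columns = [c.strip() for c in header.split(",") if c.strip()]
        fileformat_dir.mkdir(parents=True, exist_ok=True)
        out = fileformat_dir / schema_file.name                # keeps {SCHEMA_NAME}.{TABLE_NAME}.csv naming
        out.write_text("\n".join(columns) + "\n")              # assumed layout: one column per line
        return out

    convert_schema_to_fileformat(Path("schemafile/HR.EMPLOYEES.csv"), Path("fileformat"))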

The template contains the following activities:

  1. Lookup activity retrieves the names of all tables present in the source database for the input schema and passes them to the subsequent ForEach activity.
  2. Get Metadata activity checks whether the schemafile directory already exists.
  3. If activity cleans up the schemafile directory if it exists.
  4. ForEach activity gets the table names from the Lookup activity and iterates over each table to fetch the header (column) information and store it in a file. These files are created within the schemafile directory within a container, one file per table. The name of each file is {SCHEMA_NAME}.{TABLE_NAME}.csv.

    This ForEach activity also creates empty files with the same names within the target directory of the file share. These files are required to set up the masking job on the Delphix Continuous Compliance Engine.

  5. Execute Pipeline activity sets up the folder structure required for the processing of the pipeline.

  6. Get Metadata activity gets the list of all the files that were created within the schemafile directory.
  7. ForEach activity scans all the files created within the schemafile directory, processes the file content into the desired format, and writes it to the fileformat directory.
  8. Get Metadata activity to get all the files that need to be uploaded in the current execution.
  9. Get Metadata activity to get all the files that were uploaded in previous executions.
  10. Filter activity to get all the files that were not previously uploaded (fresh files).
  11. ForEach activity to append the above files to an array.
  12. Filter activity to list all files that were previously uploaded and are being uploaded again in this execution.
  13. ForEach activity to check whether the files from the above filter are fit to be uploaded.
  14. Web activity to fetch the username from Azure Key Vault for logging in to the Delphix Continuous Compliance Engine.
  15. Web activity to fetch the password from Azure Key Vault for logging in to the Delphix Continuous Compliance Engine.
  16. Web activity logs in to the Delphix Continuous Compliance Engine using the credentials retrieved from Azure Key Vault.
  17. Execute Pipeline activity runs setup on the provided Delphix Engine. Based on the parameters provided, this activity:

    1. Creates an application.
    2. Creates the source and target environments.
    3. Mounts the file share and creates a mount point based on the parameter name.
    4. Creates the source and target connectors for the above mount point.
    5. Creates a dummy ruleset for the source and target environments.
  18. ForEach activity iterates over all the processed files, appends a subset of the pipeline run ID to each filename to make it unique, and uploads the files to the Delphix Continuous Compliance Engine (see the sketch after this list). The names of the uploaded files appear as follows:
    {SCHEMA_NAME}.{TABLE_NAME}.csv_{pipeline run ID}.

    This upload is done with the help of the logic app that you installed as part of the prerequisites. This step also cleans up the fileformat for tables whose schema has changed, given that the older uploaded fileformat is no longer linked to the file (table).
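
The filter and naming logic in activities 8 through 13 and 18 can be pictured with the following sketch; the file lists and run ID are illustrative placeholders.

    # Hedged sketch: split the current run's fileformats into fresh vs.
    # re-uploaded sets, then derive the unique upload name
    # {SCHEMA_NAME}.{TABLE_NAME}.csv_{first 8 characters of the run ID}.
    current_files = {"HR.EMPLOYEES.csv", "HR.JOBS.csv", "SALES.ORDERS.csv"}
    previously_uploaded = {"HR.EMPLOYEES.csv_1a2b3c4d", "HR.JOBS.csv_9f8e7d6c"}
    uploaded_bases = {name.rsplit("_", 1)[0] for name in previously_uploaded}

    fresh_files = current_files - uploaded_bases       # never uploaded before (activity 10)
    reuploaded_files = current_files & uploaded_bases  # uploaded before and present again (activity 12)

    pipeline_run_id = "c4f1b2e3-0000-0000-0000-000000000000"  # placeholder run ID
    for name in sorted(fresh_files | reuploaded_files):
        print(f"{name}_{pipeline_run_id[:8]}")         # e.g. HR.EMPLOYEES.csv_c4f1b2e3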

How to Use this Template

  1. Go to the template gallery.
  2. Select the Sensitive Data Discovery with Delphix template.
  3. Select the Linked Service to connect to the Azure Blob storage. It is required to create the intermediate containers. If no service is available, click on + New and add the Linked Service.
  4. Select the Linked service for the Azure fileshare. If no service is available, click on + New and add the Linked Service.
  5. Select the Linked Service to connect to the source database from the dropdown. If no service is available, click on + New and add the Linked Service.
  6. Click Use this template.

    This step creates a pipeline in your data factory.
  7. Click Debug to start the pipeline. You must provide the input parameters to proceed further (a programmatic alternative is sketched at the end of this section).
  8. After you run the template, complete the Profiling and Inventory review (confirming that you are satisfied with the sensitive data discovered by Delphix), before using the masking template. For more details on the Inventory, refer to The Inventory Screen section in the Continuous Compliance documentation.

    Note

    • When you are selecting a fileformat during Ruleset creation in the Delphix Continuous Compliance Engine, you may see multiple fileformats for the same file. This is because a new fileformat is created upon each Sensitive Data Discovery pipeline run. The name of the file format is of the form [SCHEMA-NAME].[TABLE-NAME].csv_[first 8 characters of pipelineRunID]. You must select the fileformat corresponding to your pipelineRunId.
    • When linking a fileformat to a file, use CR+LF (Windows/DOS) as the end of line character.
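
If you prefer to start the imported pipeline from code rather than the Debug button, something like the following sketch works with the azure-mgmt-datafactory SDK. All resource names and parameter values are placeholders; supply the full parameter list described above.

    # Hedged sketch: trigger the pipeline run programmatically.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    run = client.pipelines.create_run(
        resource_group_name="my-rg",               # placeholder
        factory_name="my-adf",                     # placeholder
        pipeline_name="Sensitive Data Discovery with Delphix",
        parameters={
            "VaultURLForUsername": "https://myvault.vault.azure.net/secrets/delphix-user?api-version=7.4",
            "VaultURLForPassword": "https://myvault.vault.azure.net/secrets/delphix-pass?api-version=7.4",
            "MaskingEngineIP": "10.0.0.4",
            "LogicappURL": "https://prod-00.eastus.logic.azure.com/workflows/...",
            "QueryToGetTableSchema": "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'HR'",
            # ...remaining parameters 6-19 from the list above...
        },
    )
    print(run.run_id)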