Zendesk with Matillion Integration- Ticket Custom Fields Ingestion Challenges and Solution

By May 7, 2021 May 25th, 2021 Blogs

Zendesk provides a complete customer service solution that’s easy to use and scales with business. The conversations are seamlessly connected, irrespective of chosen medium, be it messaging , email or voice. It builds an integrated help center and community forum. Custom can resolve issues at their own pace – while reducing resolution times for agents. It also gives a picture of Team’s bandwidth and SLA clarity by business. How effective a team is in delivering to customers. It also helps agents, administrators and internal teams do their best work. Zendesk offers robust out-of-the-bos tools that are quick to set up and easy to use.

Matillion provides a powerful integration query component retrieves data from Zendesk database and loads into a table. We will take an example of Ticket [ important table ] and design ingestion pipeline end to end. For this use case Matillion with Redshift is used.

Challenges with Ticket table with Matillion 

  • As part of dataware , if ticket volume is more than 70K + , it goes into a Timeout issue.
  • Instead Zendesk provides – endpoint _Incremental Export to get items changed or were created since the last request.  For e.g. 
    • Request at 5pm: “Give me all the tickets that have changed since noon today.”
    • Response: “Here are the tickets that changed since noon up until, and including, 5pm.”
    • Request at 7pm: “Give me the tickets that have changed since 5pm.”
    • Response: “Here are the tickets that changed since 5pm up until, and including, 7pm.”
  • Now while getting data via Connector , we always have to give parameter as start_time in advanced mode 
    • Select * from ticket_incremental where start_time = ‘2021-03-31’
  • Since in DWH , we need the entire export to store or at-least incremental export to capture all changes done in a given window so no data is missed. With the above point , we might run into data missing conditions as start_time has to be provided as Value.
  • There is Column called “Custom_Fields” which is a dynamic Array of key value pairs of custom fields filled in Zendesk tickets , we need to extra parser to fetch these custom fields in Columnar format. As this is dynamic in nature , columns can be added or removed.

To overcome challenges mentioned above , we can design custom ticket load orchestration jobs to load data for range and also parse custom fields.

  1. Define following metadata which holds Custom Fields ID / Value and column Name
    1. Contains table name and schema name where ticket table will be stored
    2. ID , custom field name and data type [ based on content of data ]
    3. Custom_column_name can be derived by replacing  _ , ( , ) , space , > , ? in custom_field_name with “_”  and appending custom_ to it
      Zendesk with Matillion Integration - 11
    4. This can be derived from end point in zendesk  = ticket_fields  [https://{subdomain}.zendesk.com/api/v2/ticket_fields.json] It contains 2 fields i.e ID , Raw_Title which Custom Field Name
  2. Matillion Pipeline Step by Step Process Design
    • Drop all intermediate /temp table created
      Zendesk with Matillion Integration - 9
    • Get Date Range for which it needs to run via HWM set in Metadata from last run . Added -1  as safety NetZendesk with Matillion Integration - 15
      Zendesk with Matillion Integration - 1
    • With Diff_Days , get the Loop Iterator and derive start time to get data from Zendesk Connector
      Zendesk with Matillion Integration - 7
    • Load_ticket_landing will calculate the start time as start_range + Loop Indicator passed from Step 3 to run iteratively and load data in Pre Staging table 
      1. select * from ${jv_source_table_name} where start_Time = ‘${jv_start_time}’
        Zendesk with Matillion Integration - 2
      2. Please note – Set ReCreate_Table to OFF
      3. Custom Fields after staging are as follows , key/value pair
        Zendesk with Matillion Integration - 10
    • Remove duplicates if any and store the result in temp table
      Zendesk with Matillion Integration - 6Zendesk with Matillion Integration - 14
    • Set the Partition to run as Custom fields  Json Parser Code in parallel. As this is Nested JSON, it might run into performance issues if Number of custom fields for a given Zendesk is more. It can consume more credits or Redshift can timeout. Running it No of records will result in performance improvement. For this example jv_common_partition_cnt is set to 2000
      Zendesk with Matillion Integration - 21Zendesk with Matillion Integration - 17
    • This is a very important step. This code parse Custom Fields from Json to Column  dynamically depending on Step 1, metadata defined and store result into temp table
      Zendesk with Matillion Integration - 20

      • Query Used :
        • This code will convert custom fields into valid Json
        • Using Json extract path text , it will parse into Rows 
        • Audit_Config.Zendesk_Number_Config – should be a simple table with one column i.e. Ordinal which contains only numbers of custom fields i.e. 0 to 100 if 100 custom fields are there in ticket. So set a higher number to automatically get all custom added or removed without manual intervention.

 

INSERT INTO landing_incremental.ticket_zendesk_desktopsupport_custom

SELECT * FROM (

with

tickets as (

select   id , ‘{“custom_fields”:’|| custom_fields || ‘}’ as custom_fields , created_at

from  landing_incremental.ticket_zendesk_desktopsupport_sub

  where unq_key >= ($jvLoop-$jvpartitionset)*2000  and  unq_key < (1)*${jvpartitionset}

)  

, ticket_dtl as (

select

     id,

    json_extract_path_text(custom_fields, ‘custom_fields’, true ) as custom_fields , created_at

from tickets )

, joined_tickets as (

    select

        id  ,

        json_array_length(ticket_dtl.custom_fields, true) as number_of_items,

        json_extract_array_element_text(

            ticket_dtl.custom_fields,

            numbers.ordinal::int,

            true

            ) as custom_fields

    from ticket_dtl

    cross join (  Select Ordinal as ordinal From audit_config.zendesk_numbers_cfg 

                 order by 1  ) numbers

     where numbers.ordinal <

        json_array_length(ticket_dtl.custom_fields, true)

)  SELECT   

        joined_tickets.id as ticket_id,

        cfg.custom_column_name as custom_column_name,

        json_extract_path_text(custom_fields, ‘id’)::numeric as custom_field_id,

        json_extract_path_text(custom_fields, ‘value’) as value  

    FROM   joined_tickets ,  audit_config.etl_zendesk_custom_field_config cfg

    WHERE custom_field_id =  cfg.id   and cfg.is_active = true  )

    WHERE   value  IS NOT NULL;

                     Output ⇒
Zendesk with Matillion Integration - 3

    • Now step to PIVOT the Custom column to store on TICKET level. Above steps will get the data set in Row Format. This again has to be dynamic in order to support addition of new columns.
      Zendesk with Matillion Integration - 19
      Pivot Redshift Query Dynamic Derivation on Custom Metadata Data

jv_custom_select : SELECT 

‘SELECT ticket_id,’

|| RTRIM ( LISTAGG ( ‘MAX ( CASE WHEN custom_column_name = ”’ || custom_column_name || ”’ THEN value ELSE null END )   

                    AS ‘ || custom_column_name ||’, ‘) ,’, ‘) 

|| ‘ FROM   ${jv_landing_schema_name}.${jv_landing_table_name}_unq_custom 

   GROUP BY

          ticket_id’  as select_

 FROM ( select distinct custom_column_name from audit_config.etl_zendesk_custom_field_config  

WHERE datalake_schema_name = ‘${jv_datalake_schema_name}’ 

AND datalake_table_name = ‘${jv_datalake_table_name}’

AND is_Active)

 

    • Once this custom create table DDL is automatically generated, create and store results of final Pivoted data into Custom Final table
      Zendesk with Matillion Integration - 22

Output generated in Pivot Format
Zendesk with Matillion Integration - 12

    • Last Step 
      • Get all column names for custom columns dynamically so when joined that listagg value can be used.
        Zendesk with Matillion Integration - 18Zendesk with Matillion Integration - 5
      • Join the Ticket table with the Main Custom Table and Final Table is ready for use on Ticket ID
        Zendesk with Matillion Integration - 13Zendesk with Matillion Integration - 16       

Final Pipeline –
Zendesk with Matillion Integration - 8

Here comes the Final Output –

Zendesk with Matillion Integration - 4

Towards the end of this implementation , the advantages as follows but not limited 

  • As Matillion does not have a direct way to flatten this out , the above method gives flexibility to PIVOT columns to further processing.
  • Since it is a metadata drive , adding any column will need to Insert in Metadata – No code change , removing any column to set is_Active flag in Metadata from True to false – No code change is required.
  • In case of Custom Fields are excessive in numbers, parallel partitioned ensures best performance to PIVOT columns
  • All intermediate tables where results are stored are dropped towards the end. 
  • This provides more modularity to derive Custom Fields which in Json Format 
  • This does not require any special processing/component , all redshift inbuilt query power and metadata set drives this solution.
Prajakta Borkar

Prajakta Borkar

Enterprise Datawarehouse Architect, 14+ years of experience a host of technologies including Oracle Exadata, Snowflake, SQL-server, Shell/Powershell/PLSQL script, Datastage, Ab-Initio and specialized in modern cloud technologies like Snowflake, Matillion, Denodo, PowerBI, Business Objects with Data Modelling, Data analytics etc.