Go Green One tree
One life
Trees
Loading...

Extract Optimization in Tableau: Best Practices and Strategies

Author
Kajal Sharma
Posted

July 19, 2024

Updated

July 24th, 2024

Category Tableau

One of the most essential practices when dealing with Tableau is maintaining extracts—when extract schedules lead to high resource utilization or the extracts take a lot of time to complete, performance suffers.

This blog will discuss when and how extraction can be tuned to get the best performance from your Tableau Server.

When to Optimize for Extracts

  • High CPU Usage and Extract Schedules: Check the CPU site status. If you find that some jobs take a very long time or several jobs run simultaneously, they may cause high memory usage. Try to schedule the jobs for different times within the day.
  • Extracts Failing or Running Slowly: Rely on the background tasks for extracts administrative view to count failed extracts and the total time to complete them. Recurring errors may point to problems with specific data sources.

Strategies or Best Practices for Extract Optimization

Extract Optimization in Tableau

Adjusting Extract Refresh Schedules

To optimize for extracts, start by adjusting the refresh schedules:

  • Identify Optimal Times: Examine the Background Jobs dashboard to determine the optimal time for running extracts during non-business hours whenever possible.
  • Distribute Refreshes: To avoid overloading the servers, try distributing extract refreshes rather than doing them all at once.
  • Reduce Frequency: Schedule the frequency of the extract refreshes to the lowest possible frequency without compromising the data you want to get in the reports.

Speeding Up Specific Extracts

Identify and address failing or long-running extracts using the Background Tasks for Extracts view:

  • Reduce Extract Size: Optimize server load by preprocessing the data to keep extracts small and frequent. Some of the checklists on the Tableau desktop include Hide All Unused Fields and Aggregate Data for Visible Dimensions.
  • Incremental Refresh Jobs: Add incremental refresh jobs to maintain the extract by constantly appending new information rather than rebuilding it from the ground up. This approach works quickly as it only processes new data in its cycles. However, one must not forget to run a few complete refresh jobs to incorporate new updates to the metadata.

Specifying the Modes for the Extract Refreshes

Set the extract refresh schedules to the parallel execution mode due to many backgrounder processes to improve the speed at which tasks are completed. In some instances, running the schedules in serial mode may be advantageous, perhaps when a massive job requires all the backgrounder processes.

Increasing Backgrounder Processes

A basic background process may take a single entire CPU core to complete some tasks. Hence, the number of instances that must be run depends on the number of cores in the computer. For Tableau Server clusters, set the number of backgrounder processes between the cores available on the backgrounder node divided by two and the total cores on the backgrounder node.

Isolating Processes

This is especially useful when the Tableau Server is deployed in a cluster and backgrounder processes must be moved to a different node to avoid resource contention issues. They also help prevent CPU-intensive processes, such as VizQL server and data engine processes, from affecting the server.

Check Database Health

Here are the essentials for optimizing Tableau extract performance: stop Creating huge lists or views of SQL tables for a Tableau extract to pull. Before making any changes to the data source, the Tableau Developer should check the performance of the query in the database. Suppose the query takes longer to complete the execution in the database. In that case, the Tableau Developer must contact the respective Database Administrator, as the reason for performance delay can be index fragmentation or the missing join condition.

Points to remember while optimizing the Extracts

  • Avoid using full outer joins in the data source.
  • Use Relationships instead of joins.
  • While creating the extracts, remember to hide all the unused fields.
  • Apply Extract filters to remove unnecessary data. If isdeleted=true, then such data should not be included in the extract.
  • Avoid Writing Complex Logic. Divide the logic into multiple calculations to reduce the computing time. Practice creating generalized calculations and either deleting or hiding unnecessary calculations.
  • If the data source needs to contain many tables, it can be divided into multiple parts and used in the Tableau dashboard using data source blending.
  • Before optimizing the extract, execute the query in the database to check whether the database health is fine.
  • Remove unnecessary tables from the data source and try to optimize the joins. For example, if the two transactional tables need to be joined, make sure that the date fields and the common dimension field are also added to the join conditions.

Conclusion

Techniques used in improving extracts include analyzing CPU utilization by extracts, modifying extract refresh options, and ensuring that processes run as efficiently as possible in Tableau.

Following these strategies, one can improve the Tableau Server’s readiness for the Server’s SG extract refreshes, which deliver accurate data updates.

Are you looking to hire tableau developers for your project? Contact us today to discuss your project.

Kajal Sharma
Author
Kajal Sharma

Kajal Sharma is a Senior Content Writer at SPEC INDIA with over 6 years of experience. Specializing in SEO-centric writing, and with a strong hold in the IT sector, she excels at crafting engaging and optimized content. Kajal is adept at driving measurable results through strategic content creation. She also has wide experience in running marketing campaigns.

Let’s get in touch!