开始使用 Tableau Prep

  • 版本 :2022.1 及更高版本

注意:从版本 2020.4.1 开始,作为创建者,您还可以在 Web 上创建和编辑流。本教程是使用 Tableau Prep Builder 设计的,但也可以在 Web 上完成,但有一些明显的例外。

本教程向您介绍 Tableau Prep 中可用的常见操作。使用 Tableau Prep 附带的示例数据集,您将演练如何为示例超级商店创建流。本教程使用最新版本的 Tableau Prep Builder。如果您使用的是以前的版本,则结果可能会有所不同。

在此过程中,请观看相关提示,深入了解 Tableau Prep 如何帮助您清理和调整数据以进行分析。

要在继续学习本教程之前安装 Tableau 准备构建器,请参阅从用户界面安装 Tableau Desktop 或 Tableau Prep Builder(链接在新窗口中打开)在 Tableau Desktop 和 Tableau Prep Builder 部署指南中。否则,您可以下载免费试用(链接在新窗口中打开).

示例文件

若要完成本教程中的任务,您需要安装 Tableau Prep Builder,或者,如果在服务器版本 2020.4 或更高版本上启用了 Web 制作,则还可以在 Web 上尝试这些步骤。

在计算机上安装 Tableau Prep Builder 后,您还可以在以下位置找到示例文件:

  • (视窗)C:\Program Files\Tableau\Tableau Prep Builder

  • (苹果)/Applications/Tableau Prep Builder

或者,从这些链接下载示例文件,并创建示例目录和南部子目录。如果在 Web 上完成本教程,则需要执行此操作。

下载到示例目录下载到南方子目录

故事是这样的...

您在一家大型零售连锁店的总部工作。您的老板想要分析公司过去四年的产品销售和利润。您建议他使用 Tableau Desktop 来执行此操作。你的老板认为这是一个好主意,并希望你做对。

当您开始收集所需的所有数据时,您会注意到每个区域的数据的收集和跟踪方式不同。您还注意到在不同的文件中输入了很多创意数据,并且一个地区甚至每年都有一个单独的文件!

在开始分析 Tableau 中的数据之前,您必须先进行一些严肃的数据清理,这将是一个漫长的夜晚。

当您翻找餐厅菜单以订购晚餐时,您会记得 Tableau 有一款名为 Tableau Prep 的产品,它可以帮助您完成繁重的数据清理任务。

您下载产品,或注册免费试用(链接在新窗口中打开)并决定试一试。

1. 连接到数据

打开 Tableau Prep Builder 时,您首先看到的是带有“连接”窗格的“开始”页面,就像 Tableau Desktop 一样。

首先,第一步是连接到数据并创建输入步骤。从那里,您将开始构建一个工作流或“流”(在 Tableau Prep 中称为工作流或“流”),并添加更多步骤以随时对数据执行操作。

提示:输入步骤是数据的引入点,也是流的起点。您可以有多个输入步骤,其中一些步骤可能包含多个数据文件。有关连接到数据的详细信息,请参阅连接到数据(链接在新窗口中打开).

不同地区的销售数据文件以不同的格式存储,而您来自南方的订单实际上是多个文件。查看“连接”窗格,发现有很多选择来连接到数据。伟大!

由于您的其他区域对于所有四年的数据都有一个文件,因此您决定首先处理来自南方的文件。

  1. 在“连接”窗格上,单击“添加连接”按钮。

    在 Web 制作中,从主页单击“创建>”,或从“浏览”页中单击“新建>”。然后单击“连接到数据”。

  2. 这些文件.csv文件,因此请在连接列表中选择“文本文件”。

  3. 导航到文件的目录。在“订单南部”子目录中,选择第一个文件orders_south_2015.csv,然后单击“打开”将其添加到流程中。(有关文件位置,请参阅总结和资源

    连接到第一个文件后,将打开 Tableau Prep Builder 工作区,您会看到它分为两个主要部分。顶部的“流程”窗格和底部的“输入”窗格。

    与 Tableau Desktop 非常相似,此“流程”窗格是您的工作区,您可以在其中直观地与数据进行交互并构建流程。“输入”窗格包含有关如何引入数据的配置选项。它还显示数据集中的字段、数据类型和值示例。

    我们将在下一节中介绍如何与此数据进行交互。

    提示:对于单个表,当您向流程中添加数据时,Tableau Prep 会在“流程”窗格中自动为您创建一个“输入”步骤。否则,您可以使用拖放将表添加到“流程”窗格中。

  4. 您在南方的订单还有另外三个文件,如何组合它们取决于您的工作地点。

    在 Tableau Prep Builder 中:

    In Tableau Server or Tableau Online:

    The wildcard option isn't currently available for Tableau Server or Tableau Online. Still, you want to include all of the files from the South and handle the data alike, so combining them makes sense.

    1. Drag Orders_South_2016 on top of Orders_South_2015 and drop it on the Union option.

    2. Drag Orders_South_2017 on top of the new Union step and drop it on Add. Repeat this step with the final file.

    1. Repeat steps 2 and 3 to add the rest of the files from the Orders South subdirectory.

    2. Combine them with a union step. (For more details, see Union files and database tables in the Input step(Link opens in a new window).)

    3. 您可以单独添加每个文件,但您希望将所有文件合并到一个输入步骤中,因此请单击“输入”窗格中的“多个文件”选项卡。

    4. 您会看到通配符并集的选项。选择它。

      您注意到,选择文件的目录已填充,所需的其他文件列在“输入”窗格的“包含的文件”部分中。

      提示: 使用通配符并集是连接和组合具有相似名称和结构的单个数据源中的多个文件的好方法。若要使用此选项,文件必须位于同一父目录或子目录中。如果您没有立即看到所需的文件,请更改搜索条件。有关详细信息,请参阅在输入步骤中合并文件和数据库表(链接在新窗口中打开).

    5. Click Apply to add the data from these files to the orders_south_2015 input step.

    6. The files for the other regions are all single table files, so you can select all of the files at once and add them to your flow.

      Note: On the web, files can only be uploaded individually.

  5. Add the remaining files.

    In Tableau Prep Builder:

    Note: These are different file types. If you don't see all of these files, make sure your file explorer or finder is set to view all file types.

    In Tableau Server or Tableau Online:

    • Follow steps 2 and 3 to add Orders_Central.csv and Orders_West.csv.

    • On the Connections pane, click the Add connection button. Click Microsoft Excel and select Orders_East.xlsx.

    • Orders_Central.csv

    • Orders_East.xlsx

    • Orders_West.csv

    • Open File Explorer or Finder and navigate to the directory for the files. Ctrl-click or Cmd-click (MacOS) to select the following files and drag-and-drop them onto the Flow pane to add them to your flow. (For file location, see Wrap up and resources.)

Check your work: Watch "Connect to data" in action.

Click the image to replay it

2. Explore your data

Now that you have the data files loaded into Tableau Prep, you're pretty sure that you want to combine all of the files together. But before you do that, it might be a good idea to take a look at them first and see if you can spot any issues.

When you select an Input step in the Flow pane, you can see the settings used to bring in the data, the fields that are included, and a preview of your values.

This is a good place to decide how much data you want to include in your flow and remove or filter fields that you don't want. You can also change any data types that were assigned incorrectly.

Tip: If you are working with large data sets, Tableau Prep automatically brings in a sample of the data to maximize performance. If you don't see the data you expect, you might need to adjust the sample. You can do this on the Data Sample tab. For more information about configuring your data options and sample size, see Set your data sample size(Link opens in a new window).

In the Flow pane, as you select each step and look over each data set, you notice a few things that you want to fix later and one thing that you can fix now in the Input step.

  • Select the Orders_West Input step.

    • The State field uses abbreviations for the state name. Other files spell this out, so you'll need to fix that later.

    • There are a lot of fields that start with Right_. These fields appear to be duplicates of the other fields. You don't want to include these duplicate fields in your flow. This is something you can fix right here in the Input step:

      To fix this now, clear the check box for all fields that start with Right_. This tells Tableau Prep to ignore these fields and not to include them in the flow.

      Tip: When you perform cleaning operations in a step, like removing fields, Tableau Prep tracks your changes in the Changes pane and adds an annotation (in the form of a little icon) in the Flow pane to help you keep track of the actions you take on your data. For Input steps, an annotation is also added to each field.

  • In the Flow pane, click the Orders_Central Input step to select it. In the Input pane, you notice the following issues:

    You'll need to do some cleaning on these fields before you can combine this file with the others files. But you can't fix that here in the Input step, so you make a note to do this later

    • The order dates and ship dates are separated out into fields for month, day, and year.

    • Some of the fields have different data types than the same fields in other files.

    • There is no field for Region.

  • Select the Orders_East Input step.

    The fields in this file look like they align pretty well with the other files. But the Sales values all seem to have the currency code included. You'll need to fix that later, too.

Now that you've identified a few troublemakers in your data sets, the next step is to examine your data a bit more closely and clean up any issues that you find so that you can combine and shape your data and generate an output file that you can use for analysis.

3. Clean your data

在 Tableau Prep 中,检查和清理数据是一个迭代过程。确定要使用的数据集后,下一步是通过对该数据应用各种清理、整形和组合操作来检查该数据并对其执行操作。您可以通过向流程中添加步骤来应用这些操作。有关清洁选项的详细信息,请参阅清理和调整数据(链接在新窗口中打开)

步骤有多种形式,具体取决于您要执行的操作。例如,每当要将清理操作应用于字段(如筛选、合并、拆分、重命名等)时,请添加清理步骤。添加聚合步骤以对字段进行分组和聚合,并更改数据的详细级别。有关不同步骤类型及其用法的详细信息,请参阅构建和组织流(链接在新窗口中打开).

提示: 当您向流程中添加步骤时,会自动添加一条流程线以将步骤相互连接。您可以移动这些流线,并根据需要删除或添加它们。

运行流程时,需要这些连接点,以便 Tableau Prep 了解哪些步骤已连接以及这些步骤在流程中的应用顺序。如果缺少流线,则流将中断,并且您将收到错误。

清洁Orders_Central

要解决您之前注意到的问题并查看是否存在任何其他问题,请首先向“Orders_Central输入”步骤添加清理步骤。

  1. 在“流程”窗格中,选择Orders_Central,执行下列操作之一:

    向流中添加清理步骤时,工作区会发生变化,并且您会看到数据的详细信息。

    A. “流程”窗格,B. 工具栏,C. “配置”窗格,D. 数据网格

    工作区现在分为三部分:“”窗格、带工具栏的“配置文件”窗格和“数据”网格。

    配置文件”窗格显示数据的结构,将字段值汇总到条柱中,以便您可以快速查看相关值并发现异常值和 null 值。“数据”网格显示字段的行级别详细信息。

    提示“配置”面板中的每个字段都显示在配置文件卡上。使用每个卡上的“更多选项”菜单(在早期版本中为下拉箭头)可查看并选择可用于该字段类型的不同清洁选项。您还可以对字段值进行排序、更改数据类型、为字段分配数据角色或拖放配置文件卡和数据网格中的列以重新排列它们。

    使用计算字段清理数据

    此数据集缺少“区域”字段。由于其他数据集具有此字段,因此您需要添加它,以便以后可以合并数据。您需要使用计算字段来执行此操作。

    • 单击加号图标并添加清理步骤。根据您的版本,此菜单选项为“添加步骤”、“添加清理步骤”“清理步骤”。

    • 单击建议的清理步骤(Tableau Prep Builder 版本 2020.3.3 及更高版本以及 Web 上)

  2. 在工具栏中,单击“创建计算字段”。

  3. 在计算编辑器中,将计算字段命名为“区域”。然后输入“中心”(包括报价)并点击保存

    You love the flexibility of being able to use calculated fields to shape you data. You are pleased to see that Tableau Prep uses the same calculation editor language as Tableau Desktop.

    Tip: When you make changes to your fields and values, Tableau Prep keeps track of them in the Changes pane on the left. An icon (annotation) representing the change is also added to the cleaning step in the flow and to the field in the Profile pane. We'll look at the Changes pane after making more changes.

    Next you want to address the separate order date and ship date fields. You want to combine them into two single fields, one for Order Date and one for Ship Date so they align with the same fields in the other data sets. Making sure your tables have the same fields will enable you to combine the tables using a union later.

    You can use a calculated field again to do this in one easy step.

  4. In the toolbar, click Create Calculated Field to combine the Order Year, Order Month, and Order Day fields into one field with the format "MM/DD/YYYY".

  5. In the Calculation editor, name the calculated field Order Date. Then enter the following calculation and click Save:

    MAKEDATE([Order Year],[Order Month],[Order Day])

    Now that you have a new field for your order date, you want to remove the existing fields, as you no longer need them.

    You have a lot of fields in the Profile pane. You notice a Search box in the top right corner on the toolbar. You wonder if you can use that to quickly find the fields that you want to remove. You decide to give it a try.

  6. In the Profile pane, in the search box, type Order.

    Tableau Prep quickly scrolls all the fields with Order in the name into view. Cool!

  7. Ctrl-click or Cmd-click (MacOS) to select the fields for Order Year, Order Month, and Order Day. Then right-click on the selected fields and select Remove (Remove Field in prior versions) from the menu to remove them.

  8. Now repeat steps 4 though 7 above to create a single field for Ship Date. Try it on your own or use the steps below to help you.

    Tip: Tableau Prep summarizes the data in the Profile pane into bins to help you quickly see the shape of your data, find outliers, spot relationships between fields, and so on.

    In this scenario, the order and ship dates can now be summarized by year. Each bin represents a year from January of the starting year to January of the following year and is labeled accordingly. Because there are sales dates and ship dates that fall in the latter part of 2018 and 2019, we get a bin for that data that is labeled with the ending year 2019 and 2020 accordingly.

    To change this view to the actual dates, click the More options menu (drop-down arrow in prior versions) in the Profile card and select Detail.

    Interact directly with fields to clean your data

    Your data is starting to look good. But, as you finish removing the extra fields for the order and ship dates, you notice that the Discounts field has a couple of issues.

    This will cause a problem when you combine the files, so you better fix that too.

    • It's assigned to a String data type instead of a Number (decimal) data type.

    • There's a field value None instead of a numeric value for no discount.

    • In the toolbar, click Create Calculated Field to combine the Ship Year, Ship Month, and Ship Day fields into one field with the format "MM/DD/YYYY".

    • Name the calculated field Ship Date and enter the calculation . Then click Save.MAKEDATE([Ship Year],[Ship Month],[Ship Day])

    • Remove the Ship Year, Ship Month, and Ship Day fields. Search for the fields, select them, and select Remove (Remove Field in prior versions) from the menu to remove the fields.

  9. Clear your search and enter disc in the search box to find the field.

  10. Select the Discounts field, double-click the field value None, and change it to the numeric value 0.

  11. To change the data type for the Discount field from String to Number (decimal), click Abc and select Number (decimal) from the drop-down menu.

  12. Finally name your step to help keep track of what you did in this step. In the Flow pane, double-click the step name Clean 1 and type in Fix dates/field names.

Review your changes

You made a lot of changes to this data set and you start to worry that you won't remember everything you did. As you look over your work, you see a column on the left of the Profile pane called Changes.

You click the arrow to open it and are delighted to see a list of every change you just made. As you scroll through the changes in the list, you notice that you can delete or edit your changes or even move them around to change the order that you did them in.

You love that you can easily find the changes you made in any step as you build your flow and experiment with the order of those changes to get the most out of your data.

Check your work: Watch "Clean Orders_Central" in action.

Click the image to replay it

Now that you've cleaned one file, you take a look at the other files to see what other issues you need to fix.

You decide to look at the Excel file for Orders_East next.

Clean Orders_East

As you look over the fields for the Orders_East file, most of the fields look like they align with the other files, except for Sales. To take a closer look and see if there are any other issues to address, you add a cleaning step to the Orders_East Input step.

  1. In the Flow pane, select Orders_East and do one of the following:

    Looking at the Sales field you quickly see that the USD currency code has been included with the sales numbers, and Tableau Prep interpreted these field values as a string.

    You'll need to remove the currency code from this field and change the data type if you want to get accurate sales data.

    Fixing the data type is easy, you already know how to do that. But there are over 2000 unique rows of sales data and fixing every individual row to remove the currency code seems cumbersome.

    But this is Tableau Prep, and you decide to check out the drop-down menu to see if there is an option to fix this.

    When you click the More options (drop-down arrow in prior versions) for the Sales field, you see a menu option called Clean and an option under that to remove letters. You decide to give that a try and see what it does.

    • Click the plus icon and add a clean step. Depending on your version, this menu option is Add Step, Add Clean Step, or Clean Step.

    • Click on the suggested clean step (Tableau Prep Builder version 2020.3.3 and later and on the web).

  2. Select the Sales field. Click the More optionsmenu (drop-down arrow in prior versions) and select Clean > Remove Letters.

    Wow! That cleaning option instantly removed the currency code from every field. Now you just need to change the data type from String to Number (decimal) and this file is looking good.

  3. Click the data type for the Sales field and select Number (decimal) from the drop-down list to change the data type.

  4. The rest of the file looks pretty good. Name your cleaning step to keep track of your work. For example, Change data type.

Next you look at your last file for Orders_West to see if there are any issues there that you need to fix.

Clean Orders_West

As you look over the fields for the Orders_West file, most of the fields look like they align with the other files, but you remember seeing that the States field used abbreviations for the values instead of spelling out the state name. To combine this file with the other files, you'll need to fix this. So you add a cleaning step to the Orders_West Input step.

  1. In the Flow pane, select Orders_West and do one of the following:

    • Click the plus icon and add a clean step.

    • Click on the suggested clean step (Tableau Prep Builder version 2020.3.3 and later and on the web).

  2. Scroll or use Search to find the State field.

    You see that all the state name values use the short abbreviation. There are only 11 unique values for this field. You could manually change each one, but maybe Tableau Prep has another way to do this?

    You click the More optionsmenu (drop-down arrow in prior releases) for the field and see an option called Group Values (Group and Replace in previous versions). When you select it you see several options:

    The state names don't sound alike, they aren't spelled incorrectly, and they don't share the same characters, so you decide to try the Manual Selection option.

    Tip: You can double-click a field name or field value to edit a single value. To edit multiple values you can select all the values and use the right-click menu option Edit Values. But when you want to map one or more values to specific values, use the Group Values option in the drop-down menu.
    For more information about editing and grouping values, see
    Edit field values(Link opens in a new window).

    • Manual Selection

    • Pronunciation

    • Common Characters

    • Spelling

  3. Select the State field. Click the drop-down arrow and select Group Values (Group and Replace in previous versions) > Manual Selection.

    A two column card opens. This is the Group Values editor. The column on the left shows the current field values and the column on the right shows the fields that are available to map to the fields on the left.

    You want to map your state abbreviations to the spelled out version of the state name, but you don't have those values in the Orders_West data set. You wonder if you can just edit the name directly and maybe add it there, so you give that a try.

  4. In the Group Values editor in the left pane, double-click AZ to highlight the value and type Arizona. Then press Enter to add your change.

    Tableau Prep created a mapped value for your new value Arizona and automatically mapped the old value, AZ to it. Having a mapped relationship set up for these values will save you time if you get more data from this region entered like this.

    Tip: You can add field values that aren't in your data sample to set up mapping relationships to organize your data. If you refresh your data source and new data is added, you can add the new data to the mapping instead of manually fixing each value.

    When you manually add a value that isn't in your data sample, the value is marked with a red dot to help you easily identify it.

  5. Repeat these steps to map each state to the spelled out version of its name.

    AbbreviationState Name
    AZArizona
    CACalifornia
    COColorado
    IDIdaho
    MTMontana
    NMNew Mexico
    NVNevada
    OROregon
    UTUtah
    WAWashington
    WYWyoming

    Then click Done to close the Group Values editor.

    After all the states are mapped, you look at the Changes pane and see there is only one entry there instead of 11.

    Tableau Prep grouped similar actions for a field together. You like that because it will make it easier to find changes you made to your data set later.

    Fixing the State field values was the only change you needed to make here.

  6. Name your cleaning step to keep track of your work. For example Rename states.

You've done a lot of clean up in your files, and you can't believe how quick and easy it was. You might make it home for dinner after all! To make sure that you don't lose all of your work so far, save your flow.

Note: If working on the web, your changes are automatically saved as you go, creating a draft flow. Click in the draft title to name your draft. For more information about authoring on the web, see Tableau Prep on the Web in the Tableau Server(Link opens in a new window) or Tableau Online(Link opens in a new window) help.

Click File > Save or File > Save As. Save your file as a flow file (.tfl) and give it a name. For example, My Superstore.

Tip: When you save your flow files, you can either save them as a flow file (.tfl) or you can save them as a packaged file (.tflx) and package your local data files with them to share the flow and files with someone else. For more information about saving and sharing your flows, see Save and Share Your Work(Link opens in a new window).

4. Combine your data

Now that all the files are cleaned up, you are finally ready to combine them all.

Because all the files have similar fields after your clean up efforts, to pull all the rows together into a single table, you need to union the tables.

You remember that there was a step option called Union, but you wonder if you can simply drag and drop the steps to union them. You decide to try it and see.

Union your data

  1. Follow the steps for where you are working.

    Tableau Prep Builder

    Tableau Server or Tableau Online

    • In the Flow pane, drag the cleaning step Rename states on to the Union step you created earlier for your South files and drop it on the Add option.

      You see that Tableau Prep added your new files to your previous union. Great! Now you want to add the other files to this union too.

    • In the Flow pane, drag the cleaning step Rename states on to the cleaning step Changed data type step and drop it on the Union option.

      You see that Tableau Prep Builder added a new Union step to your flow. Great! Now you want to add the other files to this union too.

  2. Drag the next cleaning step in the flow on to the Union step, then drop it on Add to add it to the existing union.

  3. Drag the remaining step (orders_south_2015 Input step if working in Tableau Prep Builder or your cleaning step if working on the web) to the new Union step. Drop it on Add to add it to the existing union.

    Now all of your files are combined into a single table. In the Flow pane, select the new Union step to see your results.

    On Tableau Prep Builder:

    On Tableau Server or Tableau Online:


    You notice that Tableau automatically matched up the fields that had the same names and types.

    You also see that the colors assigned to the steps in the flow are used in the union profiles to indicate where the field came from and also appear in the colored band across the top of each field to show you if that field exists in that table.

    You notice that a new field called Table Names was added that lists the tables where all the rows in the union come from.

    A list of mismatched fields also shows in the summary pane and you can see right away that the fields Product and Discounts only appear in the Orders_Central file.

  4. To take a closer look at these fields, in the Union Results pane, select the Show only mismatched fields check box.

    Looking at the field data, you quickly see that the data is the same, but the field name is different. You could simply rename the field, but you wonder if you could just drag and drop these fields to merge them. You decide to try that and see.

  5. Select the Product field and drag and drop it onto the Product Name field to merge the fields. After the fields are merged, they no longer appear in the pane.

  6. Repeat this step to merge the Discounts field with the Discount field.

    The only field that doesn't have a match now is the File Paths field. In Tableau Prep Builder, this field shows the file paths for the wildcard union that you did for your sales orders from the South. You decide to leave this field there as it has good information.

    Tip: You have several options when fixing mismatched fields after a union. If Tableau Prep detects a possible match, it will highlight it in yellow. To merge the fields hover over the highlighted field and click the plus button that appears.

    For more ways to merge fields in a union, see Fix fields that don’t match.

  7. Clear the Show only mismatched fields check box to show all the fields included in the union.

  8. Name your Union step to represent what this union includes work. For example, All orders.

Check your work: Watch "Union your data" in action.

Click the image to replay it

You are a cleaning genius! As you are admiring your results, your boss calls. He forgot to mention that he also wants you to include any product returns in your analysis. He hopes that won't be too much trouble. With Tableau Prep in your toolkit, it's no problem at all!

Clean the product returns data

You look over the Excel file that your boss sent you for product returns and it looks a little messy. You add the new file return_reasons new to your flow to take a closer look.

  1. In the Connections pane, click Add connection. Select Microsoft Excel and navigate to the sample data files you've been using for this exercise. (See Sample files to download the file.)

  2. Select return reasons_new.xlsx, and then click Open to add the file to the flow pane.

    There are only four fields that you want to include from this file in your flow: Order ID, Product ID, Return Reason and Notes.

  3. In the Input pane for returns_new clear the check box at the top of the left-most column to clear all the check boxes. Then select the check box for the Order ID, Product ID, Return Reason and Notes fields.

  4. Rename the Input step to better reflect the data that is included in this input. In the Flow pane, double-click the Input step name Returns_new and type in Returns (all).

    Looking at the sample field values, you notice that the Notes field seems to have a lot of different data combined together.

    You have some cleaning to do in this file before you can do any further work with the data, so you add a cleaning step to check it out.

  5. In the Flow pane, select the Input step Returns (all), click the plus icon or on the suggested clean step to add a clean step.

    In the Profile pane, re-size the Notes field so you can see the entries better. To do this, click and drag the outer right edge of the field to the right.

  6. In the Notes field, use the visual scroll bar to the right of the field values to scan the values.

    You notice a few things that are problematic:

    To tackle the extra spaces, you remember that there was a cleaning option to remove trailing spaces, so you decide to try that to see if it can fix that problem.

    • Some of the entries have an extra space in the entry. This can result in the field being read as a null value.

    • It looks like the name of the approver is included in the return notes entry. To better work with this data you'll want that information in a separate field.

  7. Select the Notes field. Click the More optionsmenu (drop-down arrow in prior releases) and select Clean > Trim Spaces.

    Yes! It did exactly what you wanted it to do. The extra spaces are gone.

    Next you want to create a separate field for the approver name. You see a Split Values option in the menu, so you decide to try that.

  8. Select the Notes field. Click the More optionsmenu (drop-down arrow in prior releases) and select Split Values > Automatic Split.

    This option did exactly what you were hoping it would do. It automatically split the return notes and the approver name into separate fields.

    Just like Tableau Desktop, Tableau Prep automatically assigned a name to those fields. So you'll need to rename the new fields to something meaningful.

  9. Select the field Notes-Split 1. Double-click in the field name and type Return Notes.

  10. Repeat this step for the second field and rename it to Approver.

  11. Finally remove the original Notes field, as you no longer need it. Select the Notes field, click the More optionsmenu (drop-down arrow in prior versions), and select Remove (Remove Field in prior versions) from the menu.

    Looking at the new Approver field, you notice that the field values lists the same names but they are entered differently. You want to group them to eliminate multiple variations of the same value.

    Maybe the Group Values (Group and Replace in prior versions) option can help with that?

    You remember there was an option for Common Characters. Since these values share the same letters, you decide to try that.

  12. Select the Approver field. Click the More optionsmenu (drop-down arrow in prior versions) and select Group Values (Group and Replace in prior versions) > Common Characters.

    This option grouped all of the variations of each name together for you. That's exactly what you wanted to do.

    After checking the other names to make sure they are grouped properly, you click Done to close the Group Values editor.

    This file is looking pretty good.

  13. Name your cleaning step to keep track of your work. For example Cleaned notes.

Now that the product return data is all cleaned up, you want to add this data to the orders data from your unioned files. But many of these fields don't exist in the unioned files. To add these fields (columns of data) to your unioned data set, you need to use a join.

Join your data

When you join data, the files must have at least one field in common. Your files share the Order ID and Product ID fields, so you can join on those fields to see all the rows that have those fields in common. You remember an option to create a join when you created your union using drag and drop, so you give that a try.

  1. In the Flow pane, drag the Cleaned notes step on to the All orders Union step and drop it on Join.

    When you join files, Tableau Prep shows you the results of your join in the Join Profile.

    Working with joins can be tricky. You often want to have a clear view of the factors that are included in the join, such as the fields used to join the files, the number of rows included in the results and any fields that aren't included or are null values.

    As you review the results of the join in Tableau Prep, you are delighted to see so much information and interactivity at your fingertips.

    Tip: The far left pane of the join profile is where you can explore and interact with your join. You can also edit values directly in the Join Clauses panes and perform cleaning operations in the Join Results pane.

    Click in the Join Type diagram to try different join configurations and see the number of rows included or excluded in your join for each table in the Summary of Join Results section.

    Select the fields that you want to join on in the Applied Join Clauses section or add suggested join clauses from the Join Clause Recommendations section.

    For more information about working with joins, see
    Aggregate, Join, or Union Data(Link opens in a new window).

    You see that you have over 13,000 rows excluded from your All Orders files. When you created your join, Tableau Prep automatically joined on the Product ID field, but you also wanted to join on the Order ID field.

    As you scan the left pane of the join profile, you see that Order ID is in the list of recommended join clauses, so you quickly add it from there.

  2. In the left pane of the Join profile, in the Join Clause Recommendations section, select Order ID = Order ID and click the plus button to add the join clause.

    Because the Join Type is set to an inner join (the default setting for Tableau Prep), the join is only including values that exist in both files. But you want all of the data from your Orders files as well as the return data for those files. So you'll need to change the join type.

  3. In the Join Type section, click the side of the diagram to include all orders. In the example below, click the left side of the diagram to change the join type to a Left join and include all data from the All orders union step and any matching data from the Cleaned notes step.

    Now you have all of the data from the sales order files and any return data that apply to those orders. You review the Join Clauses pane and see the distinct values that don't exist in the other file.

    For example there are many order rows (shown in red) that have no corresponding return data. You love being able to explore this level of detail about your join.

    You're anxious to start analyzing this data in Tableau Desktop, but you notice a few results from the join that you want to clean up before you do that. Good thing you know what to do!

    Tip: Wonder if your data is clean enough? From Tableau Prep Builder, you can preview your data in Tableau Desktop from any step in your flow to check it out.

    Just right-click on the step in the Flow pane and select Preview in Tableau Desktop from the menu.

    You can experiment with your data and any changes that you make in Tableau Desktop won't write back to your data source in Tableau Prep Builder. For more information see
    View flow output in Tableau Desktop(Link opens in a new window).

  4. Before you start cleaning your join results, name your Join step Orders+Returns and save your flow.

Clean your join results

Note: To clean up the fields in your join, you can perform cleaning operations directly in the Join step. For the purposes of this tutorial we will add a cleaning step so you can clearly see your cleaning operations. If you want to try performing these steps directly in the join step skip steps 1 and 3 below.

When you joined the two steps, the common fields Order ID and Product ID were added for both tables.

You want to keep the Product ID field from all of your orders and the Order ID field from the returns file and remove the duplicate fields that came from those files. You also don't need the File Paths and Table Names fields in your output file, so you want to remove those fields as well.

Tip: When you join tables using fields that exist in both files, Tableau Prep brings in both fields and renames the duplicate field from the second file by adding a "-1" or a "-2" to the field name. For example Order ID and Order ID-1.

  1. In the Flow pane, select Orders+Returns, click the plus icon, and add a clean step.

  2. In the Profile pane, select and remove the following fields:

    • Table Names

    • Order ID

    • File Paths (Tableau Prep Builder only)

    • Product ID-1

  3. Rename the field Order ID-1 to Order ID.

    You have quite a few null values where the product was returned but there was no return note or approver indicated. To make this data easier to analyze, you want to add a field with a value of Yes and No to indicate whether the product was returned.

    You don't have this field, but you can add it by creating a calculated field.

  4. In the toolbar, click Create Calculated Field.

  5. Name the field Returned? and then enter the following calculation and click Save.

    If ISNULL([Return Reason])=FALSE THEN "Yes" ELSE "No" END

    For your analysis you would also like to know the number of days it takes to ship an order, but you don't have that field either.

    You have all the information that you need to create it though, so you add another calculated field to create it.

  6. In the toolbar, click Create Calculated Field.

  7. Name the field Days to Ship and then enter the following calculation and click Save.

    DATEDIFF('day',[Order Date],[Ship Date])

  8. Name your step Clean Orders +Returns.

  9. Save your flow.

5. Run your flow and generate output

Your data is looking good and you're ready to generate your output file to start analyzing it in Tableau Desktop. All you need to do is run your flow and generate your extract file. To do this you need to add an Output step.

Depending on where you're working, you can output your flow to a file (Tableau Prep Builder only) , to a published data source or to a database.

  1. In the Flow pane, select Clean Orders+Returns, click the plus icon and select Output (Add Output in prior versions).

    When you add an Output step, the Output pane opens and shows you a snapshot of your data. Here you can select the type of output that you want to generate, and specify the name and where you want to save the file.

    The default location is in the My Tableau Prep Builder repository in your data sources folder.

  2. In the left pane in the Save output to drop-down, depending on where you are working, do one of the following:

    Tableau Prep Builder

    Tableau Server or Tableau Online

    Tip: You have choices when generating output from your flow. You can generate an extract file (Tableau Prep Builder only), you can publish your data as a data source to Tableau Server or Tableau Online or you can write your data to a database. For more information about generating output files, see Create data extract files and published data sources(Link opens in a new window).

    1. Select Published data source.

    2. Select a project.

    3. Enter a name for the file, for example Orders_Returns_Superstore.

    4. Select File (select Save to file in previous versions).

    5. Click the Browse button, then in the Save Extract As dialog, enter a name for the file, for example Orders_Returns_Superstore, and click Accept.

    6. In the Output type field, select an output type. Select Tableau Data Extract (.hyper) for Tableau Desktop or Comma Separated Values (.csv) if you want to share the extract with a third party.

  3. In the Write Options section, view the options to write the new data to your files. You want to use the default (Create table) and replace the table with your flow output, so there is nothing to change here.

    Tip: Starting in version 2020.2.1, you can choose how you want to write your flow data back to your table. You can choose from two options; Create table or Append table. By default, Tableau Prep uses the Create table option and overwrites your table data with the new data when you run your flow. If you choose Append table, Tableau Prep adds the flow data to the existing table so you can track both new and historical data on every flow run. For more information, see Configure write options

  4. In the Output pane, click Run Flow or click the Run Flow button in the flow pane to generate your output.

    Note: If you are working on the web, click Publish to publish your draft flow. Only published flows can be run.

  5. When the flow is finished running, a status dialog shows whether the flow ran successfully and the time it took to run. Click Done to close the dialog.

    如果在 Web 上工作,请导航到“浏览>所有流”页,然后找到你的流。您可以在“流程概述”页上查看流程运行的状态。

    若要使数据保持最新,可以手动运行流或使用命令行。如果您具有数据管理加载项并启用了 Tableau Prep Conductor,则还可以在 Tableau ServerTableau Online 中按计划运行流程。

    从 Tableau Prep Builder 版本 2020.2.1 和 Web 上开始,您还可以选择在每次运行流程时刷新所有数据,或者使用增量刷新运行流程,每次仅处理新数据。

    有关使数据保持最新的详细信息,请参阅下列主题:

总结和资源

你是一个数据准备摇滚明星!您获取了脏数据并轻松转换了它!您立即从多个数据集中清理并准备了数据,并将其转换为时尚、干净的数据集,您现在可以在 Tableau Desktop 中使用该数据集进行分析。

想要更多练习?尝试使用此处找到的数据文件复制 Superstore 的示例流的其余部分:

安装 Tableau Prep Builder 后,您还可以在计算机上的以下位置找到这些文件:

  • (视窗)C:\Program Files\Tableau\Tableau Prep Builder

  • (苹果)/Applications/Tableau Prep Builder