Hortonworks Hadoop Hive

  • Version: 2022.1 and later

This article describes how to connect Tableau to a Hortonworks Hadoop Hive database and set up the data source.

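Before you begin

Before you begin, gather this connection information:

  • Name of the server that hosts the database you want to connect to

  • Authentication method:

    • No Authentication

    • Kerberos

    • User Name

    • User Name and Password

    • Microsoft Azure HDInsight Service (starting in version 10.2.1)

  • Transport options depend on the authentication method you choose, and can include the following:

    • Binary

    • SASL

    • HTTP

  • Sign-in credentials depend on the authentication method you choose, and can include the following:

    • User name

    • Password

    • Realm

    • Host FQDN

    • Service name

    • HTTP path

  • Are you connecting to an SSL server?

  • (Optional) Initial SQL statement to run every time Tableau connects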
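
The checklist above can be captured as a small record so that missing items surface before you open the connection dialog. The following Python sketch is illustrative only: the class name, the field names, the default transport, and the mapping from authentication method to required credentials are all assumptions made for this example, not Tableau settings or an official rule set.

```python
from dataclasses import dataclass, field

# Assumed mapping from authentication method to the credentials it needs.
# This is an illustration, not an official Tableau rule set.
REQUIRED_BY_AUTH = {
    "No Authentication": [],
    "Kerberos": ["realm", "host_fqdn", "service_name"],
    "User Name": ["username"],
    "User Name and Password": ["username", "password"],
    "Microsoft Azure HDInsight Service": ["username", "password"],
}


@dataclass
class ConnectionChecklist:
    """Hypothetical record of the items to gather before connecting."""
    server: str
    authentication: str
    transport: str = "Binary"   # Binary, SASL, or HTTP (default is assumed)
    require_ssl: bool = False
    initial_sql: str = ""       # optional
    credentials: dict = field(default_factory=dict)

    def missing_items(self):
        """Return the names of items that still need to be collected."""
        missing = []
        if not self.server:
            missing.append("server")
        if self.authentication not in REQUIRED_BY_AUTH:
            missing.append("authentication")
            return missing
        for name in REQUIRED_BY_AUTH[self.authentication]:
            if not self.credentials.get(name):
                missing.append(name)
        # Assumption: the HTTP transport also needs an HTTP path.
        if self.transport == "HTTP" and not self.credentials.get("http_path"):
            missing.append("http_path")
        return missing


checklist = ConnectionChecklist(
    server="mydb.test.ourdomain.lan",
    authentication="Kerberos",
    credentials={"realm": "OURDOMAIN.LAN", "service_name": "hive"},
)
print(checklist.missing_items())  # prints ['host_fqdn']
```

Adjust the mapping to match what the connection dialog actually asks for with your chosen authentication method.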

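Driver required

This connector requires a driver to talk to the database. You might already have the required driver installed on your computer. If the driver is not installed on your computer, Tableau displays a message in the connection dialog box with a link to the Driver Download page, where you can find driver links and installation instructions.

Note: Make sure you use the latest available drivers. To get the latest drivers, see Hortonworks Hadoop Hive on the Tableau Driver Download page.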

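Make the connection and set up the data source

  1. Start Tableau and, under Connect, select Hortonworks Hadoop Hive. For a complete list of data connections, select More under To a Server. Then do the following:

    1. Enter the name of the server that hosts the database.

    2. In the Authentication drop-down list, select the authentication method to use.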

    3. Enter the information that you are prompted to provide. The information you are prompted for depends on the authentication method you choose.

    4. (Optional) Select Initial SQL to specify a SQL command to run at the beginning of every connection, such as when you open the workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server. For more information, see Run Initial SQL.

    5. Select Sign In.

      Select the Require SSL option when connecting to an SSL server.

      If Tableau can't make the connection, verify that your credentials are correct. If you still can't connect, your computer may be having trouble locating the server. Contact your network administrator or database administrator.

  2. On the data source page, do the following:

    1. (Optional) Select the default data source name at the top of the page, and then enter a unique data source name for use in Tableau. For example, use a data source naming convention that helps other users of the data source figure out which data source to connect to.

    2. From the Schema drop-down list, select the search icon or enter the schema name in the text box and select the search icon, and then select the schema.

    3. In the Table text box, select the search icon or enter the table name and select the search icon, and then select the table.

    4. Drag the table to the canvas, and then select the sheet tab to start your analysis.

      Use custom SQL to connect to a specific query rather than the entire data source. For more information, see Connect to a Custom SQL Query.

      Note: This database type supports only equal (=) join operations.
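
To illustrate the join restriction in the note above, here is a minimal Python sketch of an equality join: rows pair up only when their key values are equal. The sample tables and the function name are made up for this example. The code illustrates the concept only and does not reproduce how Hive or Tableau executes a join.

```python
from collections import defaultdict


def equi_join(left, right, key):
    """Join two lists of dicts on rows whose `key` values are equal."""
    # Index the right-hand rows by key so equal values can be looked up directly.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample tables.
orders = [{"customer_id": 1, "total": 40}, {"customer_id": 2, "total": 15}]
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 3, "name": "Bo"}]

print(equi_join(orders, customers, "customer_id"))
# prints [{'customer_id': 1, 'total': 40, 'name': 'Ana'}]
```

A join condition built on a comparison such as < or >= is not an equality join, so it is not supported for this database type.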

Sign in on a Mac

If you use Tableau Desktop on a Mac, when you enter the server name to connect, use a fully qualified domain name, such as mydb.test.ourdomain.lan, instead of a relative domain name, such as mydb or mydb.test.

Alternatively, you can add the domain to the list of Search Domains for the Mac computer so that when you connect, you need to provide only the server name. To update the list of Search Domains, go to System Preferences > Network > Advanced, and then open the DNS tab.
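
The effect of Search Domains can be pictured with a short Python sketch: a short server name is combined with each search domain to form candidate fully qualified names. This is a deliberately simplified model with a made-up function name. Real resolvers apply additional rules, so treat it as an illustration of the idea rather than a description of how macOS resolves names.

```python
def candidate_names(server_name, search_domains):
    """List the names a resolver could try for `server_name` (simplified).

    A name ending in "." is treated as already fully qualified.
    """
    if server_name.endswith("."):
        return [server_name.rstrip(".")]
    candidates = [f"{server_name}.{domain}" for domain in search_domains]
    candidates.append(server_name)
    return candidates


print(candidate_names("mydb", ["test.ourdomain.lan"]))
# prints ['mydb.test.ourdomain.lan', 'mydb']
print(candidate_names("mydb.test", ["ourdomain.lan"]))
# prints ['mydb.test.ourdomain.lan', 'mydb.test']
```

With no search domain configured, only the name as typed is tried, which is why entering the fully qualified domain name is the safer choice.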

Work with Hadoop Hive data

Work with date/time data

Tableau supports TIMESTAMP and DATE types natively. However, if you store date/time data as a string in Hive, be sure to store it in ISO format (YYYY-MM-DD). You can create a calculated field that uses the DATEPARSE or DATE function to convert a string to a date/time format. Use DATEPARSE() when working with an extract; otherwise, use DATE(). For more information, see Date Functions.

For more information about Hive data types, see Dates on the Apache Hive website.
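
If a source system writes dates in another layout, one option is to convert them to ISO format before loading them into Hive. The following Python sketch uses only the standard library. The function name, the sample values, and the source formats are assumptions for illustration.

```python
from datetime import datetime


def to_iso_date(text, source_format):
    """Convert a date string in `source_format` to ISO format (YYYY-MM-DD)."""
    return datetime.strptime(text, source_format).date().isoformat()


print(to_iso_date("03/15/2021", "%m/%d/%Y"))  # prints 2021-03-15
print(to_iso_date("15.03.2021", "%d.%m.%Y"))  # prints 2021-03-15
```

Passing the source format explicitly avoids guessing between day-first and month-first layouts. A string that does not match the format raises ValueError instead of being converted silently.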

NULL value returned

A NULL value is returned when you open a workbook in Tableau 9.0.1 and later, or in 8.3.5 and later 8.3.x releases, if the workbook was created in an earlier version and has date/time data stored as a string in a format that Hive doesn't support. To resolve this issue, change the field type back to String and create a calculated field using DATEPARSE() or DATE() to convert the date. Use DATEPARSE() when working with an extract; otherwise, use DATE().
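
To find out which string fields need such a calculated field, it can help to check a sample of stored values for strings that are not in ISO date format. The Python sketch below is only a pre-check on sample values, with hypothetical function names. It does not query Hive and does not replicate how Tableau or Hive parses dates.

```python
import re
from datetime import date

# Strict YYYY-MM-DD pattern: four, two, and two ASCII digits.
ISO_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_iso_date(text):
    """Return True if `text` is a valid calendar date written as YYYY-MM-DD."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2021-02-30
    except ValueError:
        return False
    return True


def non_iso_values(values):
    """Return the values that are not in ISO date format."""
    return [value for value in values if not is_iso_date(value)]


sample = ["2021-03-15", "03/15/2021", "2021-02-30", "2021-3-5"]
print(non_iso_values(sample))
# prints ['03/15/2021', '2021-02-30', '2021-3-5']
```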

High latency limitation

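Hive is a batch-oriented system and cannot yet answer simple queries with very quick turnaround. This limitation can make it difficult to explore a new data set or experiment with calculated fields. Some newer SQL-on-Hadoop technologies (for example, Cloudera's Impala and Hortonworks' Stringer project) are designed to address this limitation.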

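Truncated columns in Tableau

The default string column length for Hortonworks Hadoop Hive is 255 characters. For more information about Hortonworks Hive ODBC driver configuration options, and specifically about DefaultStringColumnLength, see the Hive ODBC Driver User Guide from Hortonworks.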
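
To gauge whether the 255-character default would affect your data, you could check string lengths in a sample of values. The Python sketch below uses a made-up function name and sample data. It counts Python characters, which may not match exactly how the driver measures column length.

```python
DEFAULT_STRING_COLUMN_LENGTH = 255  # default length stated above


def values_over_limit(values, limit=DEFAULT_STRING_COLUMN_LENGTH):
    """Return (index, length) for each string longer than `limit` characters."""
    return [(i, len(v)) for i, v in enumerate(values) if len(v) > limit]


sample = ["short", "x" * 255, "y" * 300]
print(values_over_limit(sample))  # prints [(2, 300)]
```

If the check reports long values, the driver guide's section on DefaultStringColumnLength is the place to look for how to raise the limit.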

See also