How To Create Cluster Table In Sap Abap
This document describes how to create and use clustered tables in BigQuery. For an overview of clustered table support in BigQuery, see Introduction to clustered tables.
Limitations
Clustered tables in BigQuery are subject to the following limitations:
- Only standard SQL is supported for querying clustered tables and for writing query results to clustered tables.
- Clustering columns must be top-level, non-repeated columns of one of the following types:
  - DATE
  - BOOL
  - GEOGRAPHY
  - INT64
  - NUMERIC
  - BIGNUMERIC
  - STRING
  - TIMESTAMP
  - DATETIME
  For more information about data types, see Standard SQL data types.
- You can specify up to four clustering columns.
- When using STRING type columns for clustering, BigQuery uses only the first 1,024 characters to cluster the data. The values in the columns can themselves be longer than 1,024 characters.
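These limitations can be checked before a table is created. The following is a minimal sketch, not part of any BigQuery library: the helper name `check_clustering_columns` and the schema representation (a dict of column name to type and mode) are assumptions made for illustration.

```python
# Hypothetical pre-flight check for a clustering specification.
# The type list and the four-column limit come from the limitations above.
ALLOWED_TYPES = {
    "DATE", "BOOL", "GEOGRAPHY", "INT64", "NUMERIC",
    "BIGNUMERIC", "STRING", "TIMESTAMP", "DATETIME",
}
MAX_CLUSTERING_COLUMNS = 4


def check_clustering_columns(schema, clustering_columns):
    """Return a list of problems; an empty list means the checks passed.

    schema maps each top-level column name to a (type, mode) tuple.
    """
    problems = []
    if not 1 <= len(clustering_columns) <= MAX_CLUSTERING_COLUMNS:
        problems.append("expected between one and four clustering columns")
    for name in clustering_columns:
        if name not in schema:
            problems.append(f"{name}: not a top-level column")
            continue
        col_type, mode = schema[name]
        if mode == "REPEATED":
            problems.append(f"{name}: repeated columns cannot be used")
        if col_type not in ALLOWED_TYPES:
            problems.append(f"{name}: type {col_type} is not supported")
    return problems


example_schema = {
    "customer_id": ("STRING", "NULLABLE"),
    "tags": ("STRING", "REPEATED"),
    "transaction_amount": ("FLOAT64", "NULLABLE"),
}
print(check_clustering_columns(example_schema, ["customer_id"]))
```

The check only mirrors the rules listed here; BigQuery itself remains the authority on whether a specification is accepted.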
Creating clustered tables
You can create a clustered table in the following ways:
- From query results:
  - By using a DDL CREATE TABLE AS SELECT statement.
  - By running a query that creates a clustered destination table.
- By using a DDL CREATE TABLE statement with a CLUSTER BY clause containing a clustering_column_list.
- Manually by using the bq command-line tool bq mk command.
- Programmatically by calling the tables.insert API method.
- When you load data.
- By using the client libraries.
Table naming
When you create a table in BigQuery, the table name must be unique per dataset. The table name can:
- Contain up to 1,024 characters.
- Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.
For example, the following are all valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.
Some table names and table name prefixes are reserved. If you receive an error saying that your table name or prefix is reserved, then select a different name and try again.
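The character rules above can be approximated with Python's standard `unicodedata` module. This is a sketch under stated assumptions: the function name `is_valid_table_name` is hypothetical, the length is counted in characters as the text above states, empty names are treated as invalid, and reserved names and prefixes are not checked because they are not listed here.

```python
import unicodedata

# Major categories L (letter), M (mark), N (number) are allowed in full;
# Pc, Pd and Zs are allowed as exact two-letter categories.
ALLOWED_MAJOR_CATEGORIES = {"L", "M", "N"}
ALLOWED_EXACT_CATEGORIES = {"Pc", "Pd", "Zs"}
MAX_NAME_LENGTH = 1024


def is_valid_table_name(name):
    """Check length and Unicode categories only (not reserved names)."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        if (category[0] not in ALLOWED_MAJOR_CATEGORIES
                and category not in ALLOWED_EXACT_CATEGORIES):
            return False
    return True


for candidate in ["table 01", "ग्राहक", "00_お客様", "étudiant-01", "bad.name"]:
    print(candidate, is_valid_table_name(candidate))
```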
Required permissions
To create a table, you need the following IAM permissions:
- bigquery.tables.create
- bigquery.tables.updateData
- bigquery.jobs.create
Additionally, you might require the bigquery.tables.getData permission to access the data that you write to the table.
Each of the following predefined IAM roles includes the permissions that you need in order to create a table:
- roles/bigquery.dataEditor
- roles/bigquery.dataOwner
- roles/bigquery.admin (includes the bigquery.jobs.create permission)
- roles/bigquery.user (includes the bigquery.jobs.create permission)
- roles/bigquery.jobUser (includes the bigquery.jobs.create permission)
Additionally, if you have the bigquery.datasets.create permission, you can create and update tables in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Creating an empty clustered table with a schema definition
You specify clustering columns when you create a table in BigQuery. After the table is created, you can modify the clustering columns; see Modifying clustering specification for details.
Clustering columns must be top-level, non-repeated columns, and they must be one of the following simple data types:
- DATE
- BOOLEAN
- GEOGRAPHY
- INTEGER
- NUMERIC
- BIGNUMERIC
- STRING
- TIMESTAMP
You can specify up to four clustering columns. When you specify multiple columns, the order of the columns determines how the data is sorted. For example, if the table is clustered by columns a, b and c, the data is sorted in the same order: first by column a, then by column b, and then by column c. As a best practice, place the most frequently filtered or aggregated column first.
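The effect of column order can be pictured with an ordinary multi-key sort. The snippet below is only a conceptual illustration on made-up rows; it does not describe how BigQuery physically stores or organizes data.

```python
# Made-up rows for a table clustered by columns a, b and c (in that order).
rows = [
    {"a": 2, "b": "x", "c": 1},
    {"a": 1, "b": "y", "c": 2},
    {"a": 1, "b": "x", "c": 3},
    {"a": 1, "b": "x", "c": 1},
]
clustering_columns = ["a", "b", "c"]


def clustered_order(rows, columns):
    """Sort by the first column, then the second, and so on."""
    return sorted(rows, key=lambda row: tuple(row[col] for col in columns))


ordered = clustered_order(rows, clustering_columns)
for row in ordered:
    print(row)
```

Rows that share a value of `a` end up next to each other, which is why a filter on the first clustering column benefits most, while a filter on `c` alone benefits least.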
The order of your clustering columns also affects query performance and pricing. For more information about query best practices for clustered tables, see Querying clustered tables.
To create an empty clustered table with a schema definition:
Console
- In the Google Cloud Console, go to the BigQuery page.
- In the Explorer panel, expand your project and select a dataset.
- Expand the Actions option and click Open.
- In the details panel, click Create table.
- On the Create table page, under Source, for Create table from, select Empty table.
- Under Destination:
  - For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating.
  - Verify that Table type is set to Native table.
- Under Schema, enter the schema definition. Enter schema information manually by doing one of the following:
  - Enabling Edit as text and entering the table schema as a JSON array.
  - Using Add field to manually input the schema.
- For Clustering order, enter between one and four comma-separated column names.
- (Optional) Click Advanced options and for Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
- Click Create table.
bq
Use the bq mk command with the following flags:
- --table (or the -t shortcut).
- --schema. You can supply the table's schema definition inline or use a JSON schema file.
- --clustering_fields. You can specify up to four clustering columns.
Optional parameters include --expiration, --description, --time_partitioning_type, --time_partitioning_field, --time_partitioning_expiration, --destination_kms_key, and --label.
If you are creating a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.
--destination_kms_key is not demonstrated here. For information about using --destination_kms_key, see customer-managed encryption keys.
Enter the following command to create an empty clustered table with a schema definition:
bq mk \
  --table \
  --expiration INTEGER1 \
  --schema SCHEMA \
  --clustering_fields CLUSTER_COLUMNS \
  --description "DESCRIPTION" \
  --label KEY:VALUE,KEY:VALUE \
  PROJECT_ID:DATASET.TABLE
Replace the following:
- INTEGER1: the default lifetime, in seconds, for the table. The minimum value is 3,600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a table, the dataset's default table expiration setting is ignored. Setting this value deletes the table after the specified time.
- SCHEMA: an inline schema definition in the format COLUMN:DATA_TYPE,COLUMN:DATA_TYPE or the path to the JSON schema file on your local machine.
- CLUSTER_COLUMNS: a comma-separated list of up to four clustering columns. The list cannot contain any spaces.
- DESCRIPTION: a description of the table, in quotes.
- KEY:VALUE: the key-value pair that represents a label. You can enter multiple labels using a comma-separated list.
- PROJECT_ID: your project ID.
- DATASET: a dataset in your project.
- TABLE: the name of the table you're creating.
When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.
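When the command is assembled by a script, the flag rules above can be applied before anything is run. The following is a minimal sketch: `build_bq_mk_command` is a hypothetical helper, it only uses the flags described in this section, and it returns an argument list rather than executing `bq`.

```python
import shlex


def build_bq_mk_command(table, schema, clustering_columns,
                        expiration=None, description=None, labels=None):
    """Build the argument list for a `bq mk --table` call (not executed)."""
    if not 1 <= len(clustering_columns) <= 4:
        raise ValueError("specify between one and four clustering columns")
    if any(" " in column for column in clustering_columns):
        raise ValueError("the clustering column list cannot contain spaces")
    args = ["bq", "mk", "--table"]
    if expiration is not None:
        if expiration < 3600:
            raise ValueError("the minimum expiration is 3,600 seconds")
        args += ["--expiration", str(expiration)]
    args += ["--schema", schema]
    args += ["--clustering_fields", ",".join(clustering_columns)]
    if description:
        args += ["--description", description]
    if labels:
        args += ["--label", ",".join(f"{k}:{v}" for k, v in labels.items())]
    args.append(table)
    return args


cmd = build_bq_mk_command(
    "mydataset.myclusteredtable",
    "timestamp:timestamp,customer_id:string,transaction_amount:float",
    ["customer_id"],
    expiration=2592000,  # 30 days * 24 hours * 3,600 seconds
    description="This is my clustered table",
    labels={"org": "dev"},
)
print(shlex.join(cmd))
```

Returning a list keeps quoting concerns out of the helper; the list can be passed to `subprocess.run` directly, or rendered with `shlex.join` for display.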
Examples:
Enter the following command to create a clustered table named myclusteredtable in mydataset in your default project. The table's expiration is set to 2,592,000 seconds (one 30-day month), the description is set to This is my clustered table, and the label is set to org:dev. The command uses the -t shortcut instead of --table.
The schema is specified inline as: timestamp:timestamp,customer_id:string,transaction_amount:float. The specified clustering field customer_id is used to cluster the table.
bq mk -t \
  --expiration 2592000 \
  --schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \
  --clustering_fields customer_id \
  --description "This is my clustered table" \
  --label org:dev \
  mydataset.myclusteredtable

Enter the following command to create a clustered table named myclusteredtable in myotherproject, not your default project. The description is set to This is my clustered table, and the label is set to org:dev. The command uses the -t shortcut instead of --table. This command does not specify a table expiration. If the dataset has a default table expiration, it is applied. If the dataset has no default table expiration, the table never expires.
The schema is specified in a local JSON file: /tmp/myschema.json. The customer_id field is used to cluster the table.
bq mk -t \
  --schema /tmp/myschema.json \
  --clustering_fields=customer_id \
  --description "This is my clustered table" \
  --label org:dev \
  myotherproject:mydataset.myclusteredtable

After the table is created, you can update the table's description and labels.
API
Call the tables.insert method with a defined table resource that specifies the clustering.fields property and the schema property.
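As a rough sketch of what such a table resource could look like, the helper below builds the request body as a plain dictionary. The function name `clustered_table_resource` and the example project, dataset, and column values are assumptions for illustration; sending the request (authentication, HTTP call) is out of scope here.

```python
import json


def clustered_table_resource(project_id, dataset_id, table_id,
                             columns, clustering_columns):
    """Build a tables.insert request body with schema and clustering.fields."""
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {
            "fields": [
                {"name": name, "type": col_type, "mode": "NULLABLE"}
                for name, col_type in columns
            ]
        },
        # Order matters: data is sorted by the first field, then the next.
        "clustering": {"fields": list(clustering_columns)},
    }


body = clustered_table_resource(
    "myotherproject", "mydataset", "myclusteredtable",
    [("timestamp", "TIMESTAMP"),
     ("customer_id", "STRING"),
     ("transaction_amount", "FLOAT")],
    ["customer_id"],
)
print(json.dumps(body, indent=2))
```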
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
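The following is a minimal sketch of this step with the google-cloud-bigquery package; it is not the official sample that this page refers to. It assumes the package is installed and credentials are configured, and the function name `create_clustered_table` and the column names are illustrative.

```python
def create_clustered_table(table_id, client=None):
    """Create an empty table clustered by customer_id.

    table_id is a string of the form "project.dataset.table".
    """
    # Imported inside the function so this file can be loaded even when
    # google-cloud-bigquery is not installed.
    from google.cloud import bigquery

    client = client or bigquery.Client()
    schema = [
        bigquery.SchemaField("timestamp", "TIMESTAMP"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("transaction_amount", "FLOAT"),
    ]
    table = bigquery.Table(table_id, schema=schema)
    # Up to four column names; order determines the sort order.
    table.clustering_fields = ["customer_id"]
    return client.create_table(table)
```

Calling `create_clustered_table("myotherproject.mydataset.myclusteredtable")` would issue the API request, so run it only against a project where creating the table is intended.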
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Creating a clustered table from a query result
There are two ways to create a clustered table from a query result:
- Write the results to a new destination table and specify the clustering columns. This method is discussed below.
- Use a DDL CREATE TABLE AS SELECT statement. For more information about this method, see Creating a clustered table from the result of a query on the Using data definition language statements page.
You can create a clustered table by querying either a partitioned table or a non-partitioned table. You cannot change an existing table to a clustered table by using query results.
When you create a clustered table from a query result, you must use standard SQL. Currently, legacy SQL is not supported for querying clustered tables or for writing query results to clustered tables.
Console
You cannot specify clustering options for a destination table when you query data using the Cloud Console unless you use a DDL statement. For more information, see Using data definition language statements.
bq
Enter the following command to create a new, clustered destination table from a query result:
bq --location=LOCATION query \
  --use_legacy_sql=false 'QUERY'
Replace the following:
- LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- QUERY: a query in standard SQL syntax. Currently, you cannot use legacy SQL to query clustered tables or to write query results to clustered tables. The query can contain a CREATE TABLE DDL statement that specifies the options for creating your clustered table. You can use DDL rather than specifying the individual command-line flags.
Examples:
Enter the following command to write query results to a clustered destination table named myclusteredtable in mydataset. mydataset is in your default project. The query retrieves data from a non-partitioned table: mytable. The table's customer_id column is used to cluster the table. The table's timestamp column is used to create a partitioned table.
bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.myclusteredtable
   PARTITION BY DATE(timestamp)
   CLUSTER BY customer_id
   AS SELECT * FROM `mydataset.mytable`'

API
To save query results to a clustered table, call the jobs.insert method, configure a query job, and include a CREATE TABLE DDL statement that creates your clustered table.
Specify your location in the location property in the jobReference section of the job resource.
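A job resource for this call might be shaped as follows. This is a sketch: `ddl_query_job` is a hypothetical helper, the project ID and location values are examples, and the dictionary is only built, not submitted.

```python
def ddl_query_job(project_id, location, ddl):
    """Build a jobs.insert body that runs a CREATE TABLE DDL statement."""
    return {
        "jobReference": {
            "projectId": project_id,
            "location": location,
        },
        "configuration": {
            "query": {
                "query": ddl,
                # Clustered tables require standard SQL.
                "useLegacySql": False,
            }
        },
    }


DDL = (
    "CREATE TABLE mydataset.myclusteredtable "
    "PARTITION BY DATE(timestamp) "
    "CLUSTER BY customer_id "
    "AS SELECT * FROM `mydataset.mytable`"
)
job = ddl_query_job("myproject", "asia-northeast1", DDL)
print(job["configuration"]["query"]["query"])
```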
Creating a clustered table when you load data
You can create a clustered table by specifying clustering columns when you load data into a new table. You do not need to create an empty table before loading data into it. You can create the clustered table and load your data at the same time.
For more information about loading data, see Introduction to loading data into BigQuery.
To define clustering when defining a load job:
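One way to picture this is through the load job configuration sent to the jobs.insert method, which accepts a clustering property alongside the destination table and schema. The sketch below only builds that configuration as a dictionary; the helper name `clustered_load_job`, the Cloud Storage URI, and the CSV source format are assumptions chosen for illustration.

```python
def clustered_load_job(project_id, dataset_id, table_id,
                       source_uri, columns, clustering_columns):
    """Build a load job body that creates a clustered table while loading."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                "sourceFormat": "CSV",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "schema": {
                    "fields": [
                        {"name": name, "type": col_type}
                        for name, col_type in columns
                    ]
                },
                # The new table is clustered by these columns, in order.
                "clustering": {"fields": list(clustering_columns)},
            }
        }
    }


load_job = clustered_load_job(
    "myproject", "mydataset", "myclusteredtable",
    "gs://mybucket/mydata.csv",  # hypothetical source file
    [("timestamp", "TIMESTAMP"),
     ("customer_id", "STRING"),
     ("transaction_amount", "FLOAT")],
    ["customer_id"],
)
print(load_job["configuration"]["load"]["clustering"])
```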
Controlling access to clustered tables
To configure access to tables and views, you can grant an IAM role to an entity at the following levels, listed in order of range of resources allowed (largest to smallest):
- a high level in the Google Cloud resource hierarchy such as the project, folder, or organization level
- the dataset level
- the table/view level
You can also restrict access to data within tables, by using different methods:
- column-level security
- row-level security
Access with any resource protected by IAM is additive. For example, if an entity does not have access at the high level such as a project, you could grant the entity access at the dataset level, and then the entity will have access to the tables and views in the dataset. Similarly, if the entity does not have access at the high level or the dataset level, you could grant the entity access at the table or view level.
Granting IAM roles at a higher level in the Google Cloud resource hierarchy such as the project, folder, or organization level gives the entity access to a broad set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project.
Granting a role at the dataset level specifies the operations an entity is allowed to perform on tables and views in that specific dataset, even if the entity does not have access at a higher level. For information on configuring dataset-level access controls, see Controlling access to datasets.
Granting a role at the table or view level specifies the operations an entity is allowed to perform on specific tables and views, even if the entity does not have access at a higher level. For information on configuring table-level access controls, see Controlling access to tables and views.
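The additive behavior described above can be modeled as a union of the grants at each level. This is a toy model, not the IAM API: the function `effective_roles`, the grant layout, and the resource names are invented for illustration, and the folder and organization levels are left out.

```python
def effective_roles(grants, project, dataset, table):
    """Union the roles granted at the project, dataset and table levels.

    grants maps a resource path tuple, such as ("myproject", "mydataset"),
    to the set of roles granted to one entity on that resource.
    """
    roles = set()
    for scope in [(project,), (project, dataset), (project, dataset, table)]:
        roles |= grants.get(scope, set())
    return roles


# The entity has no project-level grant, only a dataset-level one.
grants = {
    ("myproject", "mydataset"): {"roles/bigquery.dataViewer"},
}
print(effective_roles(grants, "myproject", "mydataset", "myclusteredtable"))
print(effective_roles(grants, "myproject", "otherdataset", "sometable"))
```

Because grants only ever add roles, a missing project-level grant does not block access given at the dataset or table level, which matches the absence of a "deny" permission.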
You can also create IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.
You can't set a "deny" permission on any resource protected by IAM.
For more information about roles and permissions, see:
- Understanding roles in the IAM documentation
- BigQuery Predefined roles and permissions
For more information about controlling access to resources and data, see the following:
- Controlling access to datasets
- Controlling access to tables and views
- Restricting access with column-level security
- Introduction to row-level security
Using clustered tables
Getting information about clustered tables
You can get information about tables in the following ways:
- Using the Cloud Console.
- Using the bq command-line tool's bq show command.
- Calling the tables.get API method.
- Querying INFORMATION_SCHEMA views.
Required permissions
At a minimum, to get information about tables, you must be granted bigquery.tables.get permissions. The following predefined IAM roles include bigquery.tables.get permissions:
- bigquery.metadataViewer
- bigquery.dataViewer
- bigquery.dataOwner
- bigquery.dataEditor
- bigquery.admin
In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to get information about tables in a dataset.
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Getting clustered table information
To view information about a clustered table:
Console
- In the Google Cloud Console, go to the Resources pane. Click your dataset name to expand it, and then click the table name you want to view.
- Click Details. This page displays the table's details including the clustering columns.
bq
Issue the bq show command to display all table information. Use the --schema flag to display only table schema information. The --format flag can be used to control the output.
If you are getting information about a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.
bq show \
  --schema \
  --format=prettyjson \
  PROJECT_ID:DATASET.TABLE
Replace the following:
- PROJECT_ID: your project ID
- DATASET: the name of the dataset
- TABLE: the name of the table
Examples:
Enter the following command to display all information about myclusteredtable in mydataset. mydataset is in your default project.
bq show --format=prettyjson mydataset.myclusteredtable

The output should look like the following:

{
  "clustering": {
    "fields": [
      "customer_id"
    ]
  },
  ...
}

API
Call the bigquery.tables.get method and provide any relevant parameters.
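Whether the table JSON comes from bq show with --format=prettyjson or from the tables.get response, the clustering columns sit under the clustering.fields property. A small sketch of reading them follows; the helper name `clustering_columns_from_json` and the sample document are illustrative.

```python
import json


def clustering_columns_from_json(table_json):
    """Return the clustering columns of a table, or [] if it has none."""
    table = json.loads(table_json)
    return table.get("clustering", {}).get("fields", [])


# A trimmed-down stand-in for real output; real output has more properties.
sample_output = '{"clustering": {"fields": ["customer_id"]}, "type": "TABLE"}'
print(clustering_columns_from_json(sample_output))
```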
SQL
For clustered tables, you can query the CLUSTERING_ORDINAL_POSITION column in the INFORMATION_SCHEMA.COLUMNS view to retrieve information about a table's clustering columns.
-- Set up a table with clustering.
CREATE TABLE myDataset.data (column1 INT64, column2 INT64)
PARTITION BY _PARTITIONDATE
CLUSTER BY column1, column2;

-- This query returns 1 for column1 and 2 for column2.
SELECT column_name, clustering_ordinal_position
FROM myDataset.INFORMATION_SCHEMA.COLUMNS;
More table metadata is available through the TABLES, TABLE_OPTIONS, COLUMNS, and COLUMN_FIELD_PATH views in INFORMATION_SCHEMA.
Listing clustered tables in a dataset
You can list clustered tables in datasets in the following ways:
- Using the Cloud Console.
- Using the bq command-line tool's bq ls command.
- Calling the tables.list API method.
- Using the client libraries.
- Querying the CLUSTERING_ORDINAL_POSITION column in the INFORMATION_SCHEMA.COLUMNS view.
The permissions required to list clustered tables and the steps to list them are the same as for standard tables. For more information about listing tables, see Listing tables in a dataset.
Modifying clustering specification
You can change or remove a table's clustering specifications, or change the set of clustered columns in a clustered table. This method of updating the clustering column set is useful for tables that use continuous streaming inserts because those tables cannot be easily swapped by other methods.
You can change the clustering specification in the following ways:
- Call the tables.update or tables.patch API method.
- Call the bq command-line tool's bq update command with the --clustering_fields flag.
When a table is converted from non-clustered to clustered or the clustered column set is changed, automatic re-clustering only works from that time onward. For example, a non-clustered 1 PB table that is converted to a clustered table using tables.update still has 1 PB of non-clustered data. Automatic re-clustering only applies to any new data committed to the table after the update.
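Both routes for changing the clustering columns can be sketched as data. The helpers below are hypothetical and cover only changing the column set, not removing it: one builds a request body of the kind tables.patch accepts, and the other builds the argument list for a bq update call. Neither sends anything.

```python
def clustering_patch_body(clustering_columns):
    """Build a tables.patch body that sets new clustering columns."""
    if not 1 <= len(clustering_columns) <= 4:
        raise ValueError("specify between one and four clustering columns")
    return {"clustering": {"fields": list(clustering_columns)}}


def bq_update_command(table, clustering_columns):
    """Build the bq update argument list for the same change."""
    return [
        "bq", "update",
        "--clustering_fields", ",".join(clustering_columns),
        table,
    ]


print(clustering_patch_body(["customer_id", "timestamp"]))
print(bq_update_command("mydataset.myclusteredtable",
                        ["customer_id", "timestamp"]))
```

As noted above, either change affects only data committed after the update; existing data is not re-clustered by it.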
Table security
To control access to tables in BigQuery, see Introduction to table access controls.
Next steps
- For information about querying clustered tables, see Querying clustered tables.
- For an overview of partitioned table support in BigQuery, see Introduction to partitioned tables.
- To learn how to create partitioned tables, see Creating partitioned tables.
- To see an overview of INFORMATION_SCHEMA, go to Introduction to BigQuery INFORMATION_SCHEMA.
Source: https://cloud.google.com/bigquery/docs/creating-clustered-tables
Posted by: burrowsbegather45.blogspot.com
