Google Professional Data Engineer Governance and Security

Use for IAM, policy tags, DLP, Dataplex, Data Catalog, encryption, lineage, auditing, and compliance-oriented data controls.

Exams
PROFESSIONAL-DATA-ENGINEER
Questions
19
Comments
242

1. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 272

Sequence
7
Discussion ID
130221
Source URL
https://www.examtopics.com/discussions/google/view/130221-exam-professional-data-engineer-topic-1-question-272/
Posted By
scaenruy
Posted At
Jan. 3, 2024, 6:55 p.m.

Question

You have a BigQuery dataset named “customers”. All tables will be tagged by using a Data Catalog tag template named “gdpr”. The template contains one mandatory field, “has_sensitive_data”, with a boolean value. All employees must be able to do a simple search and find tables in the dataset that have either true or false in the “has_sensitive_data” field. However, only the Human Resources (HR) group should be able to see the data inside the tables for which “has_sensitive_data” is true. You give the all employees group the bigquery.metadataViewer and bigquery.connectionUser roles on the dataset. You want to minimize configuration overhead. What should you do next?

  • A. Create the “gdpr” tag template with private visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
  • B. Create the “gdpr” tag template with private visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
  • C. Create the “gdpr” tag template with public visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
  • D. Create the “gdpr” tag template with public visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.

Suggested Answer

C

Comments (11)

Comment 1

ID: 1117465 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Tue 09 Jan 2024 13:26 Selected Answer: C Upvotes: 19

- The most straightforward solution with minimal configuration overhead.
- By creating the "gdpr" tag template with public visibility, you ensure that all employees can search and find tables based on the "has_sensitive_data" field.
- Assigning the bigquery.dataViewer role to the HR group on tables with sensitive data ensures that only they can view the actual data in these tables.
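raaad's setup can be sketched as request payloads. This is a minimal illustration, not a definitive implementation: the JSON field spellings follow the Data Catalog v1 REST API as a best-effort assumption, and the project, dataset, table, and group names are invented.

```python
# Sketch: a public "gdpr" tag template plus an HR-only table grant.
# Field spellings assume the Data Catalog v1 REST API (camelCase JSON);
# all project/dataset/table/group names are placeholders.

def gdpr_tag_template() -> dict:
    """Request body for a tagTemplates.create call (answer C)."""
    return {
        "displayName": "gdpr",
        # Public visibility: any user who can see the data entry can
        # also see tags made from this template -- no extra role needed.
        "isPubliclyReadable": True,
        "fields": {
            "has_sensitive_data": {
                "displayName": "Has sensitive data",
                "isRequired": True,
                "type": {"primitiveType": "BOOL"},
            }
        },
    }

def hr_table_binding(table: str, hr_group: str) -> dict:
    """IAM binding granting the HR group read access to one table."""
    return {
        "role": "roles/bigquery.dataViewer",
        "members": [f"group:{hr_group}"],
        "resource": table,
    }

template = gdpr_tag_template()
binding = hr_table_binding(
    "projects/my-project/datasets/customers/tables/payroll",
    "hr@example.com",
)
```

With the template public, any employee who can see a table's metadata can also search on its gdpr tag; only the per-table HR bindings expose the row data itself.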

Comment 1.1

ID: 1153181 User: ML6 Badges: - Relative Date: 2 years ago Absolute Date: Sun 18 Feb 2024 11:08 Selected Answer: - Upvotes: 2

Wouldn't employees still need the roles/datacatalog.tagTemplateViewer role to view private AND public tags?
To get the permissions that you need to view public and private tags on Bigtable resources, ask your administrator to grant you the following IAM roles:
- roles/datacatalog.tagTemplateViewer
- roles/bigtable.viewer
Source: https://cloud.google.com/bigtable/docs/manage-data-assets-using-data-catalog#permissions-view-tags

Comment 1.2

ID: 1153189 User: ML6 Badges: - Relative Date: 2 years ago Absolute Date: Sun 18 Feb 2024 11:21 Selected Answer: - Upvotes: 1

Ignore the last reply. The correct answer would be C.
Tags = Custom metadata fields that you can attach to a data entry to provide context.
Tag templates = Reusable structures that you can use to rapidly create new tags.
In short, the employees do not need a tagTemplateViewer role because it pertains to the tag templates, not the tags themselves.

Comment 2

ID: 1710201 User: NickForDiscussions Badges: Most Recent Relative Date: 1 month, 1 week ago Absolute Date: Thu 29 Jan 2026 09:55 Selected Answer: D Upvotes: 1

"All employees must be able to do a simple search and find tables in the dataset that have either true or false in the “has_sensitive_data” field." To be able to search for values in the tags you need the roles/datacatalog.tagTemplateViewer role, meaning option D is correct.

Comment 3

ID: 1263396 User: meh_33 Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sat 10 Aug 2024 10:59 Selected Answer: C Upvotes: 2

raaad is mostly correct with his explanation. Thanks, mate.

Comment 4

ID: 1260029 User: iooj Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Fri 02 Aug 2024 22:43 Selected Answer: - Upvotes: 1

A - employees cannot use the tag
B - increases the configuration overhead
C - exactly what we need
D - unnecessary role assignment; the tag template is already visible

Comment 5

ID: 1181979 User: d11379b Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Sun 24 Mar 2024 22:12 Selected Answer: C Upvotes: 2

While D would work, it is not necessary to give all employees the tagTemplateViewer role, as that would grant them view permission on tag templates as well as on the tags created from them.
However, tags are a type of business metadata. Adding tags to a data entry helps provide meaningful context to anyone who needs to use the asset, and public tags provide less strict access control for searching and viewing than private tags. Any user who has the required view permissions for a data entry can view all the public tags associated with it. View permissions for public tags are only required when you perform a search in Data Catalog using the tag: syntax or when you view an unattached tag template.

Comment 5.1

ID: 1181980 User: d11379b Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Sun 24 Mar 2024 22:15 Selected Answer: - Upvotes: 1

As all employees have the bigquery.metadataViewer role, they can already see tags on BigQuery tables.

Comment 6

ID: 1155254 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Wed 21 Feb 2024 06:02 Selected Answer: C Upvotes: 2

I'll go with raaad's answer

Comment 7

ID: 1137526 User: tibuenoc Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Thu 01 Feb 2024 11:44 Selected Answer: B Upvotes: 4

If you are working with PII, we can't grant public access, so private visibility for the tag template is the best option.

See https://cloud.google.com/data-catalog/docs/tags-and-tag-templates

Comment 8

ID: 1112996 User: scaenruy Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 03 Jan 2024 18:55 Selected Answer: D Upvotes: 4

D. Create the “gdpr” tag template with public visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.

2. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 338

Sequence
9
Discussion ID
382514
Source URL
https://www.examtopics.com/discussions/google/view/382514-exam-professional-data-engineer-topic-1-question-338/
Posted By
67bdb19
Posted At
Jan. 16, 2026, 8:15 a.m.

Question

Your organization stores highly personal data in BigQuery and needs to comply with strict data privacy regulations. You need to ensure that sensitive data values are rendered unreadable whenever an employee leaves the organization. What should you do?

  • A. Use column-level access controls with policy tags and revoke viewer permissions when employees leave the organization.
  • B. Use dynamic data masking and revoke viewer permissions when employees leave the organization.
  • C. Use customer-managed encryption keys (CMEK) and delete keys when employees leave the organization.
  • D. Use AEAD functions and delete keys when employees leave the organization.

Suggested Answer

D

Comments (1)

Comment 1

ID: 1708067 User: DerfelCardarn Badges: - Relative Date: 1 month, 3 weeks ago Absolute Date: Mon 19 Jan 2026 19:45 Selected Answer: D Upvotes: 3

C is too complex
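To make answer D concrete: BigQuery's AEAD functions enable crypto-deletion ("crypto-shredding") when you keep one keyset per employee and delete it on departure. A sketch of the SQL involved, assembled as Python strings; the secure.keys table and its columns are invented for illustration, while KEYS.NEW_KEYSET, AEAD.ENCRYPT, and AEAD.DECRYPT_STRING are real BigQuery functions.

```python
# Crypto-deletion sketch using BigQuery AEAD SQL, built as strings.
# Table/column names are illustrative placeholders.

CREATE_KEY_SQL = """
INSERT INTO secure.keys (employee_id, keyset)
VALUES (@employee_id, KEYS.NEW_KEYSET('AEAD_AES_GCM_256'))
"""

ENCRYPT_SQL = """
SELECT AEAD.ENCRYPT(
  (SELECT keyset FROM secure.keys WHERE employee_id = @employee_id),
  @plaintext,
  CAST(@employee_id AS STRING)  -- additional authenticated data
) AS ciphertext
"""

DECRYPT_SQL = """
SELECT AEAD.DECRYPT_STRING(
  (SELECT keyset FROM secure.keys WHERE employee_id = @employee_id),
  ciphertext,
  CAST(@employee_id AS STRING)
) AS plaintext
FROM hr.sensitive_values
"""

# When the employee leaves, deleting their keyset row renders every
# ciphertext produced with that keyset permanently unreadable.
SHRED_SQL = """
DELETE FROM secure.keys WHERE employee_id = @employee_id
"""
```

This is why D beats C here: CMEK keys protect whole tables/datasets at rest, so deleting one would destroy everyone's data, whereas per-employee AEAD keysets scope the shredding to one person's values.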

3. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 217

Sequence
13
Discussion ID
129864
Source URL
https://www.examtopics.com/discussions/google/view/129864-exam-professional-data-engineer-topic-1-question-217/
Posted By
e70ea9e
Posted At
Dec. 30, 2023, 9:42 a.m.

Question

You have a BigQuery table that contains customer data, including sensitive information such as names and addresses. You need to share the customer data with your data analytics and consumer support teams securely. The data analytics team needs to access the data of all the customers, but must not be able to access the sensitive data. The consumer support team needs access to all data columns, but must not be able to access customers that no longer have active contracts. You enforced these requirements by using an authorized dataset and policy tags. After implementing these steps, the data analytics team reports that they still have access to the sensitive columns. You need to ensure that the data analytics team does not have access to restricted data. What should you do? (Choose two.)

  • A. Create two separate authorized datasets; one for the data analytics team and another for the consumer support team.
  • B. Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags.
  • C. Replace the authorized dataset with an authorized view. Use row-level security and apply filter_expression to limit data access.
  • D. Remove the bigquery.dataViewer role from the data analytics team on the authorized datasets.
  • E. Enforce access control in the policy tag taxonomy.

Suggested Answer

B

Comments (12)

Comment 1

ID: 1116435 User: qq589539483084gfrgrgfr Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Mon 08 Jan 2024 06:52 Selected Answer: - Upvotes: 7

Option B & E

Comment 2

ID: 1123270 User: datapassionate Badges: Highly Voted Relative Date: 2 years, 1 month ago Absolute Date: Mon 15 Jan 2024 12:14 Selected Answer: E Upvotes: 5

B& E
https://cloud.google.com/bigquery/docs/column-level-security-intro

Comment 3

ID: 1703981 User: Kalai_1 Badges: Most Recent Relative Date: 2 months ago Absolute Date: Mon 05 Jan 2026 10:37 Selected Answer: B Upvotes: 1

B & E. As per Google recommended security best practices

Comment 4

ID: 1243086 User: Lenifia Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Sat 06 Jul 2024 01:48 Selected Answer: A Upvotes: 2

the correct options are:

A. Create two separate authorized datasets; one for the data analytics team and another for the consumer support team.
C. Replace the authorized dataset with an authorized view. Use row-level security and apply filter_expression to limit data access.

Comment 4.1

ID: 1243087 User: Lenifia Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Sat 06 Jul 2024 01:48 Selected Answer: - Upvotes: 2

Explanation of why other options are incorrect:

B. Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags: This role relates to viewing data in Data Catalog based on policy tags, not directly controlling access to BigQuery data.

D. Remove the bigquery.dataViewer role from the data analytics team on the authorized datasets: Removing this role would block all access to the dataset, which is too restrictive if they still need access to non-sensitive columns.

E. Enforce access control in the policy tag taxonomy: While policy tags are used to enforce access controls, simply enforcing controls in the taxonomy does not directly address the issue of sensitive data access in BigQuery.

Comment 5

ID: 1122596 User: GCP001 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sun 14 Jan 2024 15:26 Selected Answer: E Upvotes: 3

B & E
B - ensures they don't have access to the secured columns
E - enables enforcement of column-level security
Ref - https://cloud.google.com/bigquery/docs/column-level-security-intro

Comment 6

ID: 1121500 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 11:20 Selected Answer: B Upvotes: 2

Option B& E to me

Comment 7

ID: 1116447 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Mon 08 Jan 2024 07:10 Selected Answer: A Upvotes: 3

A & B
The current setup is not effective because the data analytics team still has access to the sensitive columns despite using an authorized dataset and policy tags. This indicates that the policy tags are not being enforced properly, and the data analytics team members are able to view the tags and gain access to the sensitive data.
Separating the data into two distinct authorized datasets is a better approach because it isolates the sensitive data from the non-sensitive data. This prevents the data analytics team from accessing the sensitive columns directly, even if they have access to the authorized dataset for general customer data.
Additionally, revoking the Data Catalog Fine-Grained Reader role from the data analytics team members ensures that they cannot view or modify the policy tags. This limits their ability to bypass the access control imposed by the authorized dataset and policy tags.

Comment 7.1

ID: 1121499 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 11:19 Selected Answer: - Upvotes: 3

Max, I feel like it's more B & E.
I agree on revoking the Data Catalog Fine-Grained Reader role to keep the data analytics team from reading data protected by policy tags, but if the tags are set up as stated, what's missing is the enforcement of the policy tags themselves.
Creating two authorized datasets is not efficient on big datasets, and Data Catalog plus policy tags are built to manage exactly these situations. Don't you agree?

Comment 8

ID: 1113695 User: imiu Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Thu 04 Jan 2024 14:05 Selected Answer: - Upvotes: 1

And the second answer? One is option B and the other is option D maybe?

Comment 9

ID: 1113225 User: raaad Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Thu 04 Jan 2024 00:40 Selected Answer: B Upvotes: 3

- The Data Catalog Fine-Grained Reader role allows users to read metadata that is restricted by policy tags.
- If members of the data analytics team have this role, they might bypass the restrictions set by policy tags.
- Ensuring they do not have this role will help enforce the restrictions intended by the policy tags.
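The two fixes (B and E) can be sketched as payloads. This is a hedged sketch: the field spellings are a best-effort assumption based on the Policy Tag Manager REST API, the role ID roles/datacatalog.categoryFineGrainedReader is the actual Fine-Grained Reader role, and the group names are invented.

```python
# Sketch of fixes E and B as request/IAM payloads.
# JSON field spellings are assumptions; group names are placeholders.

def enforced_taxonomy(display_name: str) -> dict:
    """Taxonomy with access-control enforcement activated (fix E).

    Without an activated policy type, policy tags are attached but not
    enforced -- matching the symptom described in the question.
    """
    return {
        "displayName": display_name,
        "activatedPolicyTypes": ["FINE_GRAINED_ACCESS_CONTROL"],
    }

def strip_fine_grained_reader(policy: dict, member: str) -> dict:
    """Remove a member from the Fine-Grained Reader binding (fix B)."""
    for binding in policy.get("bindings", []):
        if binding["role"] == "roles/datacatalog.categoryFineGrainedReader":
            binding["members"] = [m for m in binding["members"] if m != member]
    return policy

taxonomy = enforced_taxonomy("pii-taxonomy")
policy = {
    "bindings": [{
        "role": "roles/datacatalog.categoryFineGrainedReader",
        "members": ["group:analytics@example.com",
                    "group:support@example.com"],
    }]
}
policy = strip_fine_grained_reader(policy, "group:analytics@example.com")
```

Both pieces are needed: enforcement makes the tags restrictive at all, and removing the Fine-Grained Reader role ensures the analytics team holds no exemption to that restriction.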

Comment 10

ID: 1109543 User: e70ea9e Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 30 Dec 2023 09:42 Selected Answer: B Upvotes: 2

Prevents data analytics team members from viewing sensitive data, even if it's tagged.
Restricts access to policy tags themselves, ensuring confidentiality of sensitive information.

4. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 269

Sequence
34
Discussion ID
130424
Source URL
https://www.examtopics.com/discussions/google/view/130424-exam-professional-data-engineer-topic-1-question-269/
Posted By
raaad
Posted At
Jan. 5, 2024, 6:08 p.m.

Question

Your organization's data assets are stored in BigQuery, Pub/Sub, and a PostgreSQL instance running on Compute Engine. Because there are multiple domains and diverse teams using the data, teams in your organization are unable to discover existing data assets. You need to design a solution to improve data discoverability while keeping development and configuration efforts to a minimum. What should you do?

  • A. Use Data Catalog to automatically catalog BigQuery datasets. Use Data Catalog APIs to manually catalog Pub/Sub topics and PostgreSQL tables.
  • B. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.
  • C. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use custom connectors to manually catalog PostgreSQL tables.
  • D. Use custom connectors to manually catalog BigQuery datasets, Pub/Sub topics, and PostgreSQL tables.

Suggested Answer

B

Comments (25)

Comment 1

ID: 1114681 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Fri 05 Jan 2024 18:08 Selected Answer: B Upvotes: 15

- It utilizes Data Catalog's native support for both BigQuery datasets and Pub/Sub topics.
- For PostgreSQL tables running on a Compute Engine instance, you'd use Data Catalog APIs to create custom entries, as Data Catalog does not automatically discover external databases like PostgreSQL.
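A custom Data Catalog entry for one PostgreSQL table might look like the payload below. This is a sketch under assumptions: userSpecifiedType/userSpecifiedSystem follow the custom-entry docs, while the linkedResource path and all names are invented placeholders.

```python
# Sketch: a custom Data Catalog entry for a PostgreSQL table.
# Custom entries live in a user-created entry group; the linkedResource
# and all names below are placeholders, not a real resource path.

def postgres_table_entry(table: str, columns: list[tuple[str, str]]) -> dict:
    """Request body for an entries.create call (custom entry)."""
    return {
        "displayName": table,
        "userSpecifiedType": "postgresql_table",
        "userSpecifiedSystem": "postgresql",
        "linkedResource": f"//compute/instances/pg-vm/databases/app/{table}",
        "schema": {
            "columns": [
                {"column": name, "type": col_type}
                for name, col_type in columns
            ]
        },
    }

entry = postgres_table_entry(
    "customers",
    [("id", "INT64"), ("email", "STRING")],
)
```

One such entry per table makes the PostgreSQL assets searchable alongside the automatically cataloged BigQuery and Pub/Sub entries.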

Comment 1.1

ID: 1127636 User: AllenChen123 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sun 21 Jan 2024 06:16 Selected Answer: - Upvotes: 5

Agree. https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

Comment 2

ID: 1134752 User: datapassionate Badges: Highly Voted Relative Date: 2 years, 1 month ago Absolute Date: Mon 29 Jan 2024 09:24 Selected Answer: C Upvotes: 14

Data Catalog is the best choice. But for cataloging PostgreSQL it is better to use a connector when available, instead of the API.
https://cloud.google.com/data-catalog/docs/integrate-data-sources#integrate_unsupported_data_sources

Comment 2.1

ID: 1283155 User: 7787de3 Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Fri 13 Sep 2024 14:23 Selected Answer: - Upvotes: 1

I agree. On the linked page:
If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries.
Since a connector for PostgreSQL does exist, it should be used.

Comment 2.2

ID: 1137469 User: tibuenoc Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Thu 01 Feb 2024 10:48 Selected Answer: - Upvotes: 4

Agree. If a source doesn't have a connector, it must be cataloged manually through the Data Catalog API.
As PostgreSQL already has a connector, the best option is C.

Comment 3

ID: 1619945 User: af17139 Badges: Most Recent Relative Date: 4 months, 2 weeks ago Absolute Date: Fri 24 Oct 2025 09:07 Selected Answer: C Upvotes: 1

Automatic cataloging: Data Catalog (now part of Dataplex) has built-in integrations that automatically discover and register metadata from Google Cloud managed services like BigQuery and Pub/Sub. This requires little to no configuration.
Custom connector for PostgreSQL: for the PostgreSQL instance running on Compute Engine, you need to develop a custom connector. This typically involves writing a script or application that:
- connects to the PostgreSQL database;
- extracts schema information (tables, columns, types, etc.), potentially from information_schema;
- transforms this information into the format required by the Data Catalog API;
- uses the Data Catalog APIs to create or update entries for the PostgreSQL tables.
The connector can be run on a schedule (e.g., using Cloud Scheduler and Cloud Functions) to keep the metadata in Data Catalog synchronized with the PostgreSQL instance.
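The extract-and-transform steps in that outline can be surprisingly small: one query against information_schema plus a grouping pass. Below is a toy version with sample rows standing in for the query result (no network calls; entry field names are illustrative).

```python
# Toy version of a PostgreSQL -> Data Catalog connector's extract step.
# In a real connector the rows would come from running SCHEMA_SQL
# against the instance; here sample rows stand in for that result.

SCHEMA_SQL = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position
"""

def rows_to_entries(rows):
    """Group (table, column, type) rows into one entry dict per table."""
    entries: dict[str, dict] = {}
    for table, column, data_type in rows:
        entry = entries.setdefault(
            table,
            {"displayName": table,
             "userSpecifiedType": "postgresql_table",
             "schema": {"columns": []}},
        )
        entry["schema"]["columns"].append(
            {"column": column, "type": data_type}
        )
    return entries

sample_rows = [
    ("orders", "id", "integer"),
    ("orders", "total", "numeric"),
    ("users", "email", "text"),
]
entries = rows_to_entries(sample_rows)
```

Each resulting dict would then be upserted through the Data Catalog entries API on a schedule, which is exactly the loop the community connectors implement for you.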

Comment 4

ID: 1560528 User: aaaaaaaasdasdasfs Badges: - Relative Date: 11 months ago Absolute Date: Mon 14 Apr 2025 09:06 Selected Answer: B Upvotes: 1

This is the correct approach. Data Catalog provides automatic discovery and metadata extraction for both BigQuery datasets and Pub/Sub topics as native Google Cloud services. You'll only need to use the Data Catalog APIs to manually catalog the PostgreSQL tables running on Compute Engine, as this is a non-native data source. This maximizes automation while minimizing development effort.

Comment 5

ID: 1410923 User: Abizi Badges: - Relative Date: 11 months, 2 weeks ago Absolute Date: Thu 27 Mar 2025 15:21 Selected Answer: C Upvotes: 1

Why C is Correct?
BigQuery datasets → ✅ Automatically cataloged in Data Catalog

Pub/Sub topics → ✅ Automatically cataloged in Data Catalog

PostgreSQL on Compute Engine → ❌ Not automatically cataloged

Requires a custom connector to extract metadata and push it to Data Catalog.

Option B (using Data Catalog APIs manually) is not enough because PostgreSQL metadata isn’t natively supported.

Why Not B?
Option B suggests using Data Catalog APIs manually for PostgreSQL.

However, Data Catalog does not natively support PostgreSQL metadata extraction.

You need a custom connector to first extract PostgreSQL schema information, then push it to Data Catalog.

Comment 6

ID: 1330621 User: AWSandeep Badges: - Relative Date: 1 year, 2 months ago Absolute Date: Mon 23 Dec 2024 00:56 Selected Answer: B Upvotes: 2

This section explains it clearly: https://cloud.google.com/data-catalog/docs/integrate-data-sources#integrate_unsupported_data_sources.

Comment 7

ID: 1294639 User: baimus Badges: - Relative Date: 1 year, 5 months ago Absolute Date: Tue 08 Oct 2024 10:21 Selected Answer: C Upvotes: 3

This is C. To clarify some issues below with B, the links provided by supporters of B actually do say that it's preferable to use a community connector where available, and to only use the API when the case is genuinely not supported by community connectors.
In this case it's Postgresql, so it's supported, see here for full list: https://cloud.google.com/data-catalog/docs/integrate-data-sources#integrate_on-premises_data_sources

So this would be B if it was something like Q+ or some genuinely unsupported database, but postgres is supported for community connector.

Comment 8

ID: 1271742 User: shanks_t Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Sat 24 Aug 2024 18:35 Selected Answer: B Upvotes: 3

Data Catalog automatically catalogs metadata from Google Cloud sources such as BigQuery, Vertex AI, Pub/Sub, Spanner, Bigtable, and more.

To catalog metadata from non-Google Cloud systems in your organization, you can use the following:

Community-contributed connectors to multiple popular on-premises data sources
Manually build on the Data Catalog APIs for custom entries

Comment 8.1

ID: 1271744 User: shanks_t Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Sat 24 Aug 2024 18:37 Selected Answer: - Upvotes: 1

C. While similar to B, using custom connectors for PostgreSQL might involve more development effort than using the Data Catalog APIs directly.

Comment 9

ID: 1263405 User: meh_33 Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sat 10 Aug 2024 11:24 Selected Answer: - Upvotes: 1

raaad is mostly correct, and his description supports his answer, so we can go with it. Cheers, mate.

Comment 10

ID: 1256420 User: 987af6b Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sat 27 Jul 2024 20:37 Selected Answer: C Upvotes: 2

I’m voting for C because the documentation lists a community-developed custom connector for Postgres.

Comment 10.1

ID: 1256788 User: 987af6b Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sun 28 Jul 2024 14:38 Selected Answer: - Upvotes: 1

Changed my mind: B.
- This is not on-premises, so the custom connectors should not be applicable.
- The question says to keep manual dev and config to a minimum.

Comment 11

ID: 1231691 User: fitri001 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Mon 17 Jun 2024 05:37 Selected Answer: B Upvotes: 4

BigQuery Datasets and Pub/Sub Topics: Google Data Catalog can automatically catalog metadata from BigQuery and Pub/Sub, making it easy to discover and manage these data assets without additional development effort.

PostgreSQL Tables: While Data Catalog does not have built-in connectors for PostgreSQL, you can use the Data Catalog APIs to manually catalog the PostgreSQL tables. This requires some custom development but is manageable compared to creating custom connectors for everything.

Comment 12

ID: 1215683 User: virat_kohli Badges: - Relative Date: 1 year, 9 months ago Absolute Date: Wed 22 May 2024 13:05 Selected Answer: B Upvotes: 2

B. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.

Comment 13

ID: 1211340 User: Cassim Badges: - Relative Date: 1 year, 10 months ago Absolute Date: Tue 14 May 2024 12:51 Selected Answer: B Upvotes: 2

Option B leverages Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics, which streamlines the process and reduces manual effort. Using Data Catalog APIs to manually catalog PostgreSQL tables ensures consistency across all data assets while minimizing development and configuration efforts.

Comment 14

ID: 1201758 User: LaxmanTiwari Badges: - Relative Date: 1 year, 10 months ago Absolute Date: Thu 25 Apr 2024 07:20 Selected Answer: C Upvotes: 3

I vote for C, per "Integrate on-premises data sources": "To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community."

See https://cloud.google.com/data-catalog/docs/integrate-data-sources

Comment 14.1

ID: 1201759 User: LaxmanTiwari Badges: - Relative Date: 1 year, 10 months ago Absolute Date: Thu 25 Apr 2024 07:22 Selected Answer: - Upvotes: 1

The Data Catalog API comes into play only when custom connectors are not available via the community repos.

Comment 15

ID: 1190850 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Sun 07 Apr 2024 10:24 Selected Answer: - Upvotes: 3

In option C, the phrase "Use custom connectors to manually catalog PostgreSQL tables" refers to Google's use case of "community-contributed connectors to multiple popular on-premises data sources". As you can see, these connectors are for on-premises data sources only; in this case the Postgres instance is in a VM in the cloud. Thus, the correct option is B.

Comment 15.1

ID: 1190851 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Sun 07 Apr 2024 10:24 Selected Answer: - Upvotes: 1

Link: https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

Comment 16

ID: 1174778 User: hanoverquay Badges: - Relative Date: 1 year, 12 months ago Absolute Date: Sat 16 Mar 2024 07:27 Selected Answer: B Upvotes: 2

Option B: there's no need to build a custom connector now; PostgreSQL is now supported:
https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/tree/master/google-datacatalog-postgresql-connector

Comment 16.1

ID: 1181886 User: d11379b Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Sun 24 Mar 2024 18:28 Selected Answer: - Upvotes: 1

I think “custom connector” here may just mean that these are not official tools, as the doc mentions “connectors contributed by the community”.
And it should not be B, since manually cataloging via the API is even more basic than using a connector.

Comment 17

ID: 1172481 User: Y___ash Badges: - Relative Date: 1 year, 12 months ago Absolute Date: Wed 13 Mar 2024 12:20 Selected Answer: B Upvotes: 2

Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.

5. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 303

Sequence
79
Discussion ID
130326
Source URL
https://www.examtopics.com/discussions/google/view/130326-exam-professional-data-engineer-topic-1-question-303/
Posted By
scaenruy
Posted At
Jan. 4, 2024, 1:45 p.m.

Question

You are managing a Dataplex environment with raw and curated zones. A data engineering team is uploading JSON and CSV files to a bucket asset in the curated zone but the files are not being automatically discovered by Dataplex. What should you do to ensure that the files are discovered by Dataplex?

  • A. Move the JSON and CSV files to the raw zone.
  • B. Enable auto-discovery of files for the curated zone.
  • C. Use the bq command-line tool to load the JSON and CSV files into BigQuery tables.
  • D. Grant object level access to the CSV and JSON files in Cloud Storage.

Suggested Answer

A

Comments (24)

Comment 1

ID: 1125223 User: GCP001 Badges: Highly Voted Relative Date: 2 years, 1 month ago Absolute Date: Wed 17 Jan 2024 19:40 Selected Answer: A Upvotes: 28

Should be A. Curated zones need Parquet, Avro, or ORC formats, not CSV or JSON. Check the ref: https://cloud.google.com/dataplex/docs/add-zone#curated-zones
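The format rule GCP001 cites is small enough to encode directly. The sketch below captures the documented behavior (curated-zone bucket assets discover only Avro, Parquet, and ORC; raw zones accept any format); the function name is invented for illustration.

```python
# Encodes the documented Dataplex discovery rule: curated-zone bucket
# assets only discover Avro/Parquet/ORC, while raw zones accept any
# format. A simple extension check stands in for format detection.

CURATED_FORMATS = {"avro", "parquet", "orc"}

def is_discoverable(zone_type: str, filename: str) -> bool:
    """Return True if Discovery would pick up the file in this zone."""
    if zone_type == "raw":
        return True  # raw zones take structured, semi-structured, any format
    ext = filename.rsplit(".", 1)[-1].lower()
    return ext in CURATED_FORMATS

# JSON/CSV uploads to a curated zone are skipped as "invalid data
# format" -- the exact symptom described in the question.
```

This is why answer A (move the files to the raw zone) fixes discovery, while answer B does not: discovery is already on by default, but the curated zone rejects the formats themselves.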

Comment 1.1

ID: 1574728 User: Positron75 Badges: - Relative Date: 9 months, 1 week ago Absolute Date: Wed 04 Jun 2025 11:03 Selected Answer: - Upvotes: 1

Agreed. This link makes it more explicit: https://cloud.google.com/dataplex/docs/discover-data?hl=en#invalid_data_format

"Invalid data format in curated zones (data not in Avro, Parquet, or ORC formats)."

Comment 2

ID: 1115178 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Sat 06 Jan 2024 13:57 Selected Answer: B Upvotes: 11

- Auto-Discovery Feature: Dataplex has an auto-discovery feature that, when enabled, automatically discovers and catalogs data assets within a zone.
- Appropriate for Both Raw and Curated Zones: This feature is applicable to both raw and curated zones, and it should be tailored to the specific data governance and cataloging needs of the organization.

Comment 2.1

ID: 1319450 User: cloud_rider Badges: - Relative Date: 1 year, 3 months ago Absolute Date: Thu 28 Nov 2024 21:48 Selected Answer: - Upvotes: 1

A is correct. The auto-discovery feature works on both curated and raw zones, but JSON and CSV files in a curated zone are only discovered when they conform to the zone's specification, whereas in a raw zone these files are discovered even without one. Refer to https://cloud.google.com/dataplex/docs/discover-data#discovery-configuration

Comment 3

ID: 1571098 User: 22c1725 Badges: Most Recent Relative Date: 9 months, 3 weeks ago Absolute Date: Wed 21 May 2025 22:35 Selected Answer: B Upvotes: 1

Still, if you go with "A" you also need "B"; the question is not about best practice.

Comment 3.1

ID: 1574729 User: Positron75 Badges: - Relative Date: 9 months, 1 week ago Absolute Date: Wed 04 Jun 2025 11:07 Selected Answer: - Upvotes: 1

This isn't about best practice. The documentation outright states that data *not* in Avro, Parquet, or ORC formats within curated zones is considered invalid for the purposes of discovery: https://cloud.google.com/dataplex/docs/discover-data?hl=en#invalid_data_format

Comment 4

ID: 1560167 User: rajshiv Badges: - Relative Date: 11 months ago Absolute Date: Sat 12 Apr 2025 23:02 Selected Answer: B Upvotes: 2

We can store JSON and CSV in the curated zone if those files represent curated data. Usually we store JSON/CSV files in the raw zone if they are straight from source. But nowhere in the question is any of that detail mentioned. So I think the correct answer is :
B - Dataplex automatically discovers and catalogs data in the zones only if auto-discovery is enabled for the zone or asset. In this scenario - JSON and CSV files are being uploaded to a curated zone, which is fine. But if files are not being discovered, it's likely because auto-discovery is not enabled for that zone.

Comment 5

ID: 1399032 User: MBNR Badges: - Relative Date: 12 months ago Absolute Date: Sat 15 Mar 2025 21:05 Selected Answer: A Upvotes: 1

Answer is A.
Data formats supported: data in curated zones is typically columnar, Hive-partitioned, and stored in formats like Parquet, Avro, or ORC.
Restrictions: Dataplex does not allow users to create CSV files within a curated zone.

Comment 5.1

ID: 1411743 User: desertlotus1211 Badges: - Relative Date: 11 months, 2 weeks ago Absolute Date: Sat 29 Mar 2025 16:44 Selected Answer: - Upvotes: 1

Auto-Discovery is the better option

Comment 6

ID: 1346541 User: juliorevk Badges: - Relative Date: 1 year, 1 month ago Absolute Date: Sat 25 Jan 2025 17:35 Selected Answer: A Upvotes: 1

- Raw zones store structured data, semi-structured data such as CSV files and JSON files, and unstructured data in any format from external sources. Raw zones are useful for staging raw data before performing any transformations. Data can be stored in Cloud Storage buckets or BigQuery datasets.
- Curated Zones do not support JSON / CSV

Comment 7

ID: 1305992 User: SamuelTsch Badges: - Relative Date: 1 year, 4 months ago Absolute Date: Fri 01 Nov 2024 21:49 Selected Answer: A Upvotes: 1

Raw zones store structured data, semi-structured data such as CSV files and JSON files, and unstructured data in any format from external sources. Curated zones store structured data. Data can be stored in Cloud Storage buckets or BigQuery datasets. Supported formats for Cloud Storage buckets include Parquet, Avro, and ORC.

Comment 8

ID: 1270140 User: rajnairds Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Wed 21 Aug 2024 15:17 Selected Answer: B Upvotes: 3

Discovery configuration
Discovery is enabled by default when you create a new zone or asset. You can disable Discovery at the zone or asset level.

For each Dataplex asset with Discovery enabled, Dataplex does the following:

Scans the data associated with the asset.
Groups structured and semi-structured files into tables.
Collects technical metadata, such as table name, schema, and partition definition.
For unstructured data, such as images and videos, Dataplex Discovery automatically detects and registers groups of files sharing media type as filesets. For example, if gs://images/group1 contains GIF images, and gs://images/group2 contains JPEG images, Dataplex Discovery detects and registers two filesets. For structured data, such as Avro, Discovery detects files only if they are located in folders that contain the same data format and schema.

Reference : https://cloud.google.com/dataplex/docs/discover-data#exclude-files-from-Discovery
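The discovery behavior quoted above can be illustrated with a toy pure-Python sketch: unstructured files are grouped into filesets by media type, while structured files are grouped into a table only when a folder holds a single format and schema. This is a simplification of the documented behavior, not Dataplex code; the paths are hypothetical.

```python
# Toy grouping of Cloud Storage paths into "filesets" by (folder, format),
# mimicking the documented discovery example: gs://images/group1 with GIFs
# and gs://images/group2 with JPEGs yield two distinct filesets.
from collections import defaultdict

paths = [
    "gs://images/group1/a.gif",
    "gs://images/group1/b.gif",
    "gs://images/group2/c.jpeg",
]

filesets = defaultdict(list)
for path in paths:
    folder = path.rsplit("/", 1)[0]   # containing "folder" prefix
    ext = path.rsplit(".", 1)[1]      # stand-in for the media type
    filesets[(folder, ext)].append(path)

print(len(filesets))  # 2 filesets: group1/gif and group2/jpeg
```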

Comment 9

ID: 1241550 User: hussain.sain Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Wed 03 Jul 2024 19:15 Selected Answer: B Upvotes: 3

While JSON and CSV can technically be stored in curated zones, it is not a common practice for the reasons mentioned above. Nowhere in the linked documentation does it say there is such a restriction.

Comment 10

ID: 1231634 User: Anudeep58 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Mon 17 Jun 2024 04:18 Selected Answer: A Upvotes: 4

While none of the original options (A, B, C, or D) directly address the issue, the closest solution is:

Move the JSON and CSV files to a raw zone. (This was previously marked as the most voted option, but it's not ideal due to data organization disruption)
Here's why this approach might be necessary (but not ideal):

Dataplex curated zones currently don't support native processing of JSON and CSV formats. They are designed for structured data formats like Parquet, Avro, or ORC.

Comment 11

ID: 1205311 User: chrissamharris Badges: - Relative Date: 1 year, 10 months ago Absolute Date: Thu 02 May 2024 07:24 Selected Answer: A Upvotes: 1

Option A
https://cloud.google.com/dataplex/docs/add-zone#raw-zones

Raw zones are the only zones that support CSV & JSON

Comment 12

ID: 1194410 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Fri 12 Apr 2024 17:44 Selected Answer: - Upvotes: 1

It's B, guys. I encountered this in my job, and I had to do B to make it work.

Comment 12.1

ID: 1194411 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Fri 12 Apr 2024 17:46 Selected Answer: - Upvotes: 1

Actually I did this in a Raw zone, not Curated.

Comment 12.1.1

ID: 1194415 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Fri 12 Apr 2024 17:57 Selected Answer: - Upvotes: 5

It's A :)

Comment 13

ID: 1166648 User: demoro86 Badges: - Relative Date: 2 years ago Absolute Date: Tue 05 Mar 2024 18:33 Selected Answer: A Upvotes: 2

Agree with GCP001.

Comment 14

ID: 1163147 User: Moss2011 Badges: - Relative Date: 2 years ago Absolute Date: Fri 01 Mar 2024 02:16 Selected Answer: A Upvotes: 2

The answer can be found reading a common config of Dataplex in this URL: https://medium.com/google-cloud/google-cloud-dataplex-part-1-lakes-zones-assets-and-discovery-5f288486cb2f

Comment 15

ID: 1161537 User: kck6ra4214wm Badges: - Relative Date: 2 years ago Absolute Date: Wed 28 Feb 2024 11:53 Selected Answer: A Upvotes: 1

Dataplex does not allow users to create CSV files within a “curated zone”

Comment 16

ID: 1155991 User: daidai75 Badges: - Relative Date: 2 years ago Absolute Date: Thu 22 Feb 2024 02:20 Selected Answer: B Upvotes: 2

According to this URL: https://cloud.google.com/dataplex/docs/discover-data, auto-discovery can support CSV and JSON in both the raw zone and the curated zone. I also opened the console to verify it: both raw and curated zones can be set up with CSV & JSON auto-discovery.

Comment 17

ID: 1147771 User: dungct Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Mon 12 Feb 2024 02:33 Selected Answer: B Upvotes: 4

Discovery raises administrator actions whenever data-related issues are detected during scans, such as an inconsistent data format in a table. For example, files of different formats exist with the same table prefix.

Comment 17.1

ID: 1147773 User: dungct Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Mon 12 Feb 2024 02:35 Selected Answer: - Upvotes: 3

https://cloud.google.com/dataplex/docs/discover-data#invalid_data_format
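The zone/format rules this thread keeps debating can be summarized in a small pure-Python sketch, based on the linked discovery docs: raw zones accept CSV and JSON (among other formats), while data in curated zones is expected to be Avro, Parquet, or ORC, and other formats are flagged as invalid by discovery. The function and constant names are illustrative, not a Dataplex API.

```python
# Hypothetical encoding of the documented format rules per zone type.
CURATED_FORMATS = {"avro", "parquet", "orc"}
RAW_FORMATS = CURATED_FORMATS | {"csv", "json"}

def discovery_accepts(zone_type: str, file_format: str) -> bool:
    """Return True if discovery treats the format as valid for the zone."""
    fmt = file_format.lower()
    if zone_type == "raw":
        return fmt in RAW_FORMATS
    if zone_type == "curated":
        return fmt in CURATED_FORMATS
    raise ValueError(f"unknown zone type: {zone_type}")

print(discovery_accepts("curated", "csv"))  # False: flagged as invalid data
print(discovery_accepts("raw", "csv"))      # True
```

Under these rules, CSV uploaded to a curated zone is discovered but reported as an invalid data format, which is the crux of the A-versus-B disagreement above.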

6. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 306

Sequence
88
Discussion ID
130321
Source URL
https://www.examtopics.com/discussions/google/view/130321-exam-professional-data-engineer-topic-1-question-306/
Posted By
scaenruy
Posted At
Jan. 4, 2024, 1:21 p.m.

Question

You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analyses. You need to comply with data privacy requirements.

What should you do?

  • A. Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.
  • B. Use customer-managed encryption keys (CMEK) to directly encrypt the data in Cloud Storage. Use federated queries from BigQuery. Share the encryption key by following the principle of least privilege.
  • C. Use the Cloud Data Loss Prevention API and Dataflow to detect and remove sensitive fields from the data in Cloud Storage. Write the filtered data in BigQuery.
  • D. Use Dataflow and Cloud KMS to encrypt sensitive fields and write the encrypted data in BigQuery. Share the encryption key by following the principle of least privilege.

Suggested Answer

A

Answer Description Click to expand


Community Answer Votes

Comments 9 comments Click to expand

Comment 1

ID: 1115169 User: raaad Badges: Highly Voted Relative Date: 1 year, 8 months ago Absolute Date: Sat 06 Jul 2024 12:31 Selected Answer: A Upvotes: 13

- Prioritizes Data Privacy: It protects sensitive information by masking it, reducing the risk of exposure in case of unauthorized access or accidental leaks.
- Reduces Data Sensitivity: Masking renders sensitive data unusable for attackers, even if they gain access to it.
- Preserves Data Utility: Masked data can still be used for consumer analyses, as patterns and relationships are often preserved, allowing meaningful insights to be derived.

Comment 1.1

ID: 1147786 User: dungct Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Mon 12 Aug 2024 02:00 Selected Answer: - Upvotes: 2

why not d ?

Comment 1.1.1

ID: 1153841 User: ML6 Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Mon 19 Aug 2024 10:37 Selected Answer: - Upvotes: 2

Data in Cloud Storage is encrypted by default.

Comment 2

ID: 1571102 User: 22c1725 Badges: Most Recent Relative Date: 9 months, 3 weeks ago Absolute Date: Wed 21 May 2025 22:42 Selected Answer: A Upvotes: 1

This is a repeated question.

Comment 3

ID: 1411746 User: desertlotus1211 Badges: - Relative Date: 11 months, 2 weeks ago Absolute Date: Sat 29 Mar 2025 16:52 Selected Answer: C Upvotes: 1

If I had to choose...

I choose C or A... A can still leave partial sensitive data available.

Comment 3.1

ID: 1411747 User: desertlotus1211 Badges: - Relative Date: 11 months, 2 weeks ago Absolute Date: Sat 29 Mar 2025 16:54 Selected Answer: - Upvotes: 1

For data privacy, removing data through a DLP (Data Loss Prevention) system is generally considered better than masking, as it permanently eliminates sensitive information, whereas masking only conceals it, potentially leaving traces or vulnerabilities

Comment 4

ID: 1224788 User: AlizCert Badges: - Relative Date: 1 year, 3 months ago Absolute Date: Thu 05 Dec 2024 18:04 Selected Answer: A Upvotes: 3

What made me decide on A instead of C was the sentence "The data will be used to create consumer analyses". With all PII completely redacted from the records, we would be unable to distinguish between individual customers.

Comment 5

ID: 1121877 User: Matt_108 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Sat 13 Jul 2024 16:54 Selected Answer: A Upvotes: 1

Option A, agree with raaad explanation

Comment 6

ID: 1113664 User: scaenruy Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Thu 04 Jul 2024 12:21 Selected Answer: A Upvotes: 2

A. Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.
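To make the masking approach in answer A concrete, here is a sketch of the request body one would send to the DLP API's `content.deidentify` method, built as a plain dict for illustration. In a Dataflow pipeline this would typically be issued through the google-cloud-dlp client; the field names follow the public DLP REST API, but the sample value is made up.

```python
# Illustrative content.deidentify request body: inspect for two infoTypes
# and replace matched characters with '#' via characterMaskConfig.
deidentify_request = {
    "item": {"value": "Contact: jane.doe@example.com, SSN 123-45-6789"},
    "inspectConfig": {
        "infoTypes": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "US_SOCIAL_SECURITY_NUMBER"},
        ],
    },
    "deidentifyConfig": {
        "infoTypeTransformations": {
            "transformations": [
                {
                    "primitiveTransformation": {
                        "characterMaskConfig": {"maskingCharacter": "#"}
                    }
                }
            ]
        }
    },
}
```

Masking with `characterMaskConfig` keeps the record shape intact for downstream analysis; swapping in a removal-style transformation instead would be closer to option C.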

7. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 319

Sequence
93
Discussion ID
152515
Source URL
https://www.examtopics.com/discussions/google/view/152515-exam-professional-data-engineer-topic-1-question-319/
Posted By
HectorLeon2099
Posted At
Dec. 4, 2024, 5:49 p.m.

Question

You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analyses. You need to follow data privacy requirements, including protecting certain sensitive data elements, while also retaining all of the data for potential future use cases. What should you do?

  • A. Use the Cloud Data Loss Prevention API and Dataflow to detect and remove sensitive fields from the data in Cloud Storage. Write the filtered data in BigQuery.
  • B. Use customer-managed encryption keys (CMEK) to directly encrypt the data in Cloud Storage. Use federated queries from BigQuery. Share the encryption key by following the principle of least privilege.
  • C. Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.
  • D. Use Dataflow and Cloud KMS to encrypt sensitive fields and write the encrypted data in BigQuery. Share the encryption key by following the principle of least privilege.

Suggested Answer

C

Answer Description Click to expand


Community Answer Votes

Comments 4 comments Click to expand

Comment 1

ID: 1322022 User: HectorLeon2099 Badges: Highly Voted Relative Date: 1 year, 3 months ago Absolute Date: Wed 04 Dec 2024 17:49 Selected Answer: C Upvotes: 6

It's C. "A" removes data and retaining all is a requirement.

Comment 2

ID: 1571077 User: 22c1725 Badges: Most Recent Relative Date: 9 months, 3 weeks ago Absolute Date: Wed 21 May 2025 21:45 Selected Answer: C Upvotes: 1

Not A since "while also retaining all of the data" is required.

Comment 3

ID: 1571076 User: 22c1725 Badges: - Relative Date: 9 months, 3 weeks ago Absolute Date: Wed 21 May 2025 21:44 Selected Answer: C Upvotes: 1

Removing data would make us unable to do studies and consumer analyses, since it's likely that all of the consumer data contains PII.

Comment 4

ID: 1366248 User: Nagamanikanta Badges: - Relative Date: 1 year ago Absolute Date: Fri 07 Mar 2025 13:23 Selected Answer: C Upvotes: 1

Option C.
We can simply mask the data and process it in BigQuery.
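The difference the thread keeps returning to, removing sensitive fields (option A) versus masking them (option C), can be shown with a toy pure-Python stand-in for what DLP's character masking does. This is an illustration only, not the DLP API; the record and field names are hypothetical.

```python
# Masking keeps a token per record, so every row stays usable and all data
# is retained; removal drops the information entirely.
def mask(value: str, keep_last: int = 4, mask_char: str = "#") -> str:
    """Mask all but the last `keep_last` characters of a sensitive value."""
    if len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

record = {"customer_id": "C-1029", "ssn": "123-45-6789"}

masked = {**record, "ssn": mask(record["ssn"])}             # field retained
removed = {k: v for k, v in record.items() if k != "ssn"}   # field dropped

print(masked)   # {'customer_id': 'C-1029', 'ssn': '#######6789'}
print(removed)  # {'customer_id': 'C-1029'}
```

Because the question requires "retaining all of the data for potential future use cases", the masking path is the one that satisfies both requirements.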

8. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 40

Sequence
111
Discussion ID
17076
Source URL
https://www.examtopics.com/discussions/google/view/17076-exam-professional-data-engineer-topic-1-question-40/
Posted By
-
Posted At
March 21, 2020, 4:25 a.m.

Question

MJTelco Case Study -

Company Overview -
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.

Company Background -
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating many-to-many relationships between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.

Solution Concept -
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
✑ Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
✑ Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments (development/test, staging, and production) to meet the needs of running experiments, deploying new features, and serving production customers.

Business Requirements -
✑ Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
✑ Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
✑ Provide reliable and timely access to data for analysis from distributed research workers
✑ Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.

Technical Requirements -
✑ Ensure secure and efficient transport and storage of telemetry data
✑ Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
✑ Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day
✑ Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.

CEO Statement -
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.

CTO Statement -
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.

CFO Statement -
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.
Which two actions should you take? (Choose two.)

  • A. Ensure all the tables are included in global dataset.
  • B. Ensure each table is included in a dataset for a region.
  • C. Adjust the settings for each table to allow a related region-based security group view access.
  • D. Adjust the settings for each view to allow a related region-based security group view access.
  • E. Adjust the settings for each dataset to allow a related region-based security group view access.

Suggested Answer

BE

Answer Description Click to expand


Community Answer Votes

Comments 21 comments Click to expand

Comment 1

ID: 363543 User: shanjin14 Badges: Highly Voted Relative Date: 4 years, 9 months ago Absolute Date: Sat 22 May 2021 11:18 Selected Answer: - Upvotes: 36

C is correct starting 2020, as BigQuery come with table level access control
https://cloud.google.com/blog/products/data-analytics/introducing-table-level-access-controls-in-bigquery

Comment 1.1

ID: 428781 User: samstar4180 Badges: - Relative Date: 4 years, 6 months ago Absolute Date: Sat 21 Aug 2021 17:17 Selected Answer: - Upvotes: 11

Yes, the correct answer should be BC - since we can have table-level access and each region represents a table.

Comment 2

ID: 503396 User: hendrixlives Badges: Highly Voted Relative Date: 4 years, 2 months ago Absolute Date: Fri 17 Dec 2021 07:38 Selected Answer: BE Upvotes: 8

B/E: Even if BQ now offers table-level access control, since the number of tables can be expected to be high, controlling access at the dataset level eases operations. That is why I would still go for E instead of C.

Comment 2.1

ID: 530201 User: exnaniantwort Badges: - Relative Date: 4 years, 1 month ago Absolute Date: Sun 23 Jan 2022 03:01 Selected Answer: - Upvotes: 1

There is one table for each region, and you can expect a limited number of regions, so the number of tables should be manageable. So not E.

Comment 2.1.1

ID: 580963 User: devric Badges: - Relative Date: 3 years, 11 months ago Absolute Date: Tue 05 Apr 2022 01:55 Selected Answer: - Upvotes: 1

But option B means you will also create a dataset per region, so the permissions can be assigned at the dataset level.

If you pick option A instead, then you would have to assign permissions on the tables, but what about views? You would have to assign permissions to them as well.

That's why I pick B and E.

Comment 3

ID: 1400491 User: willyunger Badges: Most Recent Relative Date: 11 months, 4 weeks ago Absolute Date: Wed 19 Mar 2025 12:21 Selected Answer: BE Upvotes: 1

It's less effort to assign access by dataset than by table or view.

Comment 4

ID: 1259059 User: iooj Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Wed 31 Jul 2024 21:09 Selected Answer: BE Upvotes: 1

C doesn't make sense if you have already selected B.

Comment 5

ID: 1214108 User: suwalsageen12 Badges: - Relative Date: 1 year, 9 months ago Absolute Date: Mon 20 May 2024 06:13 Selected Answer: BE Upvotes: 3

If I choose option E, then options C and D are eliminated, because once you grant access at the dataset level for the user group, it applies to both tables and views.

Now for the remaining options, A and B: the question itself states that tables are included region-wise in datasets, so option A is eliminated.
So B and E are the correct answers.

Comment 6

ID: 1162341 User: niru12376 Badges: - Relative Date: 2 years ago Absolute Date: Thu 29 Feb 2024 09:28 Selected Answer: - Upvotes: 6

BE... If we've already split the tables into regions via datasets, then why grant regional access through tables again? If not, then AC, but definitely not BC. Please help.

Comment 7

ID: 1065315 User: rocky48 Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Wed 08 Nov 2023 05:35 Selected Answer: - Upvotes: 3

Answer : BE

B. Ensure each table is included in a dataset for a region.
This means that you should organize your data in BigQuery into separate datasets, one for each region. Each dataset contains the tables specific to that region. This ensures that data is segregated by region.
E. Adjust the settings for each dataset to allow a related region-based security group view access.

Comment 8

ID: 1050802 User: rtcpost Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Sun 22 Oct 2023 17:28 Selected Answer: BE Upvotes: 1

B. Ensure each table is included in a dataset for a region.

This means that you should organize your data in BigQuery into separate datasets, one for each region. Each dataset contains the tables specific to that region. This ensures that data is segregated by region.
E. Adjust the settings for each dataset to allow a related region-based security group view access.

You should set the access controls at the dataset level in BigQuery. This means configuring access permissions for each dataset based on regional security groups. This way, you can enforce the regional access policy to the data, ensuring that users from different regions can only access the data associated with their region.
Option A is not necessary because you don't need to include all the tables in a global dataset. Segregating data into region-specific datasets is a better approach for enforcing access controls.

Options C and D are not typical actions in BigQuery. Access control and permissions are usually managed at the dataset level, and you can grant access to specific groups at that level.

Comment 9

ID: 961337 User: Mathew106 Badges: - Relative Date: 2 years, 7 months ago Absolute Date: Mon 24 Jul 2023 10:36 Selected Answer: - Upvotes: 4

It's B and E, or B and C. However, B and E makes more sense: if you have one dataset for each region and users need access to the data for their region, why not allow them access to the whole dataset? What if you want to add other supplementary tables later? If you did that at the table level, you would have to grant access to every table separately.

Still, I think both are valid because we don't have any extra requirement, but B+E makes more sense.

Comment 10

ID: 887781 User: Jarek7 Badges: - Relative Date: 2 years, 10 months ago Absolute Date: Tue 02 May 2023 20:16 Selected Answer: C Upvotes: 4

The intended answer was surely BE. If C or D were the right answers, there would be absolutely no reason to do B, right? Why would you put each table into a separate dataset if you then set the access at the table/view level? What's more, the question is about tables, not views, so I have no idea why anybody would pick D.
The issue is that this question is out of date, and now the right answer would be C alone.

Comment 11

ID: 879268 User: Oleksandr0501 Badges: - Relative Date: 2 years, 10 months ago Absolute Date: Mon 24 Apr 2023 13:38 Selected Answer: - Upvotes: 2

The two actions that should be taken are B and E.
B. Ensure each table is included in a dataset for a region: By creating separate datasets for each region and including only the tables associated with that region, you can enforce the regional access policy.

E. Adjust the settings for each dataset to allow a related region-based security group view access: By adjusting the settings for each dataset to allow only the related region-based security group view access, you can ensure that employees can only view data associated with their region.

A is incorrect because including all tables in a global dataset would not enforce the regional access policy.

C is incorrect because adjusting the settings for each table is not a scalable solution, especially as the number of tables grows.

D is incorrect because adjusting the settings for each view does not ensure that employees can only view data associated with their region.

Comment 12

ID: 870865 User: sjtesla Badges: - Relative Date: 2 years, 11 months ago Absolute Date: Sat 15 Apr 2023 13:17 Selected Answer: BC Upvotes: 2

B: Location is on dataset level: https://cloud.google.com/bigquery/docs/datasets#dataset_limitations
C: IAM can be set on table level

Comment 13

ID: 791742 User: Lestrang Badges: - Relative Date: 3 years, 1 month ago Absolute Date: Sun 29 Jan 2023 15:53 Selected Answer: - Upvotes: 3

Guys, there are two possible combinations.
If you think each table represents a region, then they should all be in a global dataset and you should apply table-level access control to them. So A+C.

Otherwise, you would put each table in a regional dataset and apply access control to the dataset. Why would you create a dataset for the purpose of controlling regional access, and then only apply the controls to a table inside it? That is not extensible in the future. Anyway, creating a dataset plus access control on the dataset (B+E) is also valid.

Which to choose? I don't know.

Comment 14

ID: 788665 User: PolyMoe Badges: - Relative Date: 3 years, 1 month ago Absolute Date: Thu 26 Jan 2023 12:41 Selected Answer: BE Upvotes: 2

First put tables in region-dedicated dataset (B)
Then, ensure access control at dataset level (by creating region-based security groups) (E)

Comment 15

ID: 769018 User: korntewin Badges: - Relative Date: 3 years, 2 months ago Absolute Date: Sun 08 Jan 2023 02:45 Selected Answer: AC Upvotes: 3

I would vote for AC. As we already split the table for each region, why do we need to split the dataset per region? Furthermore, the access control will be provided to the users based on table level anyway.

Comment 15.1

ID: 769021 User: korntewin Badges: - Relative Date: 3 years, 2 months ago Absolute Date: Sun 08 Jan 2023 02:50 Selected Answer: - Upvotes: 1

Oh, the location is specified at the dataset level! Then the datasets should be split by region. My bad!

Comment 16

ID: 708406 User: MisuLava Badges: - Relative Date: 3 years, 4 months ago Absolute Date: Mon 31 Oct 2022 13:37 Selected Answer: BE Upvotes: 7

if you create table-level access control and grant it to different groups for different tables, what is the point of putting tables in different datasets and different regions?
So i choose BE

Comment 17

ID: 678794 User: svkds Badges: - Relative Date: 3 years, 5 months ago Absolute Date: Sun 25 Sep 2022 15:02 Selected Answer: BC Upvotes: 2

BigQuery comes with table-level access control. Since we can have table-level access and each region is represented by a table, B & C are the correct answers.
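The B+E layout the majority lands on can be sketched as plain dicts mirroring the `access` entries of BigQuery's dataset REST resource: one dataset per region, each granting READER to that region's security group. The region names, dataset names, and group emails are hypothetical.

```python
# One dataset per region; each dataset's access list grants READER to the
# matching region-based Google group (dataset-level control, per option E).
regions = ["emea", "apac", "amer"]

datasets = {
    f"sales_{region}": {
        "access": [
            {"role": "READER", "groupByEmail": f"{region}-analysts@example.com"},
        ]
    }
    for region in regions
}

# A single grant per dataset covers every table inside it, which is why
# B+E needs fewer policy bindings than table-level grants (B+C) as the
# number of tables grows.
print(datasets["sales_emea"]["access"][0]["groupByEmail"])
```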

9. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 293

Sequence
126
Discussion ID
130301
Source URL
https://www.examtopics.com/discussions/google/view/130301-exam-professional-data-engineer-topic-1-question-293/
Posted By
scaenruy
Posted At
Jan. 4, 2024, 11:37 a.m.

Question

Your organization is modernizing their IT services and migrating to Google Cloud. You need to organize the data that will be stored in Cloud Storage and BigQuery. You need to enable a data mesh approach to share the data between sales, product design, and marketing departments. What should you do?

  • A. 1. Create a project for storage of the data for each of your departments.
    2. Enable each department to create Cloud Storage buckets and BigQuery datasets.
    3. Create user groups for authorized readers for each bucket and dataset.
    4. Enable the IT team to administer the user groups to add or remove users as the departments’ request.
  • B. 1. Create multiple projects for storage of the data for each of your departments’ applications.
    2. Enable each department to create Cloud Storage buckets and BigQuery datasets.
    3. Publish the data that each department shared in Analytics Hub.
    4. Enable all departments to discover and subscribe to the data they need in Analytics Hub.
  • C. 1. Create a project for storage of the data for your organization.
    2. Create a central Cloud Storage bucket with three folders to store the files for each department.
    3. Create a central BigQuery dataset with tables prefixed with the department name.
    4. Give viewer rights for the storage project for the users of your departments.
  • D. 1. Create multiple projects for storage of the data for each of your departments’ applications.
    2. Enable each department to create Cloud Storage buckets and BigQuery datasets.
    3. In Dataplex, map each department to a data lake and the Cloud Storage buckets, and map the BigQuery datasets to zones.
    4. Enable each department to own and share the data of their data lakes.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 12 comments Click to expand

Comment 1

ID: 1119941 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Thu 11 Jan 2024 17:10 Selected Answer: D Upvotes: 11

- Decentralized ownership: Each department controls its data lake, aligning with the core principle of data ownership in a data mesh.
- Self-service data access: Departments can create and manage their own Cloud Storage buckets and BigQuery datasets within their data lakes, enabling self-service data access.
- Interdepartmental sharing: Dataplex facilitates data sharing by enabling departments to publish their data products from their data lakes, making it easily discoverable and usable by other departments.

Comment 2

ID: 1366592 User: daed09 Badges: Most Recent Relative Date: 1 year ago Absolute Date: Sat 08 Mar 2025 15:18 Selected Answer: - Upvotes: 1

It says we want to work with GCS data assets, thus Dataplex is a better option, so it's option D!

For those who say B is the correct answer: the question states data assets are in both GCS and BQ, but Analytics Hub is focused primarily on BQ assets.

Comment 3

ID: 1352379 User: plum21 Badges: - Relative Date: 1 year, 1 month ago Absolute Date: Thu 06 Feb 2025 13:43 Selected Answer: D Upvotes: 1

D because it looks like B is impossible due to the lack of GCS support in Analytics Hub

Comment 4

ID: 1337366 User: marlon.andrei Badges: - Relative Date: 1 year, 2 months ago Absolute Date: Mon 06 Jan 2025 22:23 Selected Answer: B Upvotes: 2

In "a data mesh approach to share the data between sales, product design, and marketing departments", Analytics Hub is the solution.

Comment 5

ID: 1259973 User: Nandababy Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Fri 02 Aug 2024 19:37 Selected Answer: - Upvotes: 2

B is the better option. Since the organization is migrating to Google Cloud, the teams don't have much hands-on experience yet. Analytics Hub is easier to use and serves the purpose, whereas the setup for Dataplex is very complex.

Comment 6

ID: 1252622 User: 987af6b Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sun 21 Jul 2024 19:18 Selected Answer: B Upvotes: 3

For a straightforward data mesh approach where the focus is on decentralizing data management while enabling easy data sharing and discovery, Analytics Hub is often the more appropriate choice due to its simplicity and directness. It facilitates the core objectives of a data mesh—decentralized data ownership and accessible data sharing—without the added complexity of managing data lakes and advanced governance features.

Comment 7

ID: 1193533 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Thu 11 Apr 2024 09:34 Selected Answer: - Upvotes: 2

I think it's B. I know that since we are talking about a data mesh, we immediately want to reach for the Dataplex service. However, in Dataplex a lake can only have assets (BQ tables, etc.) that are in the same project as the Dataplex service.

Example: there is a BQ table in project A and in project B. I want to create a lake in Dataplex in project A that contains tables from project B. I can't do that; I can only host tables from project A, since the lake is in project A.

With this said, I think the best option is B, because the data mesh approach here is about sharing "the data between sales, product design, and marketing departments". So the question is focusing only on the sharing part of the data mesh. Option B fits just fine.

Comment 7.1

ID: 1193536 User: joao_01 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Thu 11 Apr 2024 09:39 Selected Answer: - Upvotes: 3

I was wrong in my explanation guys. Look at this link:
https://cloud.google.com/dataplex/docs/add-zone

"A lake can include one or more zones. While a zone can only be part of one lake, it may contain assets that point to resources that are part of projects outside of its parent project."

So, option D seems good.

Comment 8

ID: 1155733 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Wed 21 Feb 2024 18:12 Selected Answer: D Upvotes: 1

Option D

Comment 9

ID: 1120862 User: Matt_108 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Fri 12 Jan 2024 15:55 Selected Answer: D Upvotes: 2

that's pure data mesh, which is what dataplex has been built for

Comment 10

ID: 1118791 User: Sofiia98 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 10 Jan 2024 17:02 Selected Answer: D Upvotes: 1

For me, Dataplex looks more logical

Comment 11

ID: 1115745 User: GCP001 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 11:59 Selected Answer: - Upvotes: 1

D. Dataplex looks more suitable for data mesh approach, Check the ref - https://cloud.google.com/dataplex/docs/introduction

10. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 200

Sequence
130
Discussion ID
79650
Source URL
https://www.examtopics.com/discussions/google/view/79650-exam-professional-data-engineer-topic-1-question-200/
Posted By
ducc
Posted At
Sept. 3, 2022, 4:04 a.m.

Question

Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow
Google-recommended practices and use service accounts to control access to PII. What should you do?

  • A. Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access project resources.
  • B. Use one service account to access a Cloud SQL database, and use separate service accounts for each human user.
  • C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.
  • D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 25 comments Click to expand

Comment 1

ID: 735753 User: NicolasN Badges: Highly Voted Relative Date: 2 years, 9 months ago Absolute Date: Mon 05 Jun 2023 08:08 Selected Answer: A Upvotes: 18

✅[A] is the only acceptable answer.
❌[B] rejected (no need to elaborate)
❌[C] and [D] rejected. Why should we be obliged to use Cloud Storage? Other storage options in Google Cloud aren't compliant with "major data protection standards"?
=============================================
❗[D] has another rejection reason, the following quotes:
🔸From <https://cloud.google.com/iam/docs/service-accounts>: "You can add service accounts to a Google group, then grant roles to the group. However, adding service accounts to groups is not a best practice. Service accounts are used by applications, and each application is likely to have its own access requirements"
🔸From <https://cloud.google.com/iam/docs/best-practices-service-accounts#groups>: "Avoid using groups for granting service accounts access to resources"

Comment 1.1

ID: 931399 User: KC_go_reply Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 23 Dec 2023 12:10 Selected Answer: - Upvotes: 5

Rejecting C + D solely based on Cloud Storage, which CAN be used in this scenario, is not sound reasoning.

Comment 1.2

ID: 1102929 User: MaxNRG Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Fri 21 Jun 2024 20:07 Selected Answer: - Upvotes: 3

A single shared service account or granting every employee direct access violates security best practices, so not [A].

Comment 2

ID: 785364 User: cetanx Badges: Highly Voted Relative Date: 2 years, 7 months ago Absolute Date: Sun 23 Jul 2023 13:08 Selected Answer: D Upvotes: 14

for A: please refer to the link below, which suggests that "sharing a single service account across multiple applications can complicate the management of the service account", meaning it's not a best practice.
https://cloud.google.com/iam/docs/best-practices-service-accounts#single-purpose
Also, what if we have hundreds of users? Does it really make sense to manage each user's IAM individually?

for D: it's indeed not one of the best practices, but I believe it's much more manageable and better than A

Comment 3

ID: 1365009 User: skhaire Badges: Most Recent Relative Date: 1 year ago Absolute Date: Tue 04 Mar 2025 17:40 Selected Answer: D Upvotes: 1

Without Cloud Storage, I believe DLP alone does not provide encryption. DLP can redact or mask data, not encrypt it. Only on Cloud Storage can encryption be performed. So option D seems the closest choice, though service accounts should NOT be attached to IAM groups.

Comment 4

ID: 1102932 User: MaxNRG Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Fri 21 Jun 2024 20:09 Selected Answer: D Upvotes: 2

To align with Google's recommended practices for managing access to personally identifiable information (PII) in compliance with banking industry regulations, let's analyze the options:

A. Assign the required IAM roles to every employee, and create a single service account to access project resources: While assigning specific IAM roles to employees is a good practice for access control, using a single service account for all access to PII is not ideal. Service accounts should be used for applications and automated processes, not as a shared account for multiple users or employees.

Comment 4.1

ID: 1102933 User: MaxNRG Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Fri 21 Jun 2024 20:09 Selected Answer: - Upvotes: 1

B. Use one service account to access a Cloud SQL database, and use separate service accounts for each human user: Again, service accounts are intended for automated tasks or applications, not for individual human users. Assigning separate service accounts to each human user is not a recommended practice and does not align with the principle of least privilege.

Comment 4.1.1

ID: 1102934 User: MaxNRG Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Fri 21 Jun 2024 20:09 Selected Answer: - Upvotes: 1

C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users: Using Cloud Storage can indeed help comply with data protection standards, especially when configured correctly with encryption and access controls. However, sharing a single service account among all users is not a best practice. It goes against the principle of least privilege and does not provide adequate granularity for access control.

Comment 4.1.1.1

ID: 1102935 User: MaxNRG Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Fri 21 Jun 2024 20:09 Selected Answer: - Upvotes: 1

D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group: This approach is more aligned with best practices. Using Cloud Storage can ensure compliance with data protection standards. Creating multiple service accounts, each with specific access controls attached to different IAM groups, allows for more granular and controlled access to PII. This setup adheres to the principle of least privilege, ensuring that each service (or group of services) only has access to the resources necessary for its function.

Based on these considerations, option D is the most appropriate choice. It ensures compliance with data protection standards, uses Cloud Storage for secure data management, and employs multiple service accounts tied to IAM groups for granular access control, aligning well with Google-recommended practices and regulatory requirements in the banking industry.

Comment 5

ID: 960809 User: vamgcp Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Tue 23 Jan 2024 23:31 Selected Answer: D Upvotes: 2

Option D - Using multiple service accounts attached to IAM groups helps enforce the principle of least privilege. Each group can be assigned only the necessary permissions, reducing the risk of unauthorized access to sensitive data.

Comment 6

ID: 947777 User: MoeHaydar Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 10 Jan 2024 08:39 Selected Answer: D Upvotes: 2

Google Cloud Storage is designed to comply with major data protection standards. Creating multiple service accounts and attaching them to IAM groups provides granular control over who has access to the data. This approach is aligned with the principle of least privilege, a security best practice where a user is given the minimum levels of access necessary to complete their tasks.

Comment 7

ID: 931400 User: KC_go_reply Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 23 Dec 2023 12:11 Selected Answer: D Upvotes: 2

It's not A because
1. assigning IAM roles to single users instead of groups is not Google best practice, and
2. the question explicitly states that we want to use multiple service accounts.

Comment 8

ID: 915570 User: Ender_H Badges: - Relative Date: 2 years, 3 months ago Absolute Date: Tue 05 Dec 2023 19:23 Selected Answer: D Upvotes: 2

D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.

- Google Cloud Storage is built for secure and compliant data storage. It supports compliance with major data protection standards, which is essential in the banking industry where data protection regulations are stringent.
- Service accounts in Google Cloud represent non-human users (applications or services) that need to authenticate and be authorized to access specific Google Cloud resources.
- Creating multiple service accounts attached to IAM groups allows you to manage access control in a granular manner. This follows the principle of least privilege, providing each group with only the permissions they need to perform their tasks, which is a recommended practice for managing access to sensitive data like PII.
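The pattern the thread keeps circling (single-purpose service accounts for workloads, groups for humans) can be sketched with gcloud. This is only a sketch under stated assumptions: the project, bucket, service account, and group names below are all hypothetical placeholders.

```shell
# One single-purpose service account per workload (hypothetical names):
gcloud iam service-accounts create pii-etl \
    --project=my-bank-project \
    --display-name="PII ETL pipeline"

# Grant the service account narrowly scoped access to the PII bucket:
gcloud storage buckets add-iam-policy-binding gs://pii-bucket \
    --member="serviceAccount:pii-etl@my-bank-project.iam.gserviceaccount.com" \
    --role=roles/storage.objectViewer

# Grant human users access through a group rather than individually:
gcloud storage buckets add-iam-policy-binding gs://pii-bucket \
    --member="group:pii-analysts@example.com" \
    --role=roles/storage.objectViewer
```

Note the design choice this reflects: the service account is bound directly to the resource (not placed inside a group), which is what the best-practices page quoted by NicolasN recommends.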

Comment 8.1

ID: 915571 User: Ender_H Badges: - Relative Date: 2 years, 3 months ago Absolute Date: Tue 05 Dec 2023 19:24 Selected Answer: - Upvotes: 1

❌ D. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.

- Sharing one service account among all users is not a secure practice. It goes against the principle of least privilege and does not allow for granular control over access permissions. If the shared service account were to be compromised, all resources accessible by the account would be at risk.

Comment 9

ID: 847542 User: juliobs Badges: - Relative Date: 2 years, 5 months ago Absolute Date: Fri 22 Sep 2023 21:39 Selected Answer: - Upvotes: 6

Why are so many questions like this?
None of the answers is best practice.

Comment 9.1

ID: 1083593 User: alfguemat Badges: - Relative Date: 1 year, 9 months ago Absolute Date: Wed 29 May 2024 15:05 Selected Answer: - Upvotes: 1

I would like to put your question to whoever decides the questions on the exams. I don't understand what they're trying to do; many of the questions cause divided responses because they don't have a clear answer. The certification process is a waste of time.

Comment 10

ID: 838910 User: SuperVee Badges: - Relative Date: 2 years, 5 months ago Absolute Date: Thu 14 Sep 2023 13:58 Selected Answer: D Upvotes: 5

I could be wrong, but I think the wording in D caused this confusion, so it is an English problem: "Use multiple service accounts attached to IAM groups to grant the appropriate access to each group."
I believe what D really means is that you create a group for the people who only need access to resource A, then attach a service account to that group, and the service account only has access to A.
Then you create another group for the people who only need access to resource B, and attach a service account to this group; this service account can only access B.
So each group/service account has a very specific access target, and the purpose of each group is narrowly defined, which is allowed by best practice. However, the wording in option D merged all of this into one sentence, causing confusion.
Option A is an administrative nightmare: managing IAM for each user individually in a larger user population is actually also against GCP best practices.

Comment 11

ID: 808233 User: Aamir185 Badges: - Relative Date: 2 years, 7 months ago Absolute Date: Mon 14 Aug 2023 09:10 Selected Answer: D Upvotes: 2

D it is

Comment 12

ID: 763426 User: AzureDP900 Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Sun 02 Jul 2023 01:08 Selected Answer: - Upvotes: 1

D is right

Comment 13

ID: 754895 User: Amar2022 Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Sat 24 Jun 2023 13:30 Selected Answer: A Upvotes: 1

A is the correct one

Comment 14

ID: 753213 User: jkhong Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Thu 22 Jun 2023 11:13 Selected Answer: A Upvotes: 1

Agree with NicolasN, D is bad practice. For D this may result in permission creep, where a group is granted access to an increasing number of resources. Only grant service accounts specific access to resources.

Comment 15

ID: 752644 User: odacir Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Wed 21 Jun 2023 18:38 Selected Answer: A Upvotes: 1

A is the answer, as NicolasN says.
https://cloud.google.com/iam/docs/service-accounts#groups

Comment 16

ID: 747096 User: Andrix2405 Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Fri 16 Jun 2023 11:19 Selected Answer: A Upvotes: 2

Avoid using groups for granting service accounts access to resources -> D

Comment 16.1

ID: 747097 User: Andrix2405 Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Fri 16 Jun 2023 11:20 Selected Answer: - Upvotes: 1

Sorry A

Comment 17

ID: 723993 User: hauhau Badges: - Relative Date: 2 years, 9 months ago Absolute Date: Sun 21 May 2023 23:57 Selected Answer: - Upvotes: 1

Why GCS?

11. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 232

Sequence
151
Discussion ID
130175
Source URL
https://www.examtopics.com/discussions/google/view/130175-exam-professional-data-engineer-topic-1-question-232/
Posted By
scaenruy
Posted At
Jan. 3, 2024, 12:47 p.m.

Question

You are on the data governance team and are implementing security requirements to deploy resources. You need to ensure that resources are limited to only the europe-west3 region. You want to follow Google-recommended practices.

What should you do?

  • A. Set the constraints/gcp.resourceLocations organization policy constraint to in:europe-west3-locations.
  • B. Deploy resources with Terraform and implement a variable validation rule to ensure that the region is set to the europe-west3 region for all resources.
  • C. Set the constraints/gcp.resourceLocations organization policy constraint to in:eu-locations.
  • D. Create a Cloud Function to monitor all resources created and automatically destroy the ones created outside the europe-west3 region.

Suggested Answer

A

Answer Description Click to expand


Community Answer Votes

Comments 8 comments Click to expand

Comment 1

ID: 1113862 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Thu 04 Jan 2024 17:17 Selected Answer: A Upvotes: 11

- The constraints/gcp.resourceLocations organization policy constraint is used to define where resources in the organization can be created.
- Setting it to in:europe-west3-locations would specify that resources can only be created in the europe-west3 region.

Comment 1.1

ID: 1177898 User: SanjeevRoy91 Badges: - Relative Date: 1 year, 11 months ago Absolute Date: Wed 20 Mar 2024 04:37 Selected Answer: - Upvotes: 3

I am new to this forum. In almost all the questions, the revealed solution is different from the ones discussed here??

Comment 2

ID: 1337524 User: b3e59c2 Badges: Most Recent Relative Date: 1 year, 2 months ago Absolute Date: Tue 07 Jan 2025 11:36 Selected Answer: A Upvotes: 2

https://cloud.google.com/resource-manager/docs/organization-policy/defining-locations#location_types

Comment 3

ID: 1287733 User: chrissamharris Badges: - Relative Date: 1 year, 5 months ago Absolute Date: Sun 22 Sep 2024 15:38 Selected Answer: - Upvotes: 1

B, D
B - Increase the Cloud Composer 2 environment size from medium to large.
Increasing the environment size will provide more resources (including memory) to the entire environment, which should help mitigate memory usage issues. This will also support scaling if the jobs demand more resources.
D. Increase the memory available to the Airflow workers.
Increasing memory for Airflow workers will directly address the memory usage issue that's causing the pod evictions. By allocating more memory to the workers, they can handle larger tasks or more intensive workloads without failing due to memory constraints.

Comment 3.1

ID: 1313322 User: hrishi19 Badges: - Relative Date: 1 year, 3 months ago Absolute Date: Sun 17 Nov 2024 01:56 Selected Answer: - Upvotes: 2

this comment is for the previous question, not this one.

Comment 4

ID: 1154425 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Tue 20 Feb 2024 03:28 Selected Answer: A Upvotes: 1

Option A

Comment 5

ID: 1121553 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 12:21 Selected Answer: A Upvotes: 2

Option A

Comment 6

ID: 1112718 User: scaenruy Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 03 Jan 2024 12:47 Selected Answer: A Upvotes: 2

Set the constraints/gcp.resourceLocations organization policy constraint to in:europe-west3-locations.
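The consensus answer A can be sketched as a concrete org policy. This is a minimal sketch, assuming a hypothetical organization ID (123456789); the YAML follows the Org Policy v2 format used by `gcloud org-policies set-policy`.

```shell
# Write the policy as YAML (hypothetical org ID), then apply it.
cat > resource-locations-policy.yaml <<'EOF'
name: organizations/123456789/policies/gcp.resourceLocations
spec:
  rules:
  - values:
      allowedValues:
      - in:europe-west3-locations
EOF

gcloud org-policies set-policy resource-locations-policy.yaml
```

The `in:europe-west3-locations` value group covers the europe-west3 region and its zones, which is exactly why option C (`in:eu-locations`) is too broad for this requirement.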

12. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 210

Sequence
159
Discussion ID
129857
Source URL
https://www.examtopics.com/discussions/google/view/129857-exam-professional-data-engineer-topic-1-question-210/
Posted By
e70ea9e
Posted At
Dec. 30, 2023, 9:33 a.m.

Question

You are designing a data mesh on Google Cloud with multiple distinct data engineering teams building data products. The typical data curation design pattern consists of landing files in Cloud Storage, transforming raw data in Cloud Storage and BigQuery datasets, and storing the final curated data product in BigQuery datasets. You need to configure Dataplex to ensure that each team can access only the assets needed to build their data products. You also need to ensure that teams can easily share the curated data product. What should you do?

  • A. 1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data.
    2. Provide each data engineering team access to the virtual lake.
  • B. 1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data.
    2. Build separate assets for each data product within the zone.
    3. Assign permissions to the data engineering teams at the zone level.
  • C. 1. Create a Dataplex virtual lake for each data product, and create a single zone to contain landing, raw, and curated data.
    2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.
  • D. 1. Create a Dataplex virtual lake for each data product, and create multiple zones for landing, raw, and curated data.
    2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 15 comments Click to expand

Comment 1

ID: 1333715 User: f74ca0c Badges: - Relative Date: 1 year, 2 months ago Absolute Date: Sun 29 Dec 2024 21:38 Selected Answer: D Upvotes: 1

Create a Dataplex lake that acts as the domain for your data mesh.
Add zones to your lake that represent individual teams within each domain and provide managed data contracts.
Attach assets that map to data stored in Cloud Storage.
https://cloud.google.com/transfer-appliance/docs/4.0/overview

Comment 2

ID: 1304176 User: SamuelTsch Badges: - Relative Date: 1 year, 4 months ago Absolute Date: Mon 28 Oct 2024 21:39 Selected Answer: D Upvotes: 1

just like MaxNRG said

Comment 3

ID: 1151149 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Thu 15 Feb 2024 18:22 Selected Answer: D Upvotes: 1

Answer D

Comment 4

ID: 1123200 User: datapassionate Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Mon 15 Jan 2024 09:58 Selected Answer: D Upvotes: 1

D. 1. Create a Dataplex virtual lake for each data product, and create multiple zones for landing, raw, and curated data.
2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.

Lake: A logical construct representing a data domain or business unit. For example, to organize data based on group usage, you can set up a lake for each department (for example, Retail, Sales, Finance).
Zone: A subdomain within a lake, which is useful to categorize data by the following:
Stage: For example, landing, raw, curated data analytics, and curated data science.

Comment 4.1

ID: 1123202 User: datapassionate Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Mon 15 Jan 2024 09:58 Selected Answer: - Upvotes: 1

https://cloud.google.com/dataplex/docs/introduction

Comment 5

ID: 1121418 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 09:32 Selected Answer: D Upvotes: 1

D: 1 virtual lake per Data Product (which stands for domain basically), zones to split data by "status". Each Data Eng team can access their own data exclusively and in a data mesh compliant way

Comment 6

ID: 1116050 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 18:59 Selected Answer: D Upvotes: 4

The best approach is to create a Dataplex virtual lake for each data product, with multiple zones for landing, raw, and curated data. Then provide the data engineering teams with access only to the zones they need within the virtual lake assigned to their product.

To enable teams to easily share curated data products, you should use cross-lake sharing in Dataplex. This allows curated zones to be shared across virtual lakes while maintaining data isolation for other zones.

Comment 6.1

ID: 1116051 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 19:00 Selected Answer: - Upvotes: 3

So the steps would be:
1. Create a Dataplex virtual lake for each data product.
2. Within each lake, create separate zones for landing, raw, and curated data.
3. Provide each data engineering team with access only to the zones they need within their assigned virtual lake.
4. Configure cross-lake sharing on the curated data zones to share curated data products between teams.
This provides isolation and access control between teams for raw data while enabling easy sharing of curated data products.
https://cloud.google.com/dataplex/docs/introduction#a_domain-centric_data_mesh
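The steps above can be sketched with gcloud. A minimal sketch, assuming hypothetical lake, zone, project, and dataset names:

```shell
# Step 1: one lake per data product/domain (hypothetical names):
gcloud dataplex lakes create sales-lake --location=europe-west3

# Step 2: separate zones per curation stage inside the product's lake:
gcloud dataplex zones create landing --lake=sales-lake --location=europe-west3 \
    --type=RAW --resource-location-type=SINGLE_REGION
gcloud dataplex zones create curated --lake=sales-lake --location=europe-west3 \
    --type=CURATED --resource-location-type=SINGLE_REGION

# Attach the curated BigQuery dataset as an asset of the curated zone:
gcloud dataplex assets create sales-product --zone=curated --lake=sales-lake \
    --location=europe-west3 --resource-type=BIGQUERY_DATASET \
    --resource-name=projects/my-project/datasets/sales_curated
```

Team access (step 3) would then be granted at the lake or zone level with `gcloud dataplex lakes add-iam-policy-binding` rather than on the underlying buckets and datasets directly.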

Comment 7

ID: 1115666 User: Smakyel79 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 10:10 Selected Answer: - Upvotes: 2

I believe the answer is B, but there is a misspelling in the answer; it should say "create multiple zones".

Comment 8

ID: 1115540 User: Helinia Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 02:26 Selected Answer: D Upvotes: 1

Each lake should be created per data product since data product sounds like a domain in this question.

Since we have landing, raw, curated data, we should create different zones.

"Zones are of two types: raw and curated.

Raw zone: Contains data that is in its raw format and not subject to strict type-checking.

Curated zone: Contains data that is cleaned, formatted, and ready for analytics. The data is columnar, Hive-partitioned, and stored in Parquet, Avro, Orc files, or BigQuery tables. Data undergoes type-checking- for example, to prohibit the use of CSV files because they don't perform as well for SQL access."

Ref: https://cloud.google.com/dataplex/docs/introduction#terminology

Comment 9

ID: 1114674 User: Jordan18 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Fri 05 Jan 2024 17:58 Selected Answer: - Upvotes: 4

why not B?

Comment 10

ID: 1114368 User: Sofiia98 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Fri 05 Jan 2024 09:49 Selected Answer: - Upvotes: 3

Why not B?

Comment 10.1

ID: 1124852 User: tibuenoc Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Wed 17 Jan 2024 10:31 Selected Answer: - Upvotes: 2

Because the best practice is separate zones: one zone each for landing, raw, and curated data.

Answer B is excluded by the part "create a single zone to contain landing".

The correct answer is D

Comment 11

ID: 1112405 User: Ed_Kim Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 03 Jan 2024 02:38 Selected Answer: D Upvotes: 2

The answer is D

Comment 12

ID: 1109533 User: e70ea9e Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 30 Dec 2023 09:33 Selected Answer: C Upvotes: 2

Virtual Lake per Data Product: Each virtual lake acts as a self-contained domain for a specific data product, aligning with the data mesh principle of decentralized ownership and responsibility.
Team Autonomy: Teams have full control over their virtual lake, enabling independent development, management, and sharing of their data products.

13. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 168

Sequence
184
Discussion ID
79494
Source URL
https://www.examtopics.com/discussions/google/view/79494-exam-professional-data-engineer-topic-1-question-168/
Posted By
AWSandeep
Posted At
Sept. 2, 2022, 7:01 p.m.

Question

You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into
BigQuery. For security reasons, you decide to redact your customers' Government issued Identification Number while allowing customer service representatives to view the original values when necessary. What should you do?

  • A. Use BigQuery's built-in AEAD encryption to encrypt the SSN column. Save the keys to a new table that is only viewable by permissioned users.
  • B. Use BigQuery column-level security. Set the table permissions so that only members of the Customer Service user group can see the SSN column.
  • C. Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic hash.
  • D. Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 26 comments Click to expand

Comment 1

ID: 657631 User: AWSandeep Badges: Highly Voted Relative Date: 3 years, 6 months ago Absolute Date: Fri 02 Sep 2022 19:01 Selected Answer: B Upvotes: 12

B. While C and D are intriguing, they don't specify how customer service representatives would be given access to the encryption token.

Comment 1.1

ID: 1320383 User: cloud_rider Badges: - Relative Date: 1 year, 3 months ago Absolute Date: Sat 30 Nov 2024 22:54 Selected Answer: - Upvotes: 1

B will show the values to the customer service group all the time, since they have access to the column, so there is no redaction as the question asks. Another point: the requirement is to view the original values only when necessary, so D fits, and format-preserving encryption can be reversed when necessary.

Comment 1.2

ID: 1100935 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 19 Dec 2023 20:25 Selected Answer: - Upvotes: 1

B. BigQuery column-level security:

Pros: Granular control over column access, ensures only authorized users see the SSN column.
Cons: Doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users. A potential security breach could expose them.

Comment 1.3

ID: 1053642 User: ffggrre Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Wed 25 Oct 2023 12:48 Selected Answer: - Upvotes: 1

there is no SSN in the question; it can be any ID.

Comment 2

ID: 967960 User: Lanro Badges: Highly Voted Relative Date: 2 years, 7 months ago Absolute Date: Mon 31 Jul 2023 12:08 Selected Answer: D Upvotes: 7

I don't see why we should use DLP, since we know exactly which column should be locked or encrypted. On the other hand, having a cryptographic representation of the SSN helps to aggregate/analyse entries. So I will vote for D, but B is much easier to implement. Garbage question indeed.

Comment 3

ID: 1303885 User: SamuelTsch Badges: Most Recent Relative Date: 1 year, 4 months ago Absolute Date: Mon 28 Oct 2024 09:42 Selected Answer: D Upvotes: 2

In the question, there is no mention of SSN column.

Comment 3.1

ID: 1303886 User: SamuelTsch Badges: - Relative Date: 1 year, 4 months ago Absolute Date: Mon 28 Oct 2024 09:44 Selected Answer: - Upvotes: 1

also, in the question, "you decide to REDACT ...". Option B does not redact the values.

Comment 4

ID: 1299177 User: MohaSa1 Badges: - Relative Date: 1 year, 4 months ago Absolute Date: Thu 17 Oct 2024 13:14 Selected Answer: D Upvotes: 2

Authorized users can decrypt the FPE tokens back to the original GIINs, D is the best option.

Comment 5

ID: 1295053 User: baimus Badges: - Relative Date: 1 year, 5 months ago Absolute Date: Wed 09 Oct 2024 10:55 Selected Answer: D Upvotes: 2

D (FPE) does indeed allow encryption to be reversed if desired, allowing operatives to review the original key. This makes it preferable to B, as it's also more secure.

Comment 6

ID: 1226851 User: Topg4u Badges: - Relative Date: 1 year, 9 months ago Absolute Date: Sat 08 Jun 2024 19:22 Selected Answer: - Upvotes: 2

D:
SSN is tied only to the USA, not to other countries. The question did not mention SSN.

Comment 7

ID: 1100933 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 19 Dec 2023 20:24 Selected Answer: D Upvotes: 3

The best option is D - Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token.

The key reasons are:

DLP allows redacting sensitive PII like SSNs before loading into BigQuery. This provides security by default for the raw SSN values.
Using format-preserving encryption keeps the column format intact while still encrypting, allowing business logic relying on SSN format to continue functioning.
The encrypted tokens can be reversed to view original SSNs when required, meeting the access requirement for customer service reps.

Comment 7.1

ID: 1100934 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 19 Dec 2023 20:24 Selected Answer: - Upvotes: 1

Option A does encrypt SSN but requires managing keys separately.

Option B relies on complex IAM policy changes instead of encrypting by default.

Option C hashes irreversibly, preventing customer service reps from viewing original SSNs when required.

Therefore, using DLP format-preserving encryption before BigQuery ingestion balances both security and analytics requirements for SSN data.
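The hash-vs-reversible distinction above can be illustrated with a self-contained toy. The digit-shift cipher below is only a stand-in to show the format-preservation and reversibility properties; real Cloud DLP FPE uses AES in FFX mode with a managed key, not this.

```python
import hashlib

def hash_redact(gov_id: str) -> str:
    # Option C's approach: irreversible. No key can recover the original.
    return hashlib.sha256(gov_id.encode()).hexdigest()

def toy_encrypt(gov_id: str, key: int) -> str:
    # Toy reversible, format-preserving substitution: shift each digit by
    # the key modulo 10. (Illustration only; NOT real FPE.)
    return "".join(str((int(d) + key) % 10) for d in gov_id)

def toy_decrypt(token: str, key: int) -> str:
    return "".join(str((int(d) - key) % 10) for d in token)

gov_id = "123456789"
token = toy_encrypt(gov_id, key=7)
assert token.isdigit() and len(token) == len(gov_id)  # format preserved
assert toy_decrypt(token, key=7) == gov_id            # reversible with the key
# hash_redact(gov_id) has no inverse: both format and reversibility are lost.
```

This is the property the customer service requirement hinges on: a hash (option C) can never be turned back into the original value, while an encryption token can, given the key.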

Comment 7.1.1

ID: 1100936 User: MaxNRG Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 19 Dec 2023 20:26 Selected Answer: - Upvotes: 2

Why not B (BigQuery column-level security): it doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users. A potential security breach could expose them.

Comment 8

ID: 1096286 User: Aman47 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Thu 14 Dec 2023 11:06 Selected Answer: D Upvotes: 3

Even with column-level access control, the data owners and other roles above them in the hierarchy can still view the highly sensitive data. It is better to just use encryption and decryption, as this data would never be used for analytic workloads anyway.

Comment 9

ID: 1066926 User: spicebits Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Fri 10 Nov 2023 04:37 Selected Answer: D Upvotes: 3

Answer has to be D. Question says "you decide to redact your customers' Government issued Identification Number while allowing customer service representatives to view the original values when necessary"... Redact... view the original values... D is the only choice.

Comment 10

ID: 1059471 User: Nirca Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Wed 01 Nov 2023 09:41 Selected Answer: B Upvotes: 1

It might not be D!
Since only the format is kept, the data itself is changed.
Format Preserving Encryption (FPE), endorsed by NIST, is an advanced encryption technique that transforms data into an encrypted format while preserving its original structure. For instance, a 16-digit credit card number encrypted with FPE will still be a 16-digit number

Comment 10.1

ID: 1109727 User: Helinia Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 30 Dec 2023 13:32 Selected Answer: - Upvotes: 1

No, the value using FPE can be decrypted with key.
"Encrypted values can be re-identified using the original cryptographic key and the entire output value, including surrogate annotation."

https://cloud.google.com/dlp/docs/pseudonymization#supported-methods

Comment 11

ID: 1046741 User: ffggrre Badges: - Relative Date: 2 years, 4 months ago Absolute Date: Wed 18 Oct 2023 10:16 Selected Answer: B Upvotes: 1

Customer service needs to see the original value, not possible with other options.

Comment 12

ID: 1024669 User: kcl10 Badges: - Relative Date: 2 years, 5 months ago Absolute Date: Wed 04 Oct 2023 13:01 Selected Answer: B Upvotes: 1

of course B

Comment 13

ID: 1012322 User: ckanaar Badges: - Relative Date: 2 years, 5 months ago Absolute Date: Wed 20 Sep 2023 14:39 Selected Answer: D Upvotes: 3

I believe the crux to the question is that the cryptographic format-preserving encryption token is re-identifiable, whereas the cryptographic hash is not: https://cloud.google.com/dlp/docs/transformations-reference

Therefore, customer service can view the original values when necessary in case of D.

Comment 13.1

ID: 1013141 User: ckanaar Badges: - Relative Date: 2 years, 5 months ago Absolute Date: Thu 21 Sep 2023 15:55 Selected Answer: - Upvotes: 2

Nevermind, this can actually also be done in the case of answer B. They are both correct, just different implementations. No idea

Comment 14

ID: 964328 User: knith66 Badges: - Relative Date: 2 years, 7 months ago Absolute Date: Thu 27 Jul 2023 06:33 Selected Answer: - Upvotes: 2

The question mentions that "user data is sent to Pub/Sub before being ingested" instead of just saying the data goes into BigQuery through Pub/Sub. So some alteration is expected before ingestion into BigQuery, and option D should work.

Comment 15

ID: 960722 User: sr25 Badges: - Relative Date: 2 years, 7 months ago Absolute Date: Sun 23 Jul 2023 20:22 Selected Answer: D Upvotes: 2

D. The question says CSRs get access to the values "when necessary", not default access as in B. D is the better option, using the token.

Comment 16

ID: 947796 User: ZZHZZH Badges: - Relative Date: 2 years, 8 months ago Absolute Date: Mon 10 Jul 2023 08:08 Selected Answer: B Upvotes: 1

One of the key requirements is to let authorized personnel see the ID. D doesn't specify that.

Comment 17

ID: 897680 User: vaga1 Badges: - Relative Date: 2 years, 10 months ago Absolute Date: Sun 14 May 2023 17:52 Selected Answer: D Upvotes: 2

The answer is between B and D as well described in many comments.

I personally do not see any reason to keep the information available using a token or a mask. It is not a PAN number; it's just a personal ID. It should not be useful for analytical purposes.

I'm gonna go for D then

Comment 17.1

ID: 897681 User: vaga1 Badges: - Relative Date: 2 years, 10 months ago Absolute Date: Sun 14 May 2023 17:52 Selected Answer: - Upvotes: 1

sorry B

14. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 247

Sequence
230
Discussion ID
130189
Source URL
https://www.examtopics.com/discussions/google/view/130189-exam-professional-data-engineer-topic-1-question-247/
Posted By
scaenruy
Posted At
Jan. 3, 2024, 2:29 p.m.

Question

Your company operates in three domains: airlines, hotels, and ride-hailing services. Each domain has two teams: analytics and data science, which create data assets in BigQuery with the help of a central data platform team. However, as each domain is evolving rapidly, the central data platform team is becoming a bottleneck. This is causing delays in deriving insights from data, and resulting in stale data when pipelines are not kept up to date. You need to design a data mesh architecture by using Dataplex to eliminate the bottleneck. What should you do?

  • A. 1. Create one lake for each team. Inside each lake, create one zone for each domain.
    2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.
    3. Have the central data platform team manage all zones’ data assets.
  • B. 1. Create one lake for each team. Inside each lake, create one zone for each domain.
    2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.
    3. Direct each domain to manage their own zone’s data assets.
  • C. 1. Create one lake for each domain. Inside each lake, create one zone for each team.
    2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.
    3. Direct each domain to manage their own lake’s data assets.
  • D. 1. Create one lake for each domain. Inside each lake, create one zone for each team.
    2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.
    3. Have the central data platform team manage all lakes’ data assets.

Suggested Answer

C

Answer Description Click to expand


Community Answer Votes

Comments 7 comments Click to expand

Comment 1

ID: 1114117 User: raaad Badges: Highly Voted Relative Date: 1 year, 8 months ago Absolute Date: Thu 04 Jul 2024 23:15 Selected Answer: C Upvotes: 7

- each domain should manage their own lake’s data assets

Comment 1.1

ID: 1123942 User: AllenChen123 Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Tue 16 Jul 2024 07:22 Selected Answer: - Upvotes: 4

Agree. https://cloud.google.com/dataplex/docs/introduction#a_domain-centric_data_mesh

Comment 2

ID: 1178675 User: hanoverquay Badges: Most Recent Relative Date: 1 year, 5 months ago Absolute Date: Fri 20 Sep 2024 20:25 Selected Answer: C Upvotes: 1

vote C

Comment 3

ID: 1154469 User: JyoGCP Badges: - Relative Date: 1 year, 6 months ago Absolute Date: Tue 20 Aug 2024 03:54 Selected Answer: C Upvotes: 1

Option C

Comment 4

ID: 1121693 User: Matt_108 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Sat 13 Jul 2024 13:44 Selected Answer: C Upvotes: 4

Option C - create a lake for each domain, each team manages its own assets

Comment 5

ID: 1119377 User: task_7 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Thu 11 Jul 2024 05:23 Selected Answer: B Upvotes: 2

Separate lakes for each team
Zones within each lake dedicated to different domains

Comment 6

ID: 1112787 User: scaenruy Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Wed 03 Jul 2024 13:29 Selected Answer: C Upvotes: 1

C.
1. Create one lake for each domain. Inside each lake, create one zone for each team.
2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.
3. Direct each domain to manage their own lake’s data assets.

15. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 297

Sequence
251
Discussion ID
130313
Source URL
https://www.examtopics.com/discussions/google/view/130313-exam-professional-data-engineer-topic-1-question-297/
Posted By
scaenruy
Posted At
Jan. 4, 2024, 12:15 p.m.

Question

You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?

  • A. 1. Deploy a long-living Dataproc cluster with Apache Hive and Ranger enabled.
    2. Configure Ranger for column level security.
    3. Process with Dataproc Spark or Hive SQL.
  • B. 1. Define a BigLake table.
    2. Create a taxonomy of policy tags in Data Catalog.
    3. Add policy tags to columns.
    4. Process with the Spark-BigQuery connector or BigQuery SQL.
  • C. 1. Load the data to BigQuery tables.
    2. Create a taxonomy of policy tags in Data Catalog.
    3. Add policy tags to columns.
    4. Process with the Spark-BigQuery connector or BigQuery SQL.
  • D. 1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage.
    2. Define a BigQuery external table for SQL processing.
    3. Use Dataproc Spark to process the Cloud Storage files.

Suggested Answer

B

Answer Description Click to expand


Community Answer Votes

Comments 6 comments Click to expand

Comment 1

ID: 1119989 User: raaad Badges: Highly Voted Relative Date: 1 year, 8 months ago Absolute Date: Thu 11 Jul 2024 17:03 Selected Answer: B Upvotes: 18

- BigLake Integration: BigLake allows you to define tables on top of data in Cloud Storage, providing a bridge between data lake storage and BigQuery's powerful analytics capabilities. This approach is cost-effective and scalable.
- Data Catalog for Governance: Creating a taxonomy of policy tags in Google Cloud's Data Catalog and applying these tags to specific columns in your BigLake tables enables fine-grained, column-level access control.
- Processing with Spark and SQL: The Spark-BigQuery connector allows data scientists to process data using Apache Spark directly against BigQuery (and BigLake tables). This supports both Spark and SQL processing needs.
- Scalability into a Data Mesh: BigLake and Data Catalog are designed to scale and support the data mesh architecture, which involves decentralized data ownership and governance.

Comment 2

ID: 1155776 User: JyoGCP Badges: Most Recent Relative Date: 1 year, 6 months ago Absolute Date: Wed 21 Aug 2024 18:19 Selected Answer: B Upvotes: 1

Going with 'B' based on the comments

Comment 3

ID: 1121926 User: Matt_108 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Sat 13 Jul 2024 17:41 Selected Answer: B Upvotes: 1

Option B, agree with comments explanation

Comment 4

ID: 1116291 User: Jordan18 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Mon 08 Jul 2024 00:57 Selected Answer: B Upvotes: 4

BigLake leverages existing Cloud Storage infrastructure, eliminating the need for a dedicated Dataproc cluster, reducing costs significantly.

Comment 5

ID: 1113616 User: scaenruy Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Thu 04 Jul 2024 11:15 Selected Answer: C Upvotes: 1

C.
1. Load the data to BigQuery tables.
2. Create a taxonomy of policy tags in Data Catalog.
3. Add policy tags to columns.
4. Process with the Spark-BigQuery connector or BigQuery SQL.

Comment 5.1

ID: 1119991 User: raaad Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Thu 11 Jul 2024 17:03 Selected Answer: - Upvotes: 7

- Option B offers a serverless approach that integrates Cloud Storage (as a data lake), BigLake (for table definition), Data Catalog (for data mesh), and BigQuery (for analytics), all of which are essential components of a flexible, scalable, and secure data platform.

16. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 107

Sequence
279
Discussion ID
79778
Source URL
https://www.examtopics.com/discussions/google/view/79778-exam-professional-data-engineer-topic-1-question-107/
Posted By
AWSandeep
Posted At
Sept. 3, 2022, 2:07 p.m.

Question

You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit tracking numbers when events are sent to Kafka topics. A recent software update caused the scanners to accidentally transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?

  • A. Create an authorized view in BigQuery to restrict access to tables with sensitive data.
  • B. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.
  • C. Use Cloud Logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.
  • D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention (Cloud DLP) API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 5 comments Click to expand

Comment 1

ID: 758867 User: dconesoko Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Wed 27 Dec 2023 18:48 Selected Answer: D Upvotes: 5

The cloud function with DLP seems the best option
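The pass-or-quarantine logic option D describes can be sketched as follows. This is a simplified illustration, not the real Cloud Function: the findings are simulated stand-ins for what `dlp_client.inspect_content()` would return (DLP findings carry a `likelihood` level), and the routing targets are only comments.

```python
# Hedged sketch of the pass/quarantine decision a Cloud Function could make
# after calling the Cloud DLP inspect API on a message read from the topic.
LIKELIHOOD_ORDER = [
    "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

def route(findings, threshold="LIKELY"):
    """Return 'quarantine' if any finding meets the confidence threshold."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    for finding in findings:
        if LIKELIHOOD_ORDER.index(finding["likelihood"]) >= cutoff:
            return "quarantine"  # write to a review bucket
    return "pass"  # forward to the analytics systems

# Simulated DLP findings for two scanned messages.
print(route([{"infoType": "PERSON_NAME", "likelihood": "VERY_LIKELY"}]))  # quarantine
print(route([{"infoType": "PHONE_NUMBER", "likelihood": "UNLIKELY"}]))    # pass
```

Tuning the threshold trades false quarantines against the risk of PII slipping through to analytics.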

Comment 2

ID: 962368 User: PhilipKoku Badges: Most Recent Relative Date: 1 year, 7 months ago Absolute Date: Thu 25 Jul 2024 07:28 Selected Answer: D Upvotes: 1

DLP is required

Comment 3

ID: 663864 User: HarshKothari21 Badges: - Relative Date: 2 years, 6 months ago Absolute Date: Fri 08 Sep 2023 19:45 Selected Answer: D Upvotes: 1

D option

Comment 4

ID: 658417 User: AWSandeep Badges: - Relative Date: 2 years, 6 months ago Absolute Date: Sun 03 Sep 2023 14:07 Selected Answer: D Upvotes: 2

D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention (Cloud DLP) API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.

Comment 4.1

ID: 762272 User: AzureDP900 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 30 Dec 2023 20:45 Selected Answer: - Upvotes: 1

Agreed

17. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 288

Sequence
288
Discussion ID
130291
Source URL
https://www.examtopics.com/discussions/google/view/130291-exam-professional-data-engineer-topic-1-question-288/
Posted By
scaenruy
Posted At
Jan. 4, 2024, 11:10 a.m.

Question

You are part of a healthcare organization where data is organized and managed by respective data owners in various storage services. As a result of this decentralized ecosystem, discovering and managing data has become difficult. You need to quickly identify and implement a cost-optimized solution to assist your organization with the following:

• Data management and discovery
• Data lineage tracking
• Data quality validation

How should you build the solution?

  • A. Use BigLake to convert the current solution into a data lake architecture.
  • B. Build a new data discovery tool on Google Kubernetes Engine that helps with new source onboarding and data lineage tracking.
  • C. Use BigQuery to track data lineage, and use Dataprep to manage data and perform data quality validation.
  • D. Use Dataplex to manage data, track data lineage, and perform data quality validation.

Suggested Answer

D

Answer Description Click to expand


Community Answer Votes

Comments 8 comments Click to expand

Comment 1

ID: 1252607 User: 987af6b Badges: - Relative Date: 1 year, 7 months ago Absolute Date: Sun 21 Jul 2024 18:52 Selected Answer: D Upvotes: 1

D. Dataplex

Comment 2

ID: 1231870 User: fitri001 Badges: - Relative Date: 1 year, 8 months ago Absolute Date: Mon 17 Jun 2024 12:23 Selected Answer: D Upvotes: 2

Option D, no doubt

Comment 3

ID: 1174319 User: hanoverquay Badges: - Relative Date: 1 year, 12 months ago Absolute Date: Fri 15 Mar 2024 15:43 Selected Answer: D Upvotes: 2

Option D, no doubt

Comment 4

ID: 1155702 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Wed 21 Feb 2024 17:34 Selected Answer: D Upvotes: 2

Option D

Comment 5

ID: 1121900 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 18:22 Selected Answer: D Upvotes: 3

Clearly D

Comment 6

ID: 1118379 User: Sofiia98 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 10 Jan 2024 09:51 Selected Answer: D Upvotes: 2

Agree with Dataplex option

Comment 7

ID: 1117938 User: raaad Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 10 Jan 2024 00:56 Selected Answer: D Upvotes: 4

Straight forward

Comment 8

ID: 1113521 User: scaenruy Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Thu 04 Jan 2024 11:10 Selected Answer: D Upvotes: 1

D. Use Dataplex to manage data, track data lineage, and perform data quality validation.

18. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 250

Sequence
318
Discussion ID
130198
Source URL
https://www.examtopics.com/discussions/google/view/130198-exam-professional-data-engineer-topic-1-question-250/
Posted By
scaenruy
Posted At
Jan. 3, 2024, 3:35 p.m.

Question

Your company's data platform ingests CSV file dumps of booking and user profile data from upstream sources into Cloud Storage. The data analyst team wants to join these datasets on the email field available in both the datasets to perform analysis. However, personally identifiable information (PII) should not be accessible to the analysts. You need to de-identify the email field in both the datasets before loading them into BigQuery for analysts. What should you do?

  • A. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud Data Loss Prevention (Cloud DLP) with masking as the de-identification transformations type.
    2. Load the booking and user profile data into a BigQuery table.
  • B. 1. Create a pipeline to de-identify the email field by using recordTransformations in Cloud DLP with format-preserving encryption with FFX as the de-identification transformation type.
    2. Load the booking and user profile data into a BigQuery table.
  • C. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
    2. Create a policy tag with the email mask as the data masking rule.
    3. Assign the policy to the email field in both tables.
    4. Assign the Identity and Access Management bigquerydatapolicy.maskedReader role for the BigQuery tables to the analysts.
  • D. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
    2. Create a policy tag with the default masking value as the data masking rule.
    3. Assign the policy to the email field in both tables.
    4. Assign the Identity and Access Management bigquerydatapolicy.maskedReader role for the BigQuery tables to the analysts.

Suggested Answer

B

Answer Description Click to expand


Community Answer Votes

Comments 16 comments Click to expand

Comment 1

ID: 1131572 User: lipa31 Badges: Highly Voted Relative Date: 2 years, 1 month ago Absolute Date: Thu 25 Jan 2024 11:39 Selected Answer: B Upvotes: 15

Format-preserving encryption (FPE) with FFX in Cloud DLP is a strong choice for de-identifying PII like email addresses. FPE maintains the format of the data and ensures that the same input results in the same encrypted output consistently. This means the email fields in both datasets can be encrypted to the same value, allowing for accurate joins in BigQuery while keeping the actual email addresses hidden.
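The join-preservation property described above comes from determinism: the same plaintext always maps to the same token. A toy illustration, using HMAC purely as a stand-in for DLP's FPE-FFX to show determinism (unlike real FPE, HMAC is neither format-preserving nor reversible, and the key here is a hypothetical placeholder):

```python
# Illustration of why a deterministic de-identification keeps joins working
# across two datasets that share an email key.
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical; DLP would use a KMS-wrapped key

def pseudonymize(email: str) -> str:
    """Deterministic token: same email -> same token, every time."""
    return hmac.new(KEY, email.encode(), hashlib.sha256).hexdigest()[:16]

# The same email appears in both the booking and user-profile datasets.
bookings = {pseudonymize("alice@example.com"): "booking-123"}
profiles = {pseudonymize("alice@example.com"): "profile-987"}

# The tokens match, so the join on the de-identified key still succeeds.
joined = {k: (bookings[k], profiles[k]) for k in bookings if k in profiles}
assert len(joined) == 1
```

A random or character-replacement mask would break this: distinct emails could collapse to the same masked value, or the same email could mask differently per dataset, so the join key would be lost.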

Comment 2

ID: 1115773 User: Smakyel79 Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Sun 07 Jan 2024 12:36 Selected Answer: - Upvotes: 5

As it states "You need to de-identify the email field in both the datasets before loading them into BigQuery for analysts", data masking should not be an option, as the data would be stored unmasked in BigQuery.

Comment 3

ID: 1230310 User: Anudeep58 Badges: Most Recent Relative Date: 1 year, 9 months ago Absolute Date: Fri 14 Jun 2024 10:37 Selected Answer: B Upvotes: 4

Option A:
Masking: Simple masking might not preserve the uniqueness and joinability of the email field, making it difficult to perform accurate joins between datasets.
Option C and D:
Dynamic Data Masking: These options involve masking the email field dynamically within BigQuery, which does not address the requirement to de-identify data before loading into BigQuery. Additionally, dynamic masking does not prevent access to the actual email data before it is loaded into BigQuery, potentially exposing PII during the data ingestion process.

Comment 4

ID: 1212428 User: chrissamharris Badges: - Relative Date: 1 year, 10 months ago Absolute Date: Thu 16 May 2024 15:20 Selected Answer: B Upvotes: 2

format-preserving encryption with FFX is required as the analysts want to perform JOINs

Comment 5

ID: 1154472 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Tue 20 Feb 2024 04:57 Selected Answer: B Upvotes: 3

Option B
https://cloud.google.com/sensitive-data-protection/docs/pseudonymization

Comment 6

ID: 1152501 User: ML6 Badges: - Relative Date: 2 years ago Absolute Date: Sat 17 Feb 2024 12:12 Selected Answer: B Upvotes: 4

A) masking = replace with a surrogate character like # or * = output not unique, so cannot apply joins
C and D) question specifies to de-identify BEFORE loading into BQ, whereas these options perform dynamic masking IN BigQuery.

Therefore, only valid option is B.

Comment 7

ID: 1121698 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 14:56 Selected Answer: C Upvotes: 1

Option C. The need is just to mask the data for analysts, without modifying the underlying data. Moreover, it's stored in two separate tables and the analysts need to be able to perform joins based on the masked data. Dynamic masking is the right mechanism, and the right masking rule is email mask (https://cloud.google.com/bigquery/docs/column-data-masking-intro#masking_options), which preserves the ability to join.

Comment 8

ID: 1119508 User: task_7 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Thu 11 Jan 2024 09:27 Selected Answer: B Upvotes: 5

A wouldn't preserve the email format.
C & D: maskedReader roles still grant access to the underlying values.
The only option is B.

Comment 8.1

ID: 1120498 User: alfguemat Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Fri 12 Jan 2024 07:48 Selected Answer: - Upvotes: 1

I don't know why preserving the email format is necessary to perform the join. A could be valid.

Comment 8.1.1

ID: 1141634 User: dduenas Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Tue 06 Feb 2024 01:15 Selected Answer: - Upvotes: 1

Masking only replaces values with specific characters, making the field non-unique and unusable for joins.

Comment 9

ID: 1117342 User: Sofiia98 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 09 Jan 2024 10:32 Selected Answer: C Upvotes: 1

I will go for C, because there is a separate masking type for emails, so why use the default? https://cloud.google.com/bigquery/docs/column-data-masking-intro#masking_options

Comment 10

ID: 1117126 User: GCP001 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Tue 09 Jan 2024 01:11 Selected Answer: C Upvotes: 1

data masking with BQ is correct with email masking rule.
Ref - https://cloud.google.com/bigquery/docs/column-data-masking-intro

Comment 10.1

ID: 1143407 User: tibuenoc Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Wed 07 Feb 2024 15:37 Selected Answer: - Upvotes: 1

It would be correct if they only want to access tables, but it's not valid for datasets.

Comment 11

ID: 1115398 User: Jordan18 Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Sat 06 Jan 2024 20:32 Selected Answer: - Upvotes: 2

why not B?

Comment 12

ID: 1114136 User: raaad Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Fri 05 Jan 2024 01:04 Selected Answer: C Upvotes: 2

- The reason option C works well is that dynamic data masking in BigQuery allows the underlying data to remain unaltered (thus preserving the ability to join on this field), while also preventing analysts from viewing the actual PII.
- The analysts can query and join the data as needed for their analysis, but when they access the data, the email field will be masked according to the policy tag, and they will only see the masked version.

Comment 13

ID: 1112854 User: scaenruy Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 03 Jan 2024 15:35 Selected Answer: D Upvotes: 1

D. 1. Load the CSV files from Cloud Storage into a BigQuery table, and enable dynamic data masking.
2. Create a policy tag with the default masking value as the data masking rule.
3. Assign the policy to the email field in both tables.
4. Assign the Identity and Access Management bigquerydatapolicy.maskedReader role for the BigQuery tables to the analysts

19. PROFESSIONAL-DATA-ENGINEER Topic 1 Question 240

Sequence
320
Discussion ID
130183
Source URL
https://www.examtopics.com/discussions/google/view/130183-exam-professional-data-engineer-topic-1-question-240/
Posted By
scaenruy
Posted At
Jan. 3, 2024, 2:05 p.m.

Question

You are designing a data mesh on Google Cloud by using Dataplex to manage data in BigQuery and Cloud Storage. You want to simplify data asset permissions. You are creating a customer virtual lake with two user groups:

• Data engineers, which require full data lake access
• Analytic users, which require access to curated data

You need to assign access rights to these two groups. What should you do?

  • A. 1. Grant the dataplex.dataOwner role to the data engineer group on the customer data lake.
    2. Grant the dataplex.dataReader role to the analytic user group on the customer curated zone.
  • B. 1. Grant the dataplex.dataReader role to the data engineer group on the customer data lake.
    2. Grant the dataplex.dataOwner to the analytic user group on the customer curated zone.
  • C. 1. Grant the bigquery.dataOwner role on BigQuery datasets and the storage.objectCreator role on Cloud Storage buckets to data engineers.
    2. Grant the bigquery.dataViewer role on BigQuery datasets and the storage.objectViewer role on Cloud Storage buckets to analytic users.
  • D. 1. Grant the bigquery.dataViewer role on BigQuery datasets and the storage.objectViewer role on Cloud Storage buckets to data engineers.
    2. Grant the bigquery.dataOwner role on BigQuery datasets and the storage.objectEditor role on Cloud Storage buckets to analytic users.

Suggested Answer

A

Answer Description Click to expand


Community Answer Votes

Comments 7 comments Click to expand

Comment 1

ID: 1114065 User: raaad Badges: Highly Voted Relative Date: 2 years, 2 months ago Absolute Date: Thu 04 Jan 2024 21:50 Selected Answer: A Upvotes: 9

- dataplex.dataOwner: Grants full control over data assets, including reading, writing, managing, and granting access to others.
- dataplex.dataReader: Allows users to read data but not modify it.
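The role semantics above can be modeled as a toy access check to show why option A fits: engineers get owner rights on the whole lake, while analysts get read-only rights scoped to the curated zone. The group, role, and resource names below are simplified illustrations, not real IAM bindings.

```python
# Toy model of the Dataplex role split in option A. Permission sets are
# simplified stand-ins for the real roles' capabilities.
ROLE_PERMS = {
    "dataplex.dataOwner": {"read", "write", "manage"},
    "dataplex.dataReader": {"read"},
}

# (group, role, scope) -- option A's assignments, with hypothetical names.
GRANTS = [
    ("data-engineers", "dataplex.dataOwner", "customer-lake"),
    ("analytic-users", "dataplex.dataReader", "customer-lake/curated-zone"),
]

def allowed(group: str, action: str, resource: str) -> bool:
    """A grant applies when the resource falls under the granted scope."""
    return any(
        g == group and resource.startswith(scope) and action in ROLE_PERMS[role]
        for g, role, scope in GRANTS
    )

# Engineers can write anywhere in the lake; analysts can only read curated data.
assert allowed("data-engineers", "write", "customer-lake/raw-zone")
assert allowed("analytic-users", "read", "customer-lake/curated-zone")
assert not allowed("analytic-users", "read", "customer-lake/raw-zone")
```

Granting at the lake and zone level like this is what keeps the permission model simple: Dataplex propagates the roles down to the underlying BigQuery datasets and Cloud Storage buckets, so no per-resource IAM (as in options C and D) is needed.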

Comment 1.1

ID: 1122250 User: AllenChen123 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sun 14 Jan 2024 05:13 Selected Answer: - Upvotes: 4

Yes, https://cloud.google.com/dataplex/docs/lake-security#data-roles
Dataplex maps its roles to the data roles for each underlying storage resource (Cloud Storage, BigQuery).
^ simplify the permissions.

Comment 2

ID: 1213505 User: josech Badges: Most Recent Relative Date: 1 year, 9 months ago Absolute Date: Sun 19 May 2024 00:47 Selected Answer: C Upvotes: 1

The question covers both BigQuery and Cloud Storage for a data lake, so you should assign IAM permissions for both of them. C is correct.

Comment 3

ID: 1154435 User: JyoGCP Badges: - Relative Date: 2 years ago Absolute Date: Tue 20 Feb 2024 03:34 Selected Answer: A Upvotes: 1

Option A

Comment 4

ID: 1122595 User: qq589539483084gfrgrgfr Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sun 14 Jan 2024 15:26 Selected Answer: A Upvotes: 3

A correct answer

Comment 5

ID: 1121676 User: Matt_108 Badges: - Relative Date: 2 years, 1 month ago Absolute Date: Sat 13 Jan 2024 14:17 Selected Answer: A Upvotes: 2

Option A clearly correct

Comment 6

ID: 1112770 User: scaenruy Badges: - Relative Date: 2 years, 2 months ago Absolute Date: Wed 03 Jan 2024 14:05 Selected Answer: A Upvotes: 1

A.
1. Grant the dataplex.dataOwner role to the data engineer group on the customer data lake.
2. Grant the dataplex.dataReader role to the analytic user group on the customer curated zone.