Cloud Solution Architecture Planning Guide

The document outlines the preparation for a Professional Cloud Architect journey, focusing on designing and planning cloud solution architectures. It details the business and technical requirements for Cymbal Direct, including the need for scalability, managed services, and secure partner integration. Additionally, it discusses potential solutions, migration strategies, and best practices for utilizing Google Cloud services like Compute Engine and Kubernetes.


Preparing for Your Professional Cloud Architect Journey

Module 1: Designing and Planning a Cloud Solution Architecture
Week 2 agenda

1. Designing and planning a cloud solution architecture
2. Google Compute Engine & Persistent Disks
3. QUIZ
4. Dataproc
5. Dataflow
6. Diagnostic Questions for exam guide Section 1: Designing and planning a cloud solution architecture

Designing and planning a cloud solution architecture
Define systems in scope for a cloud migration
… and/or decide on a “cloud first” approach

Delivery by Drone:
● Their website frontend, pilot, and truck management systems run on Kubernetes.
● Positional data for drone and truck location is kept in MongoDB database clusters.
● Drones stream video to virtual machines via a stateful connection.

Purchase & Product APIs:
● APIs are simply built into monolithic apps and were not designed for partner integration.
● APIs are running on Ubuntu Linux VMs.

Social Media Highlighting:
● Single SuSE Linux VM
● MySQL DB
● Redis
● Python
Cymbal Direct’s business requirements

● Scale to handle additional demand when expanding into test markets


● Streamline development
● Spend developer time on core business functionality as much as possible
● Let partners order directly via API
● Deploy the social media highlighting service and ensure appropriate content
Cymbal Direct’s technical requirements

● Managed services
● Container-based workloads
● Highly scalable environment
● Standardization where possible
● Existing virtualization infrastructure refactored over time
● Secure partner integration
● Streaming IoT data
Putting it together: Existing environment
(Spreadsheet columns: existing environment, technical requirements, business requirements, proposed product/solution)

Existing environment: website frontend, pilot, and truck management systems run on Kubernetes.

* One row of a much larger spreadsheet
Putting it together: Technical requirements

Existing environment: website frontend, pilot, and truck management systems run on Kubernetes.

Technical requirements (does it…?):
● Move to managed services wherever possible
● Ensure that developers can deploy container-based workloads to testing and production environments in a highly scalable environment
● Standardize on containers where possible

* One row of a much larger spreadsheet
Putting it together

Existing environment: website frontend, pilot, and truck management systems run on Kubernetes.

Technical requirements (does it…?):
● Move to managed services wherever possible
● Ensure that developers can deploy container-based workloads to testing and production environments in a highly scalable environment
● Standardize on containers where possible

Business requirements (does it…?):
● Easily scale to handle additional demand when needed?
● Streamline development?

Proposed product/solution: ?

* One row of a much larger spreadsheet
Potential solutions

Existing environment: website frontend, pilot, and truck management systems run on Kubernetes.

Technical requirements (does it…?):
● Move to managed services wherever possible
● Ensure that developers can deploy container-based workloads to testing and production environments in a highly scalable environment
● Standardize on containers where possible

Business requirements (does it…?):
● Easily scale to handle additional demand when needed?
● Streamline development?

Proposed product/solution:
● Global HTTP(S) Load Balancer
● GKE
● Separate projects
● Migration type: lift and shift
● Replace GKE with Cloud Run for the website (future)

* One row of a much larger spreadsheet
Decision flow diagram

Component in existing environment (example: web front-end VMs)

Potential options:
● Compute Engine
● App Engine
● Cloud Run
● GKE

Decision flow (each “yes” narrows the options):
● Container based? Yes -> Cloud Run or GKE
● Streamline development? Yes -> Cloud Run or GKE
● Need to scale? Yes -> Cloud Run or GKE
● Serverless service? -> Cloud Run*

Best potential option: check to make sure there are no limiting factors.

*standard
Planning for migration and the future

● Phase 1, Lift and Shift: on-premises Kubernetes -> Google Kubernetes Engine (GKE)
● Phase 2, Modernize: GKE -> Cloud Run
Migration guide & best practices

● Types of migrations and their use-cases


○ For example, “If the current app isn't meeting
your goals—for example, you don't want to
maintain it, it's too costly to migrate using one of
the previously mentioned approaches, or it's not
supported on Google Cloud—you can do a
rebuild migration.”

● Building inventory of workloads in scope


○ … along with their dependencies!

● Best practices for validating a migration plan.


Google Compute Engine (GCE)

Infrastructure as a Service (IaaS):
● vCPUs (cores) and memory (RAM)
● Persistent disks
● Networking
● Linux or Windows

Exam Tips: GCE is a basic IaaS service, but there are lots of details you’re expected to know:
● Differences between PD images / snapshots / VM images.
● How to troubleshoot a VM not booting up properly.
● Custom image vs. public image + startup scripts.
● VM prices differ between regions.
● PDs are network-attached devices and, as such, consume VM bandwidth.
● VM network performance scales with the number of vCPUs.
● etc…
Compute Engine: how to differentiate between families?

● Cost-Optimized (E2): best TCO, for when cost savings are a priority. Web serving, steady-state LOB apps, dev & test environments, small prod environments.
● General Purpose (N2 and N2D): balanced, with leading performance and perf/$. Enterprise apps, medium databases, web & app serving, in-memory cache.
● Scale-Out Optimized, Tau (T2D, T2A): best perf/$ for scale-out workloads. Web serving, containerized microservices.
● Compute-Optimized (C2, C2D): highest-performance CPUs. EDA, HPC, scientific modeling, AAA gaming.
● Memory-Optimized (M1, M2, M3): most memory on Compute Engine. SAP HANA, largest in-memory DBs, real-time data analytics.
● Accelerator-Optimized (A2): highest-performance GPUs. ML, HPC, massively parallelized computation.
Compute Engine: max shapes by machine type

Instances (VMs) are available in a wide range of shapes, from 0.2 to 416 vCPUs and from 1 GB to ~12 TB RAM. E2, N1, N2, and N2D machine types provide custom shapes as well.

Exam Tip: Custom machine types can be used only for some VM families, and only up to 224 vCPU / 896 GB RAM.
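Custom shapes are requested at instance-creation time using the FAMILY-custom-vCPUS-MEMORY_MB machine-type format; a minimal sketch (the VM name and zone are illustrative):

```shell
# Create an N2 VM with a custom shape: 6 vCPUs and 12 GB (12288 MB) RAM.
# Only some families (E2, N1, N2, N2D) support custom shapes.
gcloud compute instances create custom-shape-vm \
  --zone=us-central1-a \
  --machine-type=n2-custom-6-12288
```

The same shape can also be expressed with the --custom-cpu and --custom-memory flags.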


Compute Engine: network perspective

Exam Tips:
● Network bandwidth is limited and depends on vCPU count (up to ~32 Gbps for N2s; Tier_1 networking extends this further).
● You can expect the best network performance for traffic within the same zone, using internal IP addresses.
● Remember about multi-NIC VMs (up to 8 NICs).
● Storage is a network resource! Network bandwidth is shared between network AND disk activity; when PD writes and network egress compete, the split can end up around 60% PD writes / 40% network egress.

(Diagram: a GCE instance in a VPC, with ingress/egress paths over internal and external IP addresses.)
Compute Engine: Metadata Server

The metadata server stores information about the instance or project: IP address and DNS, custom SSH keys, startup and shutdown scripts, custom metadata, the service account, and maintenance events.

● Metadata requests/responses never leave the physical host.
● Metadata information is encrypted on the way to the virtual machine host.
● The metadata server can generate a signed token that apps can use to verify the instance identity.
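From inside a VM, the metadata server is reachable over plain HTTP; a quick sketch (runnable only on a Compute Engine instance; the audience value is a placeholder):

```shell
# The Metadata-Flavor header is mandatory; requests without it are rejected.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/name"

# Fetch a signed identity token that apps can use to verify the
# instance identity against a chosen audience:
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https://example.com"
```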
Compute Engine: Spot (Preemptible) VMs
Made for batch, fault-tolerant, and high-throughput computing

Super-low-cost, short-term instances:
● Up to 91% less than standard instances
● No maximum duration; may be preempted with 30 seconds’ notice (preemptible: max 24 hrs)
● Simple to use with graceful termination

Ideal for a variety of stateless, fault-tolerant workloads:
● Genomics, pharmaceuticals
● Physics, math, computational chemistry
● Data processing (for example, with Hadoop or Cloud Dataproc)
● Image handling, rendering, and media transcoding
● Monte Carlo simulations
● Financial services

Exam Tips:
● These use cases usually pop up on the exam with regard to Spot / Preemptible VMs.
● Can also be used in GKE clusters!
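Requesting a Spot VM is a flag at creation time; a minimal sketch (the instance name and zone are illustrative):

```shell
# --provisioning-model=SPOT requests a Spot VM;
# --instance-termination-action controls what happens on preemption.
gcloud compute instances create batch-worker-1 \
  --zone=us-central1-a \
  --provisioning-model=SPOT \
  --instance-termination-action=DELETE
```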
Compute Engine: automate start & stop activities
Executed from metadata, either directly or from a file:

● Startup (inline):
gcloud compute instances create VM_NAME \
  --image-project=debian-cloud \
  --image-family=debian-10 \
  --metadata=startup-script='#! /bin/bash
apt update
apt -y install apache2
cat <<EOF > /var/www/html/[Link]
<html><body><p>Linux startup script added directly.</p></body></html>
EOF'

● Shutdown (from a file):
gcloud compute instances create example-instance \
  --metadata-from-file=shutdown-script=FILE_PATH

● Shutdown (inline):
gcloud compute instances create example-instance --metadata \
  shutdown-script="#! /bin/bash
# Shuts down Apache server
/etc/init.d/apache2 stop"

To see the output of a startup/shutdown script, check the VM’s serial port output, for example with gcloud compute instances get-serial-port-output VM_NAME.

Exam Tips:
● Startup / shutdown scripts are best-effort only!
● Startup / shutdown scripts are always run by root (Linux) / System (Windows).
● Shutdown scripts are especially useful for:
○ MIGs (to copy back processed data or logs before a VM goes down).
○ Spot / Preemptible VMs, which are much more vulnerable to being stopped.
● Startup / shutdown scripts can be set at the VM or project (!!!) level; a project-level script triggers for every VM. VM-level always takes precedence (if it exists, the project-level script is not executed).
● Shutdown scripts have timeouts:
○ 90s for standard instances
○ 30s for Spot / Preemptible instances
Compute Engine creation
Public OS image vs. custom OS image vs. snapshot vs. machine image

Exam Tips:
● Custom images should be centralized and controlled from a lifecycle perspective (know what image families and image states are).
● A public / custom OS image IS NOT the same as a “machine image”.
● You can create a VM from any of these options (public / custom OS image, snapshot, existing disk, machine image).
Shielded VMs

Exam Tip: Using Shielded VMs is a best practice in GCP!

● Secure Boot ON, vTPM ON, Integrity Monitoring ON: most secure. Allows use of vTPM for data encryption with a vTPM-protected key, Secure Boot to prevent malicious rootkits and bootkits, and Integrity Monitoring to alert on any changes in the boot process. Secure Boot may not be compatible with the customer’s drivers or other software.
● Secure Boot OFF, vTPM ON, Integrity Monitoring ON: default when creating a GCP VM. Allows use of vTPM for data encryption with a vTPM-protected key and Integrity Monitoring to alert on any changes in the boot process. If the customer has unsigned drivers or low-level software, this is the most secure option, as Secure Boot would not be compatible.
● Secure Boot OFF, vTPM OFF, Integrity Monitoring OFF: least secure. No benefits of Shielded VM. This is not recommended.
Sole-Tenant Nodes
Regular VMs on regular machines, dedicated specifically to your workloads.

● Dedicated hardware
● Mix-and-match VMs to consume host resources
● Full access to host resources for a 10% premium*

*10% premium based on on-demand price
Quick Start for Sole-Tenant Nodes (1/2)
Each sole-tenant node has a 1:1 mapping to a physical host and represents a reserved host.

Step 1: Reserve sole-tenant node(s)

// 1. CREATE NODE TEMPLATE
$ gcloud compute sole-tenancy \
    node-templates create my-node-template \
    --node-type n1-node-96-624 \
    --region us-central1

// 2. CREATE NODE GROUP OF 3 NODES
$ gcloud compute sole-tenancy \
    node-groups create my-node-group-1 \
    --node-template my-node-template \
    --target-size 3 \
    --zone us-central1-c

// 2b. [FOR ILLUSTRATION] CREATE ANOTHER
$ gcloud compute sole-tenancy \
    node-groups create my-node-group-2 \
    --node-template my-node-template \
    --target-size 3 \
    --zone us-central1-f
Quick Start for Sole-Tenant Nodes (2/2)
Exam Tip: More info on provisioning VMs on sole-tenant nodes can be found here.

Step 2: Schedule instance(s) onto the reserved node(s). Three ways to schedule:

// SCHEDULE ONTO A SPECIFIC NODE
$ gcloud compute instances create \
    INSTANCE_NAME --node=NODE_NAME

// SCHEDULE ON ANY NODE IN NODE GROUP
$ gcloud compute instances create \
    INSTANCE_NAME \
    --node-group=NODE_GROUP_NAME

// SCHEDULE ONTO ANY HOST WITH MATCHING LABELS
$ gcloud compute instances create \
    INSTANCE_NAME --zone ZONE \
    --node-affinity-file=[Link]
Sole-Tenant Nodes: Using Node Affinity Labels

Node groups can be spread across zones (us-central1-c, us-west1-a, us-east1-b) and labeled, for example env:IN:dev / env:IN:prod on My-Node-Group-1 through My-Node-Group-6, plus workload:IN:backend / workload:IN:frontend.

// SCHEDULE ONTO ANY HOST WITH MATCHING LABELS
$ gcloud compute instances create \
    INSTANCE_NAME --zone ZONE \
    --node-affinity-file=[Link]

// SCHEDULE ONTO ANY HOST IN MY-NODE-GROUP-1
$ gcloud compute instances create \
    INSTANCE_NAME --zone ZONE \
    --node-group my-node-group-1
Managed Instance Groups: Run VMs at Scale
Up to thousands of VMs; works with load balancing.

● High availability: autohealing, multi-zone groups
● Scalability: autoscaling
● Update orchestration: auto-updating
● Preserved state: stateful MIGs

Exam Tip: know the pros & cons of a “ready” custom OS image vs. a public image + startup scripts.
Stateful vs stateless
And why stateless is usually preferred…

Exam Tips:
● Have a look at this document.
● Prefer stateless. Use stateful only when necessary, e.g.:
○ Databases
○ Data processing apps (Kafka etc.)
○ Legacy monoliths

Stateful: each server retains information about its client sessions, such as the current state of an application or the contents of a user’s shopping cart. Not ideal when multiple backends can serve the requests. Can scale up easily, but can’t scale down easily, since each server keeps its state.

Stateless (PREFERRED!): the server does not retain any information about client sessions. Each request made by the client is treated as an independent transaction, and the server does not maintain any memory of previous requests. Greater scalability and flexibility; can scale up and down easily.
Choosing instance groups for Compute Engine

● Unmanaged: heterogeneous instances.
● Managed: homogeneous instances, created from instance templates; supports autoscaling.
● Zonal: instances in the same zone; latency consistency.
● Regional: instances across different zones; reliability.

Exam Tips:
● Unmanaged groups are used to group EXISTING, different VMs under one “umbrella” and balance traffic to healthy ones only. For example, used in lift & shift migrations.
● You can’t update an existing instance template (you need to create a new one).
● Know the difference between scale-out and scale-up!
MIG - Autoscaling

● CPU utilization: treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group.
● External HTTP(S) load balancing capacity: autoscaling works with maximum backend utilization and maximum requests per second/instance.
● Cloud Monitoring metrics: per instance or per group; standard or custom metrics; not for log-based metrics.
● Schedules: an additional autoscaler; up to 128 schedules, each with min instances, duration, and start time & recurrence; with or without scale-in controls.
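For CPU-based autoscaling, the documented sizing rule is: target size = ceil(current size × observed utilization / target utilization). A quick sketch of the arithmetic with hypothetical numbers:

```shell
# 10 instances averaging 80% CPU against a 60% target:
# 10 * 0.80 / 0.60 = 13.33..., rounded up to 14 instances.
current_size=10; current_util=0.80; target_util=0.60
awk -v n="$current_size" -v u="$current_util" -v t="$target_util" \
  'BEGIN { x = n * u / t; y = (x == int(x)) ? x : int(x) + 1; print y }'
# prints 14
```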
Updating MIGs
= implementing new image versions

Exam Tip: Know WELL how to rollout new versions to MIGs, incl. canary & rollback strategies
Compute Engine: (most important) Organization Policy Constraints

● Disable VM serial port access (constraints/[Link]; supported prefixes: "is:"): disables serial port access to Compute Engine VMs belonging to the organization, project, or folder.
● Disable SSH in browser (constraints/[Link]; "is:"): disables the SSH-in-browser tool in the Cloud Console. When enforced, the SSH-in-browser button is disabled.
● Require OS Login (constraints/[Link]; "is:"): enables OS Login on all newly created projects. All VM instances created in new projects will have OS Login enabled.
● Shielded VMs (constraints/[Link]; "is:"): when set to True, requires that all new Compute Engine VM instances use Shielded disk images with Secure Boot, vTPM, and Integrity Monitoring enabled. Secure Boot can be disabled after creation, if desired.
● Restrict VPC peering usage (constraints/[Link]; "is:", "under:"): list constraint defining the set of VPC networks that are allowed to be peered with the VPC networks belonging to this project, folder, or organization.
Compute Engine: (most important) Organization Policy Constraints (continued)

● Skip default network creation (constraints/[Link]; "is:"): skips the creation of the default network and related resources during project creation if set to True.
● Define trusted image projects (constraints/[Link]; "is:"): defines the set of projects that can be used for image storage and disk instantiation for Compute Engine.
● Define allowed external IPs for VM instances (constraints/[Link]; "is:"): defines the set of Compute Engine VM instances that are allowed to use external IP addresses.
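Boolean constraints like these can be enforced from the CLI; a sketch (the organization ID is a placeholder):

```shell
# Enforce Shielded VMs for all new instances under the organization.
gcloud resource-manager org-policies enable-enforce \
  compute.requireShieldedVm --organization=123456789
```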
VM Pricing and cost optimization

● Sustained Use Discounts (SUD): up to 30% savings on Compute Engine and Cloud SQL.
● Committed Use Discounts (CUD): up to 70% savings without upfront fees or instance-type lock-in.
● Spot / Preemptible VM instances: up to 91% savings on workloads that can be interrupted, like data mining and data processing.
● Per-second billing: up to 38% savings by paying per second, not per hour.
● Network Service Tiers: pick performance and get 70% more bandwidth than other clouds, or pick cost savings and save up to 9% compared to other clouds.
● Rightsizing (incl. choosing optimal GCE families) and custom machine types.

Exam Tips:
● Common pattern for optimizing costs of unused PDs: create a snapshot, then delete the disk, to reduce the maintenance cost of that disk by 35% to 92%.
● For premium OSes, you’re billed for the license per vCPU per second.
● Bring Your Own License is an option for some OSes.
● Use extended memory to save on OS license costs.
Migrate for Compute Engine
Lift & shift your VMware, AWS, and Azure workloads to GCE.

● Purpose-built, enterprise-grade
● Migrate from on-prem or other clouds
● Proven at scale, having migrated customers with thousands of workloads
● Success across healthcare, energy, government, manufacturing, and more

Agentless: nothing to install on source machines. Minimize complexity, reduce IT labor requirements by 5+ hours per server, and keep migrations on track.

Streaming: migrate storage while apps run in-cloud in GCP. Eliminate long upfront data transfers and unpredictable maintenance windows, enabling fast time-to-cloud and reduced downtime.

Frictionless: automate migration and conversion. Reduce touch points for IT, and provide an uninterrupted experience for line-of-business owners and end users.
Persistent Disks best practices, tips & tricks
Exam Tips:
● Use “--no-boot-disk-auto-delete” parameter if you don’t want boot / OS disk to be deleted if a VM gets deleted.
● CMEK and CSEK can be used to encrypt PDs. Have a look at how to use CSEK for a PD.
● Avoid using ext3 filesystems in Linux (poor performance under heavy write loads). Prefer ext4.
● You can share a PD across multiple VMs at the same time in read-only mode.
○ If read-write required, prefer a managed solution such as Filestore or utilize GCS.
● You can share a SSD PD across two N2 VMs at the same time in read-write mode. In Preview as of Q1 ‘23 -> should NOT
be covered on the exam.
● To recover from a corrupted disk / OS not booting properly, follow this procedure.
○ At a high level, just attach the corrupted disk as a non-boot disk to another VM and troubleshoot.
● For special use-cases (app needs a RAM disk with exceptionally low latency and high throughput and cannot just use
the VM memory), you can create a tmpfs filesystem by allocating some VM memory as a RAM disk.
● If needed, you can attach 1-24 local SSDs (ephemeral = data is lost if VM stops; each 375 GB and physically attached
to the server that hosts your VM instance). Local SSDs are NOT the same as PD SSDs!!! More information here.
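The RAM-disk trick above can be sketched as follows (the mount point and 1 GiB size are arbitrary; requires root, and contents are lost on stop or reboot):

```shell
# Allocate 1 GiB of the VM's RAM as a tmpfs RAM disk.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk
```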
QUIZ time!
Cloud Dataproc
The benefits of Hadoop/Spark on cloud

The same stack (custom code, monitoring/health, dev integration, scaling, job submission, GCP connectivity, deployment, creation) exists in every model; what changes is who manages it:
● On-premises: the whole stack is self-managed.
● On Compute Engine: the infrastructure is Google-managed, but the Hadoop/Spark stack is still self-managed.
● Cloud Dataproc: everything below your custom code is Google-managed.

Exam Tip: if an exam question mentions Apache Hadoop / Spark / Pig / Hive, plus it’s clear that the customer has already invested in building pipelines on-premises and does not want to lose that investment, you should probably go with Dataproc.
Flexible compute: Split clusters and jobs

Instead of one large Spark/Hadoop cluster running Jobs 1 through 4, run each job on its own cluster (for example: Cluster 1 runs Job 1, Cluster 2 runs Jobs 2 and 3, Cluster 3 runs Job 4).

Exam Tips:
● When thinking about Dataproc, you should really think about per-job, ephemeral, auto-scaling clusters with auto-shutdown after the task is completed.
● Using Spot/Preemptible VMs for secondary Dataproc workers is a common pattern.
● Switching from HDFS to GCS is also a best practice in most cases.
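The per-job, ephemeral pattern can be sketched in one command (the cluster name, region, and sizes are illustrative; the spot secondary-worker type may require a recent gcloud):

```shell
# Secondary workers use Spot VMs; --max-idle deletes the cluster
# after 30 idle minutes, approximating the ephemeral-cluster pattern.
gcloud dataproc clusters create per-job-cluster \
  --region=us-central1 \
  --num-workers=2 \
  --num-secondary-workers=4 \
  --secondary-worker-type=spot \
  --max-idle=30m
```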
Proprietary + Confidential

Dataproc: a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning.

Cluster node options:
● Single node (for experimentation)
● Standard (1 primary only)
● High availability (3 primaries)
Note: disk performance SCALES with size!

Benefits:
● Hadoop: familiar
● Automated cluster management
● Fast cluster resize
● Flexible VM configurations
● Machine types and preemptible VMs

HDFS: use Cloud Storage for a stateless solution.
HBase: use Cloud Bigtable for a stateless solution.

Data storage: don’t use HDFS to store input/output data; use it for temporary working storage.
Cloud Storage: match your data location with your compute location; zone matters.

Objectives:
● Shut down the cluster when it is not actually running jobs.
● Start a cluster per job or for a particular kind of work.
Dataproc Spark RDD pipeline operations use lazy execution

● Transformations are “lazy”: they build up the pipeline without running it.
● Actions mean “do it now” and trigger execution.
● Anonymous functions are typically passed to transformations.

TIP: Spark can wait till all the requests are in before applying resources.
Cloud Dataflow
The fully managed, serverless, auto-optimized data processor that simplifies development and management of stream and batch pipelines.

Stream analytics:
● Works with Cloud Pub/Sub to deliver stream analytics
● Real-time data processing with “exactly-once” semantics

Unified with batch:
● No more Lambda architecture
● Apache Beam provides unified batch & streaming
● Reuse skills, tools, and code

Open source ensures portability:
● Pipelines written in the Beam API are portable
● Runners include Dataflow, Flink, Samza, and Spark

Auto-optimizations:
● No more cluster management
● Submit a job and Dataflow auto-optimizes resources
● Makes pipelines faster and cost-effective

Exam Tips:
● If an exam question mentions Apache Beam -> the answer is most probably Dataflow.
● When you’re starting from scratch with ETL, Dataflow is preferred over Dataproc!
● KEY thing about Dataflow: it can serve BOTH batch and streaming within a SINGLE pipeline.
Dataflow: a fully managed, scalable data processing service for executing batch, stream, and ETL processing patterns.

● Write Java or Python code and deploy it to Dataflow, which then executes the pipeline.
● You can get input from any of several sources, and you can write output to any of several sinks. The pipeline code remains the same.
● The open-source API (Apache Beam) can also be executed on Flink, Spark, etc.
● Parallel tasks are autoscaled by the execution framework.
● The same code does real-time and batch.
● Dataflow supports side inputs, which allow different transformations on data in the same pipeline.
● You can put code inside a servlet, deploy it to App Engine, and schedule a cron task queue in App Engine to execute the pipeline periodically.

TIP: For Dataflow users, use roles to limit access to only Dataflow resources, not just the project.
Dataflow does ingest, transform, and load on batch or stream, with any combination of basic and custom transformations:
● Batch: filtered; filtered and grouped
● Stream: filtered, grouped, and windowed
Data sources and sinks for Dataflow

Sources: BigQuery, Cloud Storage, Cloud Pub/Sub, Cloud Bigtable, Cloud Datastore
Sinks: BigQuery, Cloud Storage, Cloud Pub/Sub, Cloud Bigtable, Cloud Datastore

Exam Tip:
● Dataflow does NOT store data! There is always a source and a sink.
Dataflow: Google provides templates for different use cases

The console lists the available templates; each has a description, usage instructions, and a pipeline graph.
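Running a Google-provided template needs no pipeline code at all; a sketch using the public Word_Count template (the job name and output bucket are placeholders):

```shell
gcloud dataflow jobs run wordcount-example \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates/latest/Word_Count \
  --parameters=inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://MY_BUCKET/results/output
```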
Example architecture for data analytics

(Diagram: an example analytics architecture feeding BI tools such as Tableau and QlikView.)
Comparison: Dataflow & Dataproc

● Purpose: Dataflow is a managed data processing service; Dataproc provides fully managed open-source clusters with on-demand integration.
● Cluster provisioning: Dataflow provisions clusters automatically; Dataproc provisions clusters manually.
● Scaling: Dataflow scales worker resources horizontally for maximum utilization; Dataproc supports both horizontal and vertical scaling.
● System integration: Dataflow uses Apache Beam; Dataproc uses Apache Spark & Hadoop.
● Ease of use: Dataflow is comparatively more difficult to use; Dataproc is easier.
● Uniqueness: Dataflow offers batch & stream processing of data; Dataproc offers the data science / machine learning ecosystem.
● Database replication: Dataflow supports full-table replication; Dataproc has no database replication.
● Type of service: Dataflow is serverless; Dataproc is deployed on Google Compute Engine.
● Optimized for: Dataflow for writing code to develop applications that need to process data at scale; Dataproc for the same, but leveraging Spark.
[optional] Links to useful
materials
Optional materials 1
[ READING ]
● Get a feeling for the differences between PD snapshots, images, and machine images (important from an exam perspective: it’s good to know what is global vs. regional, what can be used to create VMs, and how to share those resources between projects/regions, etc.).
● What is GCP metadata server?
● Sole-tenant nodes
● How stateful workloads are different from stateless workloads
● How to achieve HA with Regional Persistent Disks and what a “--force-attach” is.
● Image management best practices | Compute Engine Documentation | Google Cloud
● Best practices for persistent disk snapshots | Compute Engine Documentation | Google Cloud
● Encrypt disks with customer-supplied encryption keys | Compute Engine Documentation | Google Cloud
Optional materials 2
[ VIDEOS ]
● Networking 102 (Cloud Routing and VPC Peering): Cloud OnAir: CE Chat: Google Cloud Networking 102 - Cloud
Routing and VPC Peering
● What is Persistent Disk?: What is Persistent Disk? #GCPSketchnote
● Introduction to Virtual Machines (Next '19): Introduction to Virtual Machines (Cloud Next '19)
● Best Practices for GCE: Best Practices for GCE Enterprise Deployments (Cloud Next '19)
● VM Manager overview: What is VM Manager?
● GCE Managed Instance Groups: Using managed instance groups
● All you need to know about Migrate for Compute Engine: Migrate for Compute Engine
● Effective autoscaling with Managed Instance Groups: Effective autoscaling with managed instance groups
● Shared VPC: Level Up From Zero Episode 4: Shared VPC
● BeyondCorp overview: BeyondCorp Enterprise in a minute
● App Engine introduction: Get to know Google App Engine
● Pub/Sub overview: Cloud Pub/Sub Overview - ep. 1
● Cloud security basics: Top 3 access risks in Cloud Security
● What is DNS?: What is DNS? | How a DNS Server (Domain Name System) works | DNS Explained
Optional materials 3
[ PODCASTS ]
● Firestore intro, plus differences between SQL and NoSQL databases
● BeyondCorp

[ DEEP DIVES ]
● What is envelope encryption?
● Stateful Managed Instance Groups
● Key Management Service deep dive
● BeyondProd security model (evolution of BeyondCorp model)
Diagnostic Questions
for Exam Guide Section 1: Designing
and planning a cloud solution
architecture
PCA Exam Guide Section 1:
Designing and planning a cloud solution architecture

1.1
Designing a solution infrastructure that
meets business requirements

1.2
Designing a solution infrastructure that
meets technical requirements

1.3
Designing network, storage, and compute
resources

1.4 Creating a migration plan

1.5 Envisioning future solution improvements


Designing a solution infrastructure
1.1 that meets business requirements

Considerations include:
● Business use cases and product strategy
● Cost optimization
● Supporting the application design
● Integration with external systems
● Movement of data
● Design decision trade-offs
● Build, buy, modify, or deprecate
● Success measurements (e.g., key performance
indicators [KPI], return on investment [ROI], metrics)
● Compliance and observability
1.1 Diagnostic Question 01 Discussion

Cymbal Direct drones continuously send A. Ingest data with IoT Core, process it with Dataprep, and store it in a
data during deliveries. You need to Coldline Cloud Storage bucket.
process and analyze the incoming
B. Ingest data with IoT Core, and then publish to Pub/Sub. Use Dataflow
telemetry data. After processing, the
to process the data, and store it in a Nearline Cloud Storage bucket.
data should be retained, but it will only
be accessed once every month or two. C. Ingest data with IoT Core, and then publish to Pub/Sub. Use BigQuery
Your CIO has issued a directive to to process the data, and store it in a Standard Cloud Storage
incorporate managed services wherever bucket.
possible. You want a cost-effective D. Ingest data with IoT Core, and then store it in BigQuery.
solution to process the incoming
streams of data.

What should you do?


1.1 Diagnostic Question 02 Discussion

Customers need to have a good experience when accessing your web application so they will continue to use your service. You want to define key performance indicators (KPIs) to establish a service level objective (SLO).

Which KPI could you use?

A. Eighty-five percent of customers are satisfied users
B. Eighty-five percent of requests succeed when aggregated over 1 minute
C. Low latency for > 85% of requests when aggregated over 1 minute
D. Eighty-five percent of requests are successful
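A usable SLO needs a KPI that is a measurable ratio over a defined aggregation window, which is what distinguishes option B from the vaguer alternatives. A minimal sketch, with an assumed data shape that is not from the course, of computing a request-success SLI aggregated over one-minute windows:

```python
from collections import defaultdict

def success_ratio_per_minute(requests):
    """requests: iterable of (unix_timestamp, succeeded) tuples.
    Returns {minute_bucket: success_ratio} for each one-minute window seen."""
    buckets = defaultdict(lambda: [0, 0])  # minute -> [successes, total]
    for ts, ok in requests:
        minute = int(ts // 60)
        buckets[minute][1] += 1
        if ok:
            buckets[minute][0] += 1
    return {m: good / total for m, (good, total) in buckets.items()}

# An SLO of "85% of requests succeed, aggregated over 1 minute" is then a
# simple comparison against each window's ratio.
ratios = success_ratio_per_minute([(0, True), (10, True), (20, False), (70, True)])
print(ratios)  # minute 0: 2/3 success; minute 1: 1/1 success
```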
1.1 Designing a solution infrastructure that meets business requirements

Resources to start your journey
● Google Cloud Architecture Framework: System design
● SRE Books
1.2 Designing a solution infrastructure that meets technical requirements

Considerations include:
● High availability and failover design
● Elasticity of cloud resources with respect to quotas and limits
● Scalability to meet growth requirements
● Performance and latency
1.2 Diagnostic Question 03 Discussion

Cymbal Direct developers have written a new application. Based on initial usage estimates, you decide to run the application on Compute Engine instances with 15 GB of RAM and 4 CPUs. These instances store persistent data locally. After the application runs for several months, historical data indicates that the application requires 30 GB of RAM. Cymbal Direct management wants you to make adjustments that will minimize costs.

What should you do?

A. Stop the instance, and then use the command gcloud compute instances set-machine-type VM_NAME --machine-type e2-standard-8. Start the instance again.
B. Stop the instance, and then use the command gcloud compute instances set-machine-type VM_NAME --machine-type e2-standard-8. Set the instance’s metadata to: preemptible: true. Start the instance again.
C. Stop the instance, and then use the command gcloud compute instances set-machine-type VM_NAME --machine-type e2-custom-4-30720. Start the instance again.
D. Stop the instance, and then use the command gcloud compute instances set-machine-type VM_NAME --machine-type e2-custom-4-30720. Set the instance’s metadata to: preemptible: true. Start the instance again.
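Custom machine types are named <family>-custom-<vCPU count>-<memory in MB>, which is why 30 GB of RAM appears as 30720 in the machine-type string. A hypothetical helper, not from the course, that builds the name and enforces Compute Engine's 256 MB memory granularity:

```python
def custom_machine_type(family: str, vcpus: int, memory_gb: float) -> str:
    """Builds a custom machine-type name, e.g. e2-custom-4-30720.
    Compute Engine expresses custom memory in MB, in multiples of 256 MB."""
    memory_mb = int(memory_gb * 1024)
    if memory_mb % 256 != 0:
        raise ValueError("memory must be a multiple of 256 MB")
    return f"{family}-custom-{vcpus}-{memory_mb}"

print(custom_machine_type("e2", 4, 30))  # e2-custom-4-30720
```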
1.2 Designing a solution infrastructure that meets technical requirements

Resources to start your journey
● Google Cloud Architecture Framework: System design
1.3 Designing network, storage, and compute resources

Considerations include:
● Integration with on-premises/multicloud environments
● Cloud-native networking (VPC, peering, firewalls, container networking)
● Choosing data processing technologies
● Choosing appropriate storage types (e.g., object, file, databases)
● Choosing compute resources (e.g., preemptible, custom machine type,
specialized workload)
● Mapping compute needs to platform products
1.3 Diagnostic Question 04 Discussion

You are creating a new project. You plan to set up a Dedicated Interconnect between two of your data centers in the near future and want to ensure that your resources are only deployed to the same regions where your data centers are located. You need to make sure that you don’t have any overlapping IP addresses that could cause conflicts when you set up the interconnect. You want to use RFC 1918 class B address space.

What should you do?

A. Create a new project, leave the default network in place, and then use the default 10.x.x.x network range to create subnets in your desired regions.
B. Create a new project, delete the default VPC network, set up an auto mode VPC network, and then use the default 10.x.x.x network range to create subnets in your desired regions.
C. Create a new project, delete the default VPC network, set up a custom mode VPC network, and then use IP addresses in the 172.16.x.x address range to create subnets in your desired regions.
D. Create a new project, delete the default VPC network, set up the network in custom mode, and then use IP addresses in the 192.168.x.x address range to create subnets in your desired zones. Use VPC Network Peering to connect the zones in the same region to create regional networks.
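The key constraint is keeping cloud subnets out of the ranges the interconnected data centers already use. Python's standard ipaddress module can sanity-check an addressing plan before any subnets are created; the ranges below are made up for illustration:

```python
import ipaddress

def find_overlaps(cidrs):
    """Return the pairs of CIDR ranges that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b))
            for i, a in enumerate(nets)
            for b in nets[i + 1:]
            if a.overlaps(b)]

# Hypothetical plan: on-premises data centers use 10.x.x.x, so the custom
# mode VPC takes class B (172.16.0.0/12) subnets to avoid any collision.
plan = ["10.0.0.0/16", "10.1.0.0/16", "172.16.0.0/20", "172.16.16.0/20"]
print(find_overlaps(plan))  # [] -> no conflicts
```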
1.3 Diagnostic Question 05 Discussion

Cymbal Direct is working with Cymbal Retail, a separate, autonomous division of Cymbal with different staff, networking teams, and data center. Cymbal Direct and Cymbal Retail are not in the same Google Cloud organization. Cymbal Retail needs access to Cymbal Direct’s web application for making bulk orders, but the application will not be available on the public internet. You want to ensure that Cymbal Retail has access to your application with low latency. You also want to avoid egress network charges if possible.

What should you do?

A. Verify that the subnet range Cymbal Retail is using doesn’t overlap with Cymbal Direct’s subnet range, and then enable VPC Network Peering for the project.
B. If Cymbal Retail does not have access to a Google Cloud data center, use Carrier Peering to connect the two networks.
C. Specify Cymbal Direct’s project as the Shared VPC host project, and then configure Cymbal Retail’s project as a service project.
D. Verify that the subnet Cymbal Retail is using has the same IP address range as Cymbal Direct’s subnet range, and then enable VPC Network Peering for the project.
1.3 Diagnostic Question 06 Discussion

Cymbal Direct's employees will use Google Workspace. Your current on-premises network cannot meet the requirements to connect to Google's public infrastructure.

What should you do?

A. Order a Dedicated Interconnect from a Google Cloud partner, and ensure that proper routes are configured.
B. Connect the network to a Google point of presence, and enable Direct Peering.
C. Order a Partner Interconnect from a Google Cloud partner, and ensure that proper routes are configured.
D. Connect the on-premises network to Google’s public infrastructure via a partner that supports Carrier Peering.
1.3 Diagnostic Question 07 Discussion

Cymbal Direct is evaluating database options to store the analytics data from its experimental drone deliveries. You're currently using a small cluster of MongoDB NoSQL database servers. You want to move to a managed NoSQL database service with consistent low latency that can scale throughput seamlessly and can handle the petabytes of data you expect after expanding to additional markets.

What should you do?

A. Extract the data from MongoDB. Insert the data into Firestore using Datastore mode.
B. Create a Bigtable instance, extract the data from MongoDB, and insert the data into Bigtable.
C. Extract the data from MongoDB. Insert the data into Firestore using Native mode.
D. Extract the data from MongoDB, and insert the data into BigQuery.
1.3 Designing network, storage, and compute resources

Resources to start your journey
● Choose and manage compute | Architecture Framework | Google Cloud
● Design your network infrastructure | Architecture Framework | Google Cloud
● Select and implement a storage strategy | Architecture Framework | Google Cloud
● Google Cloud documentation
1.4 Creating a migration plan

Considerations include:
● Integrating solutions with existing systems
● Migrating systems and data to support the solution
● Software license mapping
● Network planning
● Testing and proofs of concept
● Dependency management planning
1.3 Diagnostic Question 08 Discussion

You are working with a client who is using Google Kubernetes Engine (GKE) to migrate applications from a virtual machine–based environment to a microservices-based architecture. Your client has a complex legacy application that stores a significant amount of data on the file system of its VM. You do not want to re-write the application to use an external service to store the file system data.

What should you do?

A. In Cloud Shell, create a YAML file defining your Deployment called [Link]. Create a Deployment in GKE by running the command kubectl apply -f [Link]
B. In Cloud Shell, create a YAML file defining your Container called [Link]. Create a Container in GKE by running the command gcloud builds submit --config [Link].
C. In Cloud Shell, create a YAML file defining your StatefulSet called [Link]. Create a StatefulSet in GKE by running the command kubectl apply -f [Link]
D. In Cloud Shell, create a YAML file defining your Pod called [Link]. Create a Pod in GKE by running the command kubectl apply -f [Link]
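A StatefulSet (option C) gives each replica a stable identity and its own persistent volume, which suits a legacy application that writes to its local file system. A minimal illustrative manifest, where the names, image, mount path, and disk size are placeholders rather than course material:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: legacy-app                # placeholder name
spec:
  serviceName: legacy-app
  replicas: 1
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      containers:
      - name: legacy-app
        image: gcr.io/PROJECT_ID/legacy-app:latest   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/data                       # app's file-system path
  volumeClaimTemplates:           # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Applied with kubectl apply -f, each pod gets a PersistentVolumeClaim that survives pod rescheduling, so the file-system data persists without rewriting the application.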
1.4 Diagnostic Question 09 Discussion

You are working in a mixed environment of VMs and Kubernetes. Some of your resources are on-premises, and some are in Google Cloud. Using containers as a part of your CI/CD pipeline has sped up releases significantly. You want to start migrating some of those VMs to containers so you can get similar benefits. You want to automate the migration process where possible.

What should you do?

A. Manually create a GKE cluster, and then use Migrate to Containers (Migrate for Anthos) to set up the cluster, import VMs, and convert them to containers.
B. Use Migrate to Containers (Migrate for Anthos) to automate the creation of Compute Engine instances to import VMs and convert them to containers.
C. Manually create a GKE cluster. Use Cloud Build to import VMs and convert them to containers.
D. Use Migrate for Compute Engine to import VMs and convert them to containers.
1.4 Creating a migration plan

Resources to start your journey
● Migrate to Containers | Google Cloud
● Migration to Google Cloud: Choosing your migration path
● Migrating to the cloud: a guide and checklist
● Cloud Migration Products & Services
● Application Migration | Google Cloud
1.5 Envisioning future solution improvements

Considerations include:
● Cloud and technology improvements
● Evolution of business needs
● Evangelism and advocacy
1.5 Diagnostic Question 10 Discussion

Cymbal Direct has created a proof of concept for a social integration service that highlights images of its products from social media. The proof of concept is a monolithic application running on a single SuSE Linux virtual machine (VM). The current version requires increasing the VM’s CPU and RAM in order to scale. You would like to refactor the VM so that you can scale out instead of scaling up.

What should you do?

A. Move the existing codebase and VM provisioning scripts to git, and attach external persistent volumes to the VMs.
B. Make sure that the application declares any dependent requirements in a [Link] or equivalent statement so that they can be referenced in a startup script. Specify the startup script in a managed instance group template, and use an autoscaling policy.
C. Make sure that the application declares any dependent requirements in a [Link] or equivalent statement so that they can be referenced in a startup script, and attach external persistent volumes to the VMs.
D. Use containers instead of VMs, and use a GKE autoscaling deployment.
1.5 Envisioning future solution improvements

Resources to start your journey
● Twelve-factor app development on Google Cloud | Cloud Architecture Center
Make sure to… enjoy the journey as much as the destination!
