NoSQL Database Architecture: Redis

1. What does the data partitioning strategy of Redis favor?
   A. It favors an equitable data distribution between the nodes
   B. It favors storing related data together on the same node
   C. It favors maximizing data redundancy across all nodes
   D. It favors minimizing the number of nodes storing each piece of data

2. Which sustainability benefit is most associated with the type of storage used by Redis?
   A. Lower hardware costs due to less need for memory
   B. Improved data durability and reliability
   C. Enhanced scalability with reduced carbon footprint
   D. Reduced energy consumption due to lower disk I/O

3. What type of data storage does Redis primarily use?
   A. Redis primarily uses disk-based storage.
   B. Redis uses hybrid in-memory and disk-based storage.
   C. Redis primarily uses in-memory data storage.
   D. Redis uses cloud-based storage exclusively.

4. Which mechanisms does Redis use to ensure data persistence?
   A. Redis uses logs to persist all update operations.
   B. Redis uses only in-memory storage without any persistence.
   C. Redis regularly creates snapshots of the database.
   D. Redis relies on an external database for data persistence.
   E. Redis creates backups only upon manual user request.

5. How does Redis HA Clustered Database achieve high availability?
   A. By using a master-worker model with manual failover.
   B. By using consistent hashing and distributing data evenly across nodes, with replicas on the same rack for faster access.
   C. By using replica sets and automatically promoting replicas to primary if the primary fails.
   D. By using multiple master nodes

6. What similar mechanism to the separation of data-path and control-path is used in the architectures we already saw in class?
   A. The separation between YARN ResourceManager and NodeManager
   B. The separation between Cassandra Coordinator nodes and Storage nodes
   C. The separation between MongoDB Config servers and Query routers
   D. The separation between Neo4j Core servers and Read replicas

7. Which access option in Redis is similar to Cassandra’s method of connecting to specific nodes for direct data operations?
   A. Redis Database endpoint
   B. Redis Sentinel API (also accepted)
   C. Redis Primary/Replica URL
   D. Redis OSS Cluster API

8. Both Redis Enterprise Cluster and Cassandra have symmetric architectures where all nodes have the same roles and responsibilities. However, there is a difference between them. What is this difference?
   A. Cassandra distributes data across nodes in a ring, while Redis uses sharding to partition data, with no inherent order between the nodes.
   B. Cassandra replicates data across nodes in a linear sequence, while Redis uses hierarchical data distribution.
   C. Cassandra uses random partitioning without a specific structure, while Redis distributes data using consistent hashing in a ring.
   D. Cassandra clusters nodes based on geographical proximity, while Redis partitions data strictly based on key ranges.

9. How does Redis Enterprise Cluster support multi-tenancy?
   A. By assigning each user a separate physical server for their databases.
   B. By allowing multiple databases from different applications and users to run on the same cluster and node while being fully isolated.
   C. By using a single database instance shared among all users without isolation.
   D. By dynamically creating and deleting databases based on user demand without providing isolation.

10. How does the shared-nothing architecture of Redis Enterprise Cluster enhance its performance and scalability compared to other database systems?
   A. It relies on a global lock manager to coordinate operations across nodes.
   B. It partitions data across nodes, but requires regular synchronization to maintain consistency.
   C. It uses a distributed cache to synchronize data between nodes in real-time.
   D. It ensures each node operates independently without sharing memory or storage.

11. What other systems use the primary-secondary principle for replicas?
   A. MongoDB
   B. Kafka
   C. Hadoop
   D. Cassandra

12. How does the primary-secondary replication system differ from the use of identical replicas?
   A. It improves availability
   B. It complicates the scaling process
   C. It enhances consistency
   D. It reduces redundancy

13. What main role does the zero-latency proxy play in the Redis Enterprise Cluster architecture?
   A. Exposes the database endpoint to database clients while masking behind-the-scenes activities
   B. Handles all write operations and propagates changes to the nodes.
   C. Manages the cluster’s resource allocation and load balancing.
   D. Is responsible for data persistence by creating regular snapshots.

14. How does automatic scaling work in Redis?
   A. It requires manual intervention to distribute data across nodes.
   B. It is based on a fixed schedule, regardless of resource usage.
   C. It is triggered when predefined resource limits are reached
   D. It only occurs when the system is restarted.

15. How does Redis compare to Cassandra in terms of quality attributes? Select all that apply.
   A. Redis provides better consistency thanks to the leader-follower strategy for replication
   B. Redis provides better failover handling thanks to the automatic switching from the primary to the secondary replicas
   C. Cassandra provides a better failover strategy thanks to its peer-to-peer architecture with integrated replicas
   D. Cassandra provides a more balanced data distribution thanks to its range-based partitioning strategy
   E. Redis provides better performance thanks to the absence of a coordinator node (found in Cassandra), which makes the data access direct.

Data Governance

16. In the context of data governance, why is it critical to balance data accessibility with data security, and how can this balance be achieved in a large-scale data environment?
   A. It ensures that authorized users can efficiently access necessary data while protecting sensitive information from unauthorized access. This balance can be achieved through role-based access control, encryption, and data masking techniques.
   B. It is critical to balance data accessibility with data security to comply with regulatory requirements, and this can be achieved by implementing a single-layer security protocol.
   C. The balance is crucial to prevent data breaches by restricting data access to a minimum, achieved through the physical separation of databases.
   D. It is necessary to enhance user productivity by allowing unrestricted data access, which can be achieved through frequent password updates.

17. What are the key roles and responsibilities that should be established within an organization to ensure effective data governance?
   A. Key roles include appointing a data governance officer responsible for all data-related decisions, delegating the rest to departmental managers.
   B. Key roles and responsibilities should be divided between strategic, tactical and technical officers.
   C. Establishing a centralized IT team to manage all aspects of data governance without involving business units or data owners.
   D. Creating roles for data security officers who focus on protecting data.

18. What methods can be used to conduct Data Quality assessments?
   A. We should conduct regular data profiling, validation, and cleansing.
   B. We should rely on automated data entry from reliable sources to maintain quality.
   C. We can use frequent random sampling.
   D. We should focus on user feedback to detect anomalies in the data.

19. How can the cultural barriers to data governance adoption within an organization be overcome?
   A. Cultural barriers are minimal and can be ignored; focusing on technological enforcement is sufficient to ensure compliance.
   B. Overcoming barriers requires implementing stricter policies and monitoring compliance.
   C. Cultural barriers can be addressed by outsourcing data governance responsibilities to external consultants who manage the process without involving internal staff.
   D. Through education and training, involving stakeholders in the governance process, and demonstrating the benefits of data governance through pilot projects

20. How can data governance be used to help the company reach its sustainability goals? (multiple answers)
   A. By ensuring compliance with environmental regulations.
   B. By supporting data-driven decision-making for sustainable practices.
   C. By increasing data redundancy to ensure data safety.
   D. By outsourcing data management to reduce internal costs.
   E. By integrating sustainability goals in the governance policies of the company.

Good work
EXAM - MAIN SESSION
BIG DATA
REDIS Database
Redis is an in-memory data store used by millions of developers as a cache, vector
database, document database, streaming engine, and message broker. Redis has built-in
replication and different levels of on-disk persistence. It supports complex data
types (e.g., strings, hashes, lists, sets, sorted sets, and JSON), with atomic operations
defined on those data types.
REDIS Architecture
The Redis architecture [1] consists of two main processes: the Redis client and the Redis
server. The client and the server can run on the same computer or on two different computers.
The Redis server stores the data in memory, handles all data management, and forms the
major part of the architecture. The Redis client can be the Redis console client or a Redis
client library for any programming language.
Redis stores everything in primary memory. Because primary memory is volatile, all stored
data is lost when the Redis server or the computer restarts. For data persistence, Redis
therefore supports the following mechanisms:
• RDB: At specified intervals, RDB makes a copy of all the data in memory and stores
it in permanent storage.
• AOF: AOF (Append-Only File) logs all write operations received by the server, thereby
making all data persistent.
• SAVE command: The Redis server can be forced to create an RDB snapshot at any time
using the SAVE command.
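The difference between snapshot-based and log-based persistence can be illustrated with a toy store. This is a minimal sketch in plain Python, not Redis code: the class `TinyStore`, the helper functions, and the JSON file formats are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with two persistence styles."""

    def __init__(self, snapshot_path, log_path):
        self.snapshot_path = snapshot_path
        self.log_path = log_path
        self.data = {}

    def set(self, key, value):
        # AOF style: append the write to the log, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def save(self):
        # RDB style: write a point-in-time copy of the whole dataset.
        with open(self.snapshot_path, "w", encoding="utf-8") as snapshot:
            json.dump(self.data, snapshot)


def load_snapshot(snapshot_path):
    """Recover only what the last snapshot captured."""
    with open(snapshot_path, encoding="utf-8") as snapshot:
        return json.load(snapshot)


def replay_log(log_path):
    """Recover by re-applying every logged write in order."""
    data = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            data[entry["key"]] = entry["value"]
    return data


def demo():
    workdir = tempfile.mkdtemp()
    snapshot_path = os.path.join(workdir, "dump.json")
    log_path = os.path.join(workdir, "appendonly.log")
    store = TinyStore(snapshot_path, log_path)
    store.set("user:1", "alice")
    store.save()                 # snapshot taken here
    store.set("user:2", "bob")   # written after the snapshot
    # Simulated restart: memory is gone, only the two files remain.
    return load_snapshot(snapshot_path), replay_log(log_path)
```

In this sketch, recovering from the snapshot alone misses the write made after `save()`, while replaying the log restores both keys. That is the trade-off the two mechanisms represent.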
Redis also supports replication for fault tolerance and data accessibility. To increase
storage capacity, you can also group two or more Redis servers to form a cluster.
REDIS Cluster Architecture
A Redis Enterprise [2] database can be either a single Redis server database or a cluster.
A Redis Enterprise database can therefore scale horizontally across many servers through
sharding, and it can copy data to Redis Enterprise replicas, which ensures high availability.
Sharding is a type of database partitioning that separates large databases into
smaller, faster, and more easily managed parts. These smaller parts are called data
shards. With sharding or partitioning, you are not restricted to storing data in the
memory of a single computer. Another advantage of sharding is being able to use the
computational power of multiple cores.
[1] [Link]
[2] [Link]
GL4 - INSAT | 1/4 | 2023-2024
In Redis Enterprise, a cluster is a set of cloud instances, virtual machine/container nodes,
or bare-metal servers that let you create any number of Redis databases in a memory/
storage pool that’s shared across the set. The cluster doesn’t need to scale up/out (or
down/in) whenever a database is created or deleted. A scaling operation is triggered only
when one of the predefined resource limits (memory, CPU, network, or storage IOPS) has
been reached.
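The threshold-based trigger described above can be sketched as a simple check. This is an illustrative sketch only: the `NodeUsage` type, the function names, and the limit values are hypothetical and do not come from Redis Enterprise.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeUsage:
    # Each field is the fraction of capacity in use, from 0.0 to 1.0.
    memory: float
    cpu: float
    network: float
    storage_iops: float


# Hypothetical limits chosen for the example.
LIMITS = NodeUsage(memory=0.80, cpu=0.75, network=0.70, storage_iops=0.70)


def breached_limits(usage: NodeUsage, limits: NodeUsage = LIMITS) -> list[str]:
    """Return the names of the resources whose usage has reached its limit."""
    return [
        name
        for name in ("memory", "cpu", "network", "storage_iops")
        if getattr(usage, name) >= getattr(limits, name)
    ]


def needs_scaling(usage: NodeUsage) -> bool:
    # Creating or deleting a database plays no role here; only resource limits do.
    return bool(breached_limits(usage))
```

For example, `needs_scaling(NodeUsage(0.9, 0.5, 0.5, 0.5))` is true because memory usage has reached its limit, whereas a node at 50% on every resource triggers nothing.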
To create a sharded cluster, you first need to specify the number of shards. Once you’ve
done this, your data will automatically be sharded, i.e. divided into groups and placed on
optimal nodes. At any given time, a Redis Enterprise cluster node can include between
zero and a few hundred Redis databases, each of one of the following types:
• A simple database, i.e. a single primary shard
• A highly available (HA) database, i.e. a pair of primary and replica shards
• A clustered database, which contains multiple primary shards, each managing a subset of
the dataset (or in Redis terms, a different range of “hash-slots”)
• An HA clustered database, i.e. multiple pairs of primary/replica shards
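To make the notion of hash-slot ranges concrete, the sketch below maps a key to a slot and a slot to a primary shard. It assumes the open-source Redis Cluster convention (CRC16 of the key modulo 16384, with `{...}` hash tags). Splitting the slots into equal contiguous ranges is a simplification chosen for this example, and the function names are hypothetical.

```python
from __future__ import annotations

import binascii

HASH_SLOTS = 16384  # slot count used by open-source Redis Cluster


def key_hash_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM variant) of the key, modulo 16384."""
    raw = key.encode("utf-8")
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed,
    # so related keys can be forced into the same slot.
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return binascii.crc_hqx(raw, 0) % HASH_SLOTS


def slot_ranges(num_shards: int) -> list[range]:
    """Split the slot space into contiguous ranges, one per primary shard."""
    base, extra = divmod(HASH_SLOTS, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def shard_for_key(key: str, num_shards: int) -> int:
    """Return the index of the primary shard whose slot range contains the key."""
    slot = key_hash_slot(key)
    for shard_id, slots in enumerate(slot_ranges(num_shards)):
        if slot in slots:
            return shard_id
    raise AssertionError("every slot belongs to exactly one range")
```

With three primary shards, each one owns roughly a third of the slots, and `shard_for_key("foo", 3)` tells you which shard would store the key `foo`. Keys such as `{user1000}.following` and `{user1000}.followers` share a hash tag and therefore land on the same shard.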
Each database can be accessed in multiple ways:
• Database endpoint: Simply connect your application to your database endpoint (a
unique URL and port on the fully qualified domain name), and Redis Enterprise will
transparently handle all the scaling and failover operations.
• Sentinel API: Use the Sentinel protocol to connect to the correct node in the cluster in
order to access your database.
• OSS Cluster API: Use the cluster API to directly connect to each shard of your
cluster without any additional hops.
Thanks to multi-tenancy, multiple databases from different applications and users can run
on the same Redis Enterprise cluster and node while remaining fully isolated.
High availability
Planning for disaster recovery and zero downtime means having a replica set of the Redis
Enterprise cluster across a single region or several regions. A high availability cluster is a
group of hosts that act as a single system to prevent downtime.
If one server in a high availability cluster goes down, the mission-critical application is
transferred to another server as soon as the fault is detected. Redis Enterprise ensures
that the replica shard process is always created on a different node from its primary, so
that a single node failure cannot take down both. If a node goes down, Redis Enterprise
makes sure that the replica shard process on the other available node becomes the new
primary shard.
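The placement and promotion rules above can be sketched as follows. This is a simplified model, not the Redis Enterprise implementation: the `Shard` type, the function names, and the naive choice of the first two nodes are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shard:
    database: str
    role: str   # "primary" or "replica"
    node: str


def place_pair(database: str, nodes: list[str]) -> list[Shard]:
    """Place a primary and its replica on two different nodes."""
    if len(nodes) < 2:
        raise ValueError("replication needs at least two nodes")
    # Naive placement for the example: first node gets the primary, second the replica.
    return [Shard(database, "primary", nodes[0]), Shard(database, "replica", nodes[1])]


def handle_node_failure(shards: list[Shard], failed_node: str) -> list[Shard]:
    """Drop the shards of the failed node and promote replicas of lost primaries."""
    lost_primaries = {
        s.database for s in shards if s.node == failed_node and s.role == "primary"
    }
    survivors = [s for s in shards if s.node != failed_node]
    for shard in survivors:
        if shard.role == "replica" and shard.database in lost_primaries:
            shard.role = "primary"
    return survivors
```

Because the replica never shares a node with its primary, losing `node-a` in a cluster of `node-a`, `node-b`, and `node-c` leaves the replica on `node-b` available for promotion.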
A high availability cluster uses multiple systems, with multiple primary and replica
nodes that are already integrated. Should a failure take one system down, another
can take over and maintain the continuity of the service or application being
used.
Shared-nothing, linearly scalable, multi-tenant, symmetric architecture
REDIS ENTERPRISE CLUSTER
A Redis Enterprise cluster is built on a complete separation between the data-path
components (i.e. proxies and shards) and the control/management path components (i.e.
cluster-management processes), which provides a number of significant benefits:
• Performance: Data-path entities need not deal with control and management
duties. The Redis Enterprise architecture guarantees that any processing cycles are
dedicated to serving users’ requests, which improves the overall performance. For
example, each Redis shard in a Redis Enterprise cluster works as if it were a
standalone Redis instance. The shard doesn’t need to monitor other Redis
instances, has no need to deal with failure or partition events, and is unaware of
which hash-slots are being managed.
• Availability: The application continues to access data from its Redis database, even
while sharding, re-sharding, and re-balancing take place. No manual changes are
needed to ensure data access.
• Security: Redis Enterprise prevents configuration commands from being executed
via the regular Redis APIs. Any configuration operation is allowed through a secure
Redis CLI, UI, or API interface that follows role-based authorisation controls. The
proxy-based architecture ensures that only certified connections can be created
with each shard and Redis shards can receive only certified requests.
• Manageability: Database provisioning, configuration changes, software updates,
and more are done with a single command (via UI or API) in a distributed manner
and without interrupting user traffic.
• Scalability: Databases scale horizontally by distributing data sets across multiple
nodes, servers, and clusters. Scaling involves robust replication of partitioned
database instances to multiple nodes.
Cluster components
Redis Enterprise cluster is built on a symmetric architecture, with all nodes containing
the following components:
• Redis shard: An open source Redis cluster or open source Redis instance with
either a primary or replica role that is part of the database.
• Zero-latency proxy: The proxy runs on each node of the cluster, is written in C, and
based on a cut-through, multi-threaded, lock-free stateless architecture. The proxy
handles the following primary functionalities:
◦ Hides cluster complexity from the application/user
◦ Maintains the database endpoint
◦ Forwards requests
◦ Manages data encryption through SSL
◦ Provides strong, client-based SSL authentication
◦ Enables Redis acceleration through pipelining and connection
management
• Cluster manager: This component contains a set of distributed processes that
together manage the entire cluster lifecycle. The cluster manager entity is
completely separated from the data path components (proxies and Redis shards)
and has the following responsibilities:
◦ Database provisioning and de-provisioning for optimal resource utilisation
◦ Automatically scaling to handle peak workloads
◦ Automatically re-sharding to guarantee high-throughput, low latency, real-
time performance
◦ Automatically re-balancing to guarantee high-throughput, low latency, real-
time performance
◦ Resource management monitoring the entire system’s health
◦ Node watchdog that monitors all processes running on a given Redis node
and triggers a shard failure event
◦ Cluster watchdog that is responsible for the health of the Redis cluster
nodes and triggers a node failure event
• Secure REST API: In cluster mode, all the management and control operations on
Redis Enterprise are performed through a dedicated and secure API that is resistant
to attacks and provides better control of cluster admin operations. One of the main
advantages of this interface is the ability to provision and de-provision Redis
resources at a very high rate, with almost no dependency on the underlying
infrastructure. This makes it very suitable for microservices-based architectures.