DevOps Module 2: Version Control & CI/CD
2. Recognise suitable file types managed by version control tools. Version control tools are best for
managing text-based files where changes can be easily tracked line-by-line. Suitable file types include:
• Source Code: .java, .py, .js, .html, .css
• Configuration Files: .yaml, .json, .xml, .properties
• Documentation: .md, .txt
Note: Large binary files like compiled code (.exe), videos (.mp4), or design files (.psd) are
generally not suitable as version control cannot effectively track their changes.
4. List major stages involved in continuous integration pipelines. A Continuous Integration (CI)
pipeline automates the process of integrating code changes from multiple developers. The major stages
are:
1. Commit: Developer pushes code to a shared repository.
2. Build: The CI server automatically compiles the source code into a runnable application.
3. Test: Automated unit and integration tests are run to check for errors.
4. Report: The pipeline reports the status (success or failure) to the development team.
5. Identify matching tools for each DevOps phase in a pipeline.
• Plan: Jira, Trello
• Code (Source Control): Git, GitHub
• Build: Maven, Gradle
• Test: JUnit, Selenium
• Deploy (CI/CD Server): Jenkins, GitLab CI
• Operate (Infrastructure): Docker, Kubernetes, Ansible
• Monitor: Prometheus, Grafana, ELK Stack
7. Suggest an open-source tool used for log aggregation during monitoring. The ELK Stack is a
popular choice for log aggregation. It consists of:
• Elasticsearch: A search and analytics engine to store logs.
• Logstash: A data processing pipeline to collect and transform logs.
• Kibana: A visualization tool to create dashboards and explore logs. Another common alternative
is Fluentd.
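The collect → index → query flow can be illustrated with a toy in-memory model. The class and field names below are invented stand-ins for the roles Logstash (ingest), Elasticsearch (index), and Kibana (query) play; this is a sketch of the concept, not any real client API.

```python
import json
from collections import defaultdict

class ToyLogStore:
    """Minimal in-memory stand-in for the ELK flow:
    ingest (Logstash) -> index (Elasticsearch) -> query (Kibana)."""

    def __init__(self):
        self.docs = []
        self.index = defaultdict(list)   # term -> list of doc ids

    def ingest(self, raw_line: str):
        """Parse one JSON log line and index it by its 'level' field."""
        doc = json.loads(raw_line)
        doc_id = len(self.docs)
        self.docs.append(doc)
        self.index[doc["level"]].append(doc_id)

    def query(self, level: str):
        """Return all log documents with the given level."""
        return [self.docs[i] for i in self.index[level]]

store = ToyLogStore()
store.ingest('{"level": "error", "msg": "db timeout"}')
store.ingest('{"level": "info", "msg": "request ok"}')
print(store.query("error"))
```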
2. Analyse the transition from traditional deployment to continuous methods. The transition from
traditional to continuous deployment methods represents a fundamental shift in software delivery,
focusing on automation, speed, and risk reduction.
• Traditional Deployment (e.g., Waterfall Model):
• Process: Manual, infrequent, and high-risk. Deployments were large, "big bang" events
that happened monthly or quarterly.
• Risk: High. A single failure could cause a major outage and require a complex rollback.
• Feedback Loop: Very long. Bugs were often discovered late in the cycle, making them
expensive to fix.
• Culture: Siloed teams (Development, QA, Operations) with handoffs between them.
• Continuous Deployment Methods (e.g., DevOps/CI/CD):
• Process: Automated, frequent, and low-risk. Small, incremental changes are deployed to
production automatically as soon as they pass all tests.
• Risk: Low. Since changes are small, identifying and fixing issues is much easier and
faster.
• Feedback Loop: Very short. Automated testing provides immediate feedback, allowing
developers to fix bugs quickly.
• Culture: Collaborative. Dev and Ops teams work together throughout the entire
lifecycle, sharing responsibility.
Analysis of Transition: The transition requires more than just new tools; it demands a cultural shift.
Teams must embrace automation for building, testing, and deploying. The focus moves from delivering
large features to delivering small, valuable increments continuously. This change reduces deployment
anxiety and allows organizations to respond to market needs much faster.
3. Design a toolchain that connects integration testing and deployment. A toolchain automates the
process from testing to deployment, ensuring that only quality code reaches production.
1. Source Code Management (SCM): A developer pushes code to a feature branch in a Git
repository (e.g., on GitHub).
2. CI Server Trigger: A Pull Request to the develop branch triggers a job on a CI server like
Jenkins.
3. Build & Unit Test: Jenkins uses a build tool like Maven to compile the code and runs fast unit
tests using JUnit.
4. Deploy to Staging: If unit tests pass, Jenkins builds a Docker container and deploys it to a
dedicated staging or testing environment running on Kubernetes.
5. Integration Testing: Once deployed to staging, Jenkins triggers automated integration tests.
• For API testing, it uses a tool like Postman (via its command-line runner, Newman).
• For UI testing, it uses a framework like Selenium to simulate user interactions in a
browser.
6. Promote & Deploy to Production: If all integration tests pass, the Pull Request can be
approved and merged. The merge to the main branch triggers another Jenkins job that deploys
the same verified Docker container to the production environment.
This chain ensures that code is thoroughly tested in a production-like environment before it is released,
automating the validation process.
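The gate in steps 5-6 — the same verified artifact is promoted to production only after every suite passes — can be sketched as follows. The function and suite names are illustrative, not part of any real CI tool's API.

```python
def promote(artifact: str, suite_results: dict) -> str:
    """Promote the verified artifact to production only if every
    test suite (unit, API, UI) passed in staging."""
    failed = [name for name, passed in suite_results.items() if not passed]
    if failed:
        return f"blocked: {', '.join(sorted(failed))} failed"
    # The *same* image that passed in staging goes to production.
    return f"deploy {artifact} to production"

results = {"unit": True, "api": True, "ui": True}
print(promote("myapp:sha-abc123", results))
print(promote("myapp:sha-abc123", {**results, "ui": False}))
```

Promoting the identical image (tagged by commit hash) rather than rebuilding is what guarantees production runs exactly the bits that were tested.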
5. Illustrate stages of a container lifecycle and its isolation benefits. A container has a simple, well-
defined lifecycle managed by a container engine like Docker.
Stages of the Lifecycle:
1. Create: A container is created from an image but not yet started. Its filesystem and
configuration are prepared. (docker create)
2. Run: The container is started, and the command specified in the image is executed. (docker
run)
3. Pause/Unpause: All processes within the container are temporarily suspended (paused) and can
be resumed (unpaused).
4. Stop: The main process inside the container is sent a signal to shut down gracefully.
5. Kill: The container is shut down immediately, without waiting for processes to finish.
6. Remove: The stopped container is permanently deleted from the system. (docker rm)
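The lifecycle stages above form a small state machine, which can be modeled directly. This is a toy model of the transitions, not Docker's actual implementation; the state and action names simply mirror the docker verbs.

```python
# Toy state machine for the container lifecycle described above.
TRANSITIONS = {
    ("none",    "create"):  "created",
    ("created", "start"):   "running",
    ("running", "pause"):   "paused",
    ("paused",  "unpause"): "running",
    ("running", "stop"):    "stopped",   # graceful shutdown signal
    ("running", "kill"):    "stopped",   # immediate shutdown
    ("stopped", "start"):   "running",
    ("stopped", "remove"):  "removed",
}

def step(state: str, action: str) -> str:
    """Apply one lifecycle action, rejecting invalid transitions
    (e.g. you cannot remove a running container without stopping it)."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a {state} container")

state = "none"
for action in ["create", "start", "pause", "unpause", "stop", "remove"]:
    state = step(state, action)
print(state)   # removed
```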
Isolation Benefits: Containers provide process and resource isolation without the overhead of a full
virtual machine. This is achieved through two key Linux kernel features:
• Namespaces: Each container gets its own isolated view of the system, including its own
process tree, network interfaces, and filesystem mounts. This means a process inside a container
cannot see or interfere with processes in another container or on the host system.
• Control Groups (cgroups): This feature limits how much of the host system's resources (like
CPU, memory, and disk I/O) a container can use. This prevents a single container from
consuming all available resources and impacting other containers.
Example: You can run two web applications on the same server, one requiring Python 2.7 and the other
Python 3.8. Using containers, each application runs in its own isolated environment with its specific
Python version, completely avoiding any dependency conflicts.
6. Choose tools for implementing configuration management in a hybrid infrastructure. A hybrid
infrastructure combines on-premises data centers with public cloud resources (like AWS or Azure). The
key is to choose tools that can manage both environments consistently.
Recommended Tools:
1. Ansible:
• Why it's a good choice: It is agentless, meaning it communicates with servers over
standard protocols like SSH (for Linux) and WinRM (for Windows). This makes it very
easy to set up in a diverse, hybrid environment without needing to install special
software on every server.
• How it works: You write simple configuration "playbooks" in YAML. Ansible pushes
these configurations to your servers, whether they are in your data center or in the cloud.
2. Terraform:
• Why it's a good choice: While Ansible is for configuring servers, Terraform is for
provisioning the infrastructure itself (e.g., creating VMs, networks, databases). It
supports all major cloud providers and on-premise solutions like VMware.
• How it works: You define your entire infrastructure in code. Terraform then creates and
manages that infrastructure, ensuring your on-prem and cloud environments are
provisioned consistently.
Implementation Strategy: Use Terraform to provision the core infrastructure (servers, networks)
across both on-prem and cloud locations. Then, use Ansible to configure the software, services, and
security settings on those servers in a uniform way. This combination provides a powerful and
consistent approach to managing a hybrid environment.
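A minimal sketch of the Ansible half of this strategy is shown below. The host group names ("datacenter", "aws") are placeholders, not part of any real inventory; the point is that one playbook targets both environments identically.

```yaml
# Hypothetical playbook: apply the same baseline to on-prem and cloud hosts.
- name: Baseline configuration for all web servers
  hosts: datacenter:aws
  become: true
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```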
7. Map each monitoring layer with appropriate feedback metrics. Monitoring should be done in
layers to get a complete view of system health. Each layer has key metrics that provide feedback.
• Layer 1: Infrastructure Monitoring (The Foundation)
• Focus: Health of the underlying hardware and network.
• Metrics (The "USE" Method):
• Utilization: How busy the resource is (e.g., CPU Utilization %).
• Saturation: How much extra work the resource can't handle (e.g., CPU queue
length).
• Errors: The count of errors (e.g., Network packet drops).
• Example Tools: Prometheus, Nagios.
• Layer 2: Application Performance Monitoring (APM)
• Focus: Performance and health of the application code itself.
• Metrics (The "RED" Method):
• Rate: The number of requests per second.
• Errors: The number of failed requests per second.
• Duration: The time it takes to process a request (latency).
• Example Tools: Datadog, New Relic.
• Layer 3: Business Metrics Monitoring (The User Impact)
• Focus: How system performance impacts business goals and user experience.
• Metrics:
• Conversion Rate: Percentage of users who complete a desired action (e.g., make
a purchase).
• User Engagement: How many active users are on the platform.
• Revenue Impact: Tracking revenue in real-time to detect issues affecting sales.
• Example Tools: Google Analytics, custom dashboards in Grafana.
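The RED metrics for Layer 2 can be computed from a request log in a few lines. The log format below is invented for illustration; an APM tool collects these records automatically.

```python
# Compute RED metrics (Rate, Errors, Duration) from a toy request log.
# Each record: (timestamp_seconds, http_status, duration_ms).
requests = [
    (0.0, 200, 120), (0.5, 200, 95), (1.0, 500, 310),
    (1.5, 200, 88),  (2.0, 404, 45), (2.5, 200, 150),
]

window_seconds = 3.0
rate = len(requests) / window_seconds                         # requests/sec
errors = sum(1 for _, status, _ in requests if status >= 500) # server errors
error_rate = errors / window_seconds                          # errors/sec
durations = sorted(d for _, _, d in requests)
p50 = durations[len(durations) // 2]                          # median latency

print(f"rate={rate:.1f}/s error_rate={error_rate:.2f}/s p50={p50}ms")
```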
Repository Structure: A single central repository will be hosted on a platform like GitLab or
GitHub. This acts as the single source of truth.
Branching Logic and Policies:
• main Branch:
• Purpose: This branch represents the official, production-ready release history. The code
in main is always stable and deployable.
• Rules:
• Direct commits to main are strictly forbidden.
• Code only gets into main by merging from release or hotfix branches.
• Each commit on main is tagged with a version number (e.g., v1.0, v1.1).
• develop Branch:
• Purpose: This is the primary integration branch where all completed features are
merged. It reflects the latest development state for the next release.
• Rules:
• Developers do not commit directly to develop.
• Code is merged into develop from feature branches via Pull Requests.
• feature/* Branches:
• Purpose: Used for developing new features. Each feature is developed in its own
isolated branch.
• Workflow:
• Branched off from develop.
• Developers work and commit their changes here.
• When the feature is complete, a Pull Request is created to merge it back into
develop.
• The Pull Request requires a code review and passing CI tests before it can be
merged.
• release/* Branches (e.g., release/v1.1):
• Purpose: To prepare for a new production release. This branch allows for final testing,
bug fixing, and documentation updates without interrupting development on the
develop branch.
• Workflow:
• Branched off from develop when it is feature-complete.
• When the release is ready, it is merged into main (and tagged with the new
version) and also merged back into develop so that release fixes are not lost.
2. Critique pipeline failures in CI/CD systems and trace root causes with examples. CI/CD
pipeline failures are not merely problems; they are valuable feedback mechanisms that prevent faulty
code from reaching production. Understanding where and why pipelines fail is key to improving a
DevOps process.
Common Failure Points and Root Cause Analysis:
• Stage: Build Failure
• Symptom: The pipeline fails at the "compile" or "package" step.
• Example: A developer uses a new Java 11 language feature, but the Jenkins build server
is still configured with Java 8. The code fails to compile.
• Root Cause Analysis:
1. Check Build Logs: The log will show a clear compilation error.
2. Environment Mismatch: The primary cause is often a discrepancy between the
developer's local environment and the CI server's environment.
3. Dependency Issues: A required library might be missing from the build server or
there's a version conflict.
• Solution: Use containerized builds (e.g., running the build inside a Docker container) to
ensure the environment is consistent everywhere.
• Stage: Unit/Integration Test Failure
• Symptom: The build succeeds, but the automated test suite fails.
• Example: A developer changes a function that calculates tax, but forgets to update the
corresponding unit test. The test now fails because it expects the old result.
• Root Cause Analysis:
1. Check Test Reports: CI tools generate detailed reports showing which specific
tests failed and why (e.g., "AssertionError: expected 25 but was 30").
2. Logic Error: A change in the code broke existing functionality (a regression).
3. Flaky Tests: The test fails intermittently due to external factors like network
timeouts or race conditions. These are dangerous and must be fixed or removed.
• Solution: Enforce a policy that code changes must be accompanied by corresponding
test updates.
• Stage: Deployment Failure
• Symptom: The artifact is built and tested successfully, but the application fails to start in
the staging or production environment.
• Example: A new feature requires a new environment variable for an API key. The code
is deployed, but the environment variable is not set on the server. The application
crashes on startup because it can't find the key.
• Root Cause Analysis:
1. Check Application Logs: The application's own logs on the server will show the
error (e.g., "FATAL: API_KEY not configured").
2. Configuration Drift: The target environment is not configured as expected.
3. Permissions Issues: The deployment user may not have the necessary
permissions to write files or restart services.
• Solution: Manage configuration as code (IaC) and store secrets in a secure vault. The
deployment script should pull the latest configuration automatically.
• Stage: Security Scan Failure
• Symptom: A static (SAST) or dynamic (DAST) security tool integrated into the pipeline
finds a vulnerability.
• Example: A SAST tool like SonarQube detects that a newly added open-source library
has a known critical vulnerability (CVE).
• Root Cause Analysis:
1. Review Security Dashboard: The tool will provide a detailed report on the
vulnerability, its severity, and the affected code.
2. Insecure Dependency: A developer used an outdated or insecure third-party
library.
• Solution: Integrate automated security scanning early in the pipeline ("Shift Left") and
configure it to fail the build if critical vulnerabilities are found.
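The "fail the build on critical vulnerabilities" policy can be sketched as a toy gate. The advisory table below is invented sample data; real scanners pull findings from CVE feeds.

```python
# Toy dependency audit: fail the build if any pinned dependency matches a
# known critical advisory.
KNOWN_CRITICAL = {
    ("log4j-core", "2.14.1"),   # e.g. a Log4Shell-class finding
}

def audit(dependencies: dict) -> list:
    """Return the (name, version) pairs with critical advisories."""
    return sorted((n, v) for n, v in dependencies.items()
                  if (n, v) in KNOWN_CRITICAL)

def gate(dependencies: dict) -> str:
    """Shift-left policy: any critical finding fails the pipeline."""
    findings = audit(dependencies)
    if findings:
        return f"build failed: {len(findings)} critical finding(s)"
    return "build passed"

deps = {"log4j-core": "2.14.1", "commons-lang3": "3.12.0"}
print(gate(deps))   # build failed: 1 critical finding(s)
```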
Stateless Deployments: A stateless application does not save any client data or session information on
the container itself. Each request is treated as an independent transaction. If the container is deleted or
crashes, no data is lost because the persistent data (or "state") is stored in an external backend like a
database or a cache.
• Characteristics:
• No Persistent Data: The container itself stores nothing that needs to be kept between
sessions.
• Identical & Interchangeable: Every container instance is a perfect clone. Any instance
can handle any request.
• Easy to Scale: To handle more traffic, you simply add more identical container replicas.
Load balancers distribute requests among them.
• High Availability: If a container fails, an orchestrator like Kubernetes can instantly
destroy it and create a new one to take its place without any data loss.
• Production Use Case Example: A fleet of Nginx web servers or a REST API backend. The
web servers deliver HTML/CSS files, and the API processes requests by fetching data from an
external PostgreSQL database. You can scale the web servers from 3 to 30 replicas during a
traffic spike without any issue.
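The stateless property — any replica can serve any request because session state lives in an external store — can be sketched as follows. The plain dict here stands in for an external backend like Redis or PostgreSQL; the replica names are only labels.

```python
# Two identical "replicas" share one external session store. Because no
# replica keeps local state, any replica can serve any request, and losing
# one replica loses no data.
session_store = {}   # external backend, outlives any single replica

class Replica:
    def __init__(self, name: str):
        self.name = name   # identity is only for logging; replicas are clones

    def handle(self, user: str) -> int:
        """Count this user's requests, keeping the count externally."""
        session_store[user] = session_store.get(user, 0) + 1
        return session_store[user]

a, b = Replica("web-1"), Replica("web-2")
a.handle("alice")          # 1st request lands on web-1
count = b.handle("alice")  # 2nd lands on web-2, yet the count continues
print(count)   # 2
```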
Stateful Deployments: A stateful application needs to remember its "state." This means it relies on
persistent storage to save data that must survive container restarts or crashes.
• Characteristics:
• Persistent Data: Requires a stable storage volume that remains attached even if the
container is recreated.
• Unique Identity: Each container replica is not interchangeable. It has a unique, stable
network identity and storage. For example, a primary database replica is different from a
secondary replica.
• Harder to Scale: Scaling is more complex. You can't just add random replicas; you
often need to follow specific procedures for clustering or replication.
• Complex Management: Backups, data consistency, and disaster recovery are major
concerns.
• Production Use Case Example: A MySQL database running in a container. The database files
must be stored on a persistent disk. If the MySQL container crashes, Kubernetes must restart it
and re-attach it to the exact same storage volume to ensure data is not lost.
Comparison Table:
• Persistent Data: Stateless — none on the container; state lives in an external backend.
Stateful — requires stable storage volumes that survive restarts.
• Replica Identity: Stateless — identical and interchangeable. Stateful — unique, stable
network identity and storage per replica.
• Scaling: Stateless — add or remove replicas freely behind a load balancer. Stateful —
complex; requires clustering or replication procedures.
• Failure Recovery: Stateless — replace the container with no data loss. Stateful — must
re-attach the same storage volume.
• Example: Stateless — Nginx web servers, REST API backends. Stateful — MySQL database.
4. Design a continuous testing strategy using multiple testing frameworks. A Continuous Testing
(CT) strategy integrates automated testing into every stage of the CI/CD pipeline to provide rapid
feedback on the business risks associated with a software release. The goal is to "test early, test often,
test everywhere." This strategy is often visualized using the Test Automation Pyramid.
6. Compose a tool pipeline integrating SCM, CI, CT, CD, containerization, and monitoring. This
pipeline represents a complete, automated software delivery lifecycle, integrating best-in-class open-
source and industry-standard tools at each stage.
• Tags the image with the Git commit hash for traceability.
• Pushes the versioned image to a container registry like JFrog Artifactory.
4. Deploy to Staging (CD - Continuous Deployment):
• Tool: Kubernetes with Helm.
• Workflow: After the image is pushed, Jenkins triggers the deployment stage.
• It uses Helm (a package manager for Kubernetes) to deploy the new Docker
image to a Staging Kubernetes Cluster.
• This creates a production-like environment for final testing.
5. Test in Staging (CT - Continuous Testing):
• Tool: Selenium and JMeter.
• Workflow: Once the application is live in staging, Jenkins runs:
• A suite of Selenium end-to-end tests to validate critical user journeys.
• A JMeter performance test to ensure the new changes haven't introduced
performance regressions.
6. Release & Deploy to Production (CD - Continuous Deployment):
• Tool: Kubernetes with a strategy like Blue-Green Deployment.
• Workflow: If all staging tests pass, the pipeline proceeds to production. This may
require a manual approval step.
• Jenkins deploys the same verified Docker image to the Production Kubernetes
Cluster.
• A Blue-Green deployment strategy is used to ensure zero downtime. The new
version ("green") is deployed alongside the old version ("blue"). Once the new
version is verified, traffic is switched over.
7. Operate & Monitor:
• Tools: Prometheus, Grafana, and the ELK Stack.
• Workflow:
• Prometheus continuously scrapes performance metrics from the live application
and Kubernetes cluster.
• Grafana provides dashboards to visualize these metrics, showing CPU, memory,
latency, and error rates.
• Logstash collects logs from all containers, Elasticsearch indexes them, and
Kibana provides a searchable log dashboard for debugging.
• If Prometheus detects an anomaly (e.g., high error rate), it fires an alert via
Alertmanager to a team communication platform like Slack.
This end-to-end pipeline automates the entire process, enabling teams to deliver high-quality software
safely, quickly, and reliably.
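The Blue-Green switch in step 6 can be sketched as a toy router. Real setups flip a Kubernetes Service selector or a load-balancer target; the class and environment names below are illustrative.

```python
class BlueGreenRouter:
    """Toy model of a blue-green switch: two environments exist at once,
    but live traffic only ever points at one of them."""

    def __init__(self, blue_version: str):
        self.envs = {"blue": blue_version, "green": None}
        self.live = "blue"

    def deploy_green(self, version: str):
        """Stand up the new version alongside the old one (no traffic yet)."""
        idle = "green" if self.live == "blue" else "blue"
        self.envs[idle] = version

    def switch(self):
        """After verification, atomically flip traffic to the new version."""
        self.live = "green" if self.live == "blue" else "blue"

    def serving(self) -> str:
        return self.envs[self.live]

router = BlueGreenRouter("v1.0")
router.deploy_green("v1.1")
assert router.serving() == "v1.0"   # old version still takes traffic
router.switch()
print(router.serving())   # v1.1
```

Because the old environment stays intact after the switch, rollback is just another switch call, which is what makes this strategy zero-downtime.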