What is Splunk?
Splunk is a log analysis and SIEM (Security Information and Event
Management) tool. It is used to collect machine data (logs) from various
sources (systems, apps, firewalls, etc.), to index, store, and search this data
in real time, and to analyze patterns, detect threats, and generate visual reports.
In cybersecurity, it is used to monitor security incidents, detect unauthorized
access attempts, analyze system behavior, and help with incident response and
auditing.
Searching in Splunk?
In Splunk, we use search queries to filter and analyze logs. These queries
help us find specific events, like login failures, usernames, or attacker IPs.
● This counts all events in the linux_secure logs.
● This finds IPs that failed to log in, which helps detect brute-force attackers.
● This lists invalid or unknown users trying to log in (possible scanning or
attack).
● This shows valid login sessions by users (helps track who logged in and
how often).
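The queries behind these bullets are not shown in the notes. As a sketch, assuming an index named main, a linux_secure sourcetype, and extracted fields src_ip and user (all assumptions), they might look like:

Count all events:
index=main sourcetype=linux_secure | stats count

Find IPs that failed to log in:
index=main sourcetype=linux_secure "Failed password" | stats count by src_ip | sort - count

List invalid or unknown users:
index=main sourcetype=linux_secure "invalid user" | stats count by user

Show valid login sessions:
index=main sourcetype=linux_secure "session opened" | stats count by user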
What is a Report in Splunk?
A report in Splunk is a saved search that gives you specific insights (like
suspicious IPs or login failures). You can run it anytime, schedule it, or even
send it via email.
We create reports in Splunk to:
● Track patterns (e.g., failed logins, top users, suspicious behavior)
● Detect threats regularly
● Share findings with the security team or manager
● Schedule alerts from saved searches (daily/weekly)
This way we have no need to run the same search again, and we get regular
updates automatically.
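As a sketch, a search worth saving as a daily report (the index and sourcetype names are assumptions) could be:

index=main sourcetype=linux_secure "Failed password" | timechart count

Saving it as a report lets us schedule it (e.g., every morning) and email the results instead of re-running the search by hand.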
What is a Dashboard in Splunk & Why Do We Use It?
A Dashboard in Splunk is a visual panel where we can show multiple saved
reports, charts, tables, or stats in one place. It helps us monitor everything
in real-time without running queries again.
Why We Create Dashboards
● Visual Monitoring: See real-time data (e.g., login attempts, errors) on
one screen.
● Quick Decision Making: Dashboards help cybersecurity teams take
quick action.
● Centralized View: Combine multiple reports (e.g., failed logins, user
activity) into one.
● Professional Reporting: Used in companies to show system status to
managers or teams.
Field Extractions:
Field extraction means pulling out specific data (like an IP, username, or email)
from raw logs using patterns. It makes search results more meaningful and structured.
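As a sketch, an inline extraction with the rex command (the pattern and the src_ip field name are assumptions) might look like:

index=main sourcetype=linux_secure | rex "from (?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

This pulls the IP out of the raw event text into a src_ip field that can then be used in stats or filters.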
Field Alias:
Giving another name (alias) to a field so different sources can be compared
easily. Used for normalization, e.g., so src_ip and source_ip are treated the same.
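Aliases can be defined in props.conf (or via Settings > Fields > Field aliases in the UI). As a sketch, assuming a sourcetype named firewall_logs (an assumption):

[firewall_logs]
FIELDALIAS-normalize_ip = source_ip AS src_ip

After this, searches that use src_ip also match events where the field was originally named source_ip.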
Calculated Fields:
Fields created using expressions based on existing fields.
| eval bandwidth = bytes / 1024 / 1024
This command calculates bandwidth in MB by dividing bytes by
1024 twice.
Lookups
Lookups allow you to enrich your Splunk data by matching fields
from a CSV file (like adding descriptions to codes).
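As a sketch, assuming a CSV lookup named http_status.csv with columns status and description has been uploaded to Splunk (both the file and column names are assumptions):

index=web_logs | lookup http_status.csv status OUTPUT description

Each event's status code is matched against the CSV, and a human-readable description field is added to the results.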
Event Types
Event types are saved searches in Splunk that group similar events under a
label.
sourcetype="apache:access" status=404
Save this search as event type web_404_errors. Now searching
eventtype=web_404_errors matches the same events automatically.
Tags
Tags help in grouping related fields or values. We can tag a known-bad src_ip
value with the tag malicious. So later, you can search like: tag=malicious.
Working with Visualizations
I created visualizations to monitor login-related security events such as
Failed login attempts, Login errors. These visualizations help in identifying
possible brute-force attempts, unauthorized access, or system issues.
Visualizations make it easy to understand patterns, spot anomalies, and
take quick decisions. Instead of reading long logs, we can see:
● How many login failures happened
● Which users are involved
● At what time most failures occurred
1. Inline Search
A search that you run directly in the search bar. I used this search to quickly
find data or check something without saving the search.
● index=security sourcetype=login error
2. Saved Search
A search that you save for later use. So we don’t have to type the same
search again and again. It can also run automatically on schedule and send
alerts.
3. Chain Search
A search that uses results from a previous saved search. To build more
advanced reports or dashboards using earlier search results.
● 1st search finds failed logins.
● 2nd search (chain search) takes that result and checks which users
failed most.
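In dashboards this is implemented as a base search plus a post-process search that starts with a pipe. As a sketch (the index, sourcetype, and user field are assumptions):

Base search: index=security sourcetype=linux_secure "Failed password"
Chain (post-process) search: | stats count by user | sort - count

The base search runs once, and the chained search reuses its results, which is faster than running two full searches.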
Event Annotation in Splunk
Event annotation means highlighting or marking special events on your
visualizations (like timecharts) to make them stand out.
It helps you easily spot important incidents, like attacks, failures, or unusual
activities.
Splunk Architecture
Splunk follows a distributed architecture designed to handle large volumes
of machine data in a scalable and efficient way. It is mainly built on three core
components.
1. Forwarder (Data Collection Layer)
The Forwarder is installed on the machines or devices where data is
generated, such as servers and firewalls. There are two types of forwarders:
● The Universal Forwarder (UF) is lightweight and only sends raw data
without any processing.
● The Heavy Forwarder (HF) is a full Splunk instance that can parse,
filter, and route data before sending it to the Indexer.
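As a sketch, a Universal Forwarder is told what to collect through a monitor stanza in inputs.conf (the path, sourcetype, and index here are assumptions):

[monitor:///var/log/secure]
sourcetype = linux_secure
index = main

The forwarder then tails this file and ships new events to the Indexer.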
2. Indexer (Data Processing and Storage Layer)
It takes care of processing and storing the data. It first parses the data,
which means it breaks the incoming raw logs into individual events and
extracts important information like timestamps. Then it indexes the data,
which involves compressing it and storing it in a searchable format.
3. Search Head (Search and User Interface Layer)
The Search Head is the component that users interact with. It provides the
user interface where users can search, analyze, and visualize data. Users
write queries using SPL (Search Processing Language) to get insights from
the indexed data.
How These Components Work Together
In a typical Splunk environment, data flows from Forwarders to Indexers,
and then users query that data using the Search Head. For example, log files
on a Linux server are collected by the Universal Forwarder, sent to the
Indexer where they are parsed and indexed, and finally searched using SPL
via the Search Head interface. This modular approach makes Splunk
powerful, scalable, and suitable for large enterprise environments.
What is a Streaming Command?
Streaming commands process data event-by-event.
They work on each event individually, without needing to see the entire dataset.
index=web_logs | eval status="OK"
eval processes each event one by one and adds a new field status.
What is a Non-Streaming Command?
Non-streaming commands need to see all events together before processing.
They often perform grouping, stats, sorting, or transforming the data.
index=web_logs | stats count by status
stats needs all events together to count them by status.