Chapter 1: Introduction to Data Analytics
1. Sources and Nature of Data:
Data can be generated from a variety of sources. It can come from internal systems,
external sensors, social media, or even customer interactions. These data sources fall
into several categories:
● Internal Data: Collected by organizations through business processes (e.g.,
sales transactions, employee data, customer feedback).
● External Data: Data sourced from external environments, such as social media,
third-party APIs, or government data.
● Structured Data: Organized data in predefined models, often in rows and
columns (e.g., SQL databases).
● Semi-structured Data: Data that does not have a strict data model but includes
some organizational properties (e.g., JSON, XML).
● Unstructured Data: Data without a predefined format, such as images, text,
videos, etc.
# Example: Using Python to load JSON data (Semi-structured Data)
import json
data = '{"name": "John", "age": 30, "city": "New York"}'
parsed_data = json.loads(data)
print(parsed_data)
2. Classification of Data (Structured, Semi-structured, Unstructured):
● Structured Data: Data stored in databases with a clear schema and data types,
typically found in relational databases.
○ Example: SQL Database
○ Type: Tabular (Rows and Columns)
● Semi-structured Data: Data that has a flexible structure, often represented in
markup languages.
○ Example: JSON, XML
○ Type: Nested, key-value pairs
● Unstructured Data: Data without a specific structure, such as text files, audio,
video, and images.
○ Example: Social Media Posts, Web Logs
○ Type: Raw, freeform content
3. Characteristics of Data:
The characteristics of data include:
● Volume: The amount of data (e.g., terabytes or petabytes of data).
● Variety: The different types and formats of data (e.g., text, image, audio, etc.).
● Velocity: The speed at which data is generated and processed (e.g., real-time
data).
● Veracity: The trustworthiness and accuracy of data.
● Value: The usefulness of data for analytics or decision-making.
4. Introduction to Big Data Platforms:
Big Data refers to data sets that are too large or complex for traditional data processing
methods. Big Data platforms include tools and technologies like:
● Hadoop: An open-source framework that allows distributed storage and
processing of big data.
● Spark: A fast processing engine for big data, often used for real-time analytics.
● NoSQL Databases: Examples like MongoDB or Cassandra are optimized for
unstructured or semi-structured data.
# Example of reading large data using pandas (Structured Data)
import pandas as pd
# Load a large CSV file into a pandas dataframe
df = pd.read_csv('large_data.csv')
print(df.head())
5. Need for Data Analytics:
Data analytics is needed to:
● Improve decision-making with data-driven insights.
● Identify trends and patterns that help predict future outcomes.
● Enhance business performance by optimizing processes.
● Understand consumer behavior and improve customer experience.
6. Evolution of Analytic Scalability:
Scalability refers to the capability of a system to handle a growing amount of work. As
the volume, velocity, and variety of data have grown, analytic tools and platforms have
evolved. From traditional relational databases to modern cloud-based solutions, these
tools have adapted to efficiently process and analyze big data.
7. Analytic Process and Tools:
The data analytics process typically includes:
● Data Collection: Gathering raw data from various sources.
● Data Cleaning: Preprocessing and cleaning the data for analysis.
● Data Analysis: Applying statistical or machine learning techniques to uncover
patterns.
● Data Visualization: Creating charts and graphs to communicate insights.
Some tools for analytics include:
● Excel: Basic data analysis and visualization.
● Python (Pandas, NumPy): For data manipulation and analysis.
● R: A programming language for statistical computing and visualization.
● Power BI, Tableau: For data visualization.
● Hadoop, Spark: For big data processing.
8. Analysis vs Reporting:
● Analysis involves deep exploration of data, identifying patterns, correlations, and
insights.
● Reporting focuses on presenting summarized information, often for tracking
business KPIs or performance metrics.
# Example: Analysis and Reporting using Python
import matplotlib.pyplot as plt
# Data for analysis (e.g., sales performance)
sales = [200, 240, 300, 180, 400]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
# Reporting: Line Plot for Visualization
plt.plot(months, sales)
plt.title('Sales Performance')
plt.xlabel('Months')
plt.ylabel('Sales')
plt.show()
9. Modern Data Analytic Tools:
● Tableau: A powerful data visualization tool used to create interactive
dashboards.
● Power BI: A Microsoft tool for business analytics and visualization.
● Google Analytics: Tracks website traffic and user behavior.
● Jupyter Notebooks: An open-source web application for creating and sharing
documents with code, visualizations, and narrative text.
10. Applications of Data Analytics:
Data analytics can be applied in various fields such as:
● Healthcare: Predictive analytics for patient outcomes, treatment optimization.
● Retail: Analyzing customer behavior and sales trends for targeted marketing.
● Finance: Fraud detection, credit scoring, and risk management.
● Sports: Performance analysis and strategy optimization.
Data Analytics Lifecycle
1. Need for Data Analytics Lifecycle:
The data analytics lifecycle is a structured process for achieving successful analytics
outcomes. It ensures systematic planning, execution, and delivery of insights.
2. Key Roles for Successful Analytic Projects:
● Data Analysts: Responsible for data collection, cleaning, and analysis.
● Data Scientists: Create predictive models and apply machine learning.
● Business Analysts: Ensure alignment of data insights with business needs.
● Data Engineers: Handle data pipelines and infrastructure.
● Project Managers: Oversee the project, ensuring timelines and objectives are
met.
3. Phases of Data Analytics Lifecycle:
The data analytics lifecycle typically consists of the following phases:
● Discovery: Understanding the problem, setting objectives, and identifying the
data required.
● Data Preparation: Cleaning, transforming, and structuring the data.
● Model Planning: Selecting the appropriate analytical methods and tools.
● Model Building: Creating models using algorithms and testing them.
● Communicating Results: Presenting the findings using reports or visualizations.
● Operationalization: Deploying models and integrating them into business
processes for continuous monitoring.
# Example: Data Preparation (Cleaning Data)
import pandas as pd
# Load dataset
df = pd.read_csv('sales_data.csv')
# Clean missing values by filling them with the column mean
df.fillna(df.mean(numeric_only=True), inplace=True)
# Drop duplicate entries
df.drop_duplicates(inplace=True)
print(df.head())
Detailed Explanation of Phases:
1. Discovery:
In this phase, the goal is to identify the business problem, gather initial data, and
determine the project’s scope. Stakeholders define key performance indicators (KPIs)
and objectives.
2. Data Preparation:
● Data Cleaning: Remove or fix missing values, handle outliers, correct
inconsistencies.
● Data Transformation: Convert data into formats suitable for analysis.
Example:
# Remove rows with missing values
df = df.dropna()
3. Model Planning:
Here, analysts and data scientists decide which model to use based on the type of
problem. The model could be:
● Supervised Learning: For labeled data (e.g., regression, classification).
● Unsupervised Learning: For unlabeled data (e.g., clustering, association).
4. Model Building:
Data scientists build predictive models and use algorithms (e.g., linear regression,
decision trees, neural networks). Models are trained and validated with data.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Prepare the data
X = df[['feature1', 'feature2']]
y = df['target']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on test data
predictions = model.predict(X_test)
5. Communicating Results:
Once the model is built and validated, it is time to communicate the results to
stakeholders through reports or visualizations. This could involve charts, dashboards, or
presentations.
6. Operationalization:
Deploying the model in a production environment where it continuously makes
predictions and integrates into business processes.
Chapter 2: Data Analysis: Advanced Techniques
In this section, we will explore advanced techniques and methods used for data
analysis. These methods are integral to solving complex real-world problems by
providing deeper insights, making predictions, and classifying data effectively.
1. Regression Modeling
Regression is a statistical method used to model and analyze the relationships
between a dependent variable and one or more independent variables. It helps to
predict the dependent variable based on the values of the independent variables.
Types of Regression:
Simple Linear Regression: Models the relationship between two variables by fitting a
linear equation.
# Example: Simple Linear Regression in Python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Example Data
data = {'X': [1, 2, 3, 4, 5], 'Y': [1, 2, 1.3, 3.75, 2.25]}
df = pd.DataFrame(data)
# Linear Regression
X = df[['X']]  # Independent variable
Y = df['Y']    # Dependent variable
model = LinearRegression().fit(X, Y)
Y_pred = model.predict(X)
# Plot the regression line
plt.scatter(X, Y, color='blue')
plt.plot(X, Y_pred, color='red')
plt.show()
● Multiple Regression: Involves more than one independent variable.
● Polynomial Regression: Used when data shows a nonlinear relationship.
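A polynomial fit can be sketched with NumPy's polyfit, a lighter-weight alternative to a full scikit-learn pipeline; the data values below are invented for illustration:

```python
# Example: Polynomial Regression using numpy.polyfit (illustrative data)
import numpy as np

# Hypothetical data with a roughly quadratic trend
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1])

# Fit a degree-2 polynomial: y ~ a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)
y_pred = np.polyval(coeffs, x)
print("Coefficients:", coeffs)
print("Predictions:", y_pred)
```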
2. Multivariate Analysis
Multivariate analysis involves examining relationships between more than two variables
simultaneously. It is used to understand patterns, correlations, and interactions among
multiple variables.
Techniques:
Principal Component Analysis (PCA): Reduces the dimensionality of the data while
retaining most of the variance. PCA is useful when the dataset has many variables.
# Example: PCA using Scikit-learn
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Example dataset with multiple features
data = [[2.5, 3.5, 1.5], [4.5, 3.0, 2.0], [3.0, 3.5, 3.5], [3.5, 4.5, 2.5]]
df = pd.DataFrame(data, columns=['Feature1', 'Feature2', 'Feature3'])
# Standardizing the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
# Apply PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)
print(pca_result)
● Factor Analysis: Similar to PCA but focuses on modeling latent variables that
explain the observed correlations.
● Cluster Analysis: Identifies groups of similar objects within the data (e.g.,
K-Means).
3. Bayesian Modeling, Inference, and Bayesian Networks
Bayesian modeling is based on Bayes' Theorem, which helps to update the probability
of a hypothesis based on new evidence. This approach allows for incorporating prior
knowledge into the analysis.
Bayesian Inference: A method to update the probability estimate for a hypothesis as
more evidence or information becomes available.
# Example of Bayesian Inference using PyMC3
import pymc3 as pm
import numpy as np
# Generate data
data = np.random.normal(0, 1, 100)
# Define a simple Bayesian model
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    likelihood = pm.Normal('likelihood', mu=mu, sd=1, observed=data)
    # Inference
    trace = pm.sample(2000, return_inferencedata=False)
print(pm.summary(trace))
● Bayesian Networks: A probabilistic graphical model that represents a set of
variables and their conditional dependencies via a directed acyclic graph (DAG).
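The update rule of Bayes' Theorem is easy to compute directly. A minimal worked example with made-up numbers for a diagnostic test:

```python
# Worked example of Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)
# (hypothetical numbers for a diagnostic test)
p_h = 0.01              # prior: P(disease)
p_e_given_h = 0.95      # likelihood: P(positive | disease)
p_e_given_not_h = 0.05  # false positive rate

# Total probability of the evidence (law of total probability)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: probability of disease given a positive test
posterior = p_e_given_h * p_h / p_e
print(f"P(disease | positive) = {posterior:.3f}")  # ~0.161
```

Despite the accurate test, the posterior stays low because the prior is low, which is exactly the kind of reasoning Bayesian inference formalizes.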
4. Support Vector Machines (SVM) and Kernel Methods
Support Vector Machines (SVM) are supervised learning models used for classification
and regression. They work by finding the optimal hyperplane that separates different
classes.
● Linear SVM: Used when data is linearly separable.
● Kernel SVM: Used when data is not linearly separable. The kernel trick is used
to map data to a higher dimension.
# Example: SVM with kernel
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Train SVM classifier with RBF kernel
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)
# Prediction
y_pred = svm.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
5. Analysis of Time Series: Linear Systems and Nonlinear Dynamics
Time series analysis involves analyzing data points indexed in time order, often for
forecasting and trend analysis.
Linear Systems Analysis: Involves analyzing time series data assuming linear
relationships, often using autoregressive models like ARIMA (AutoRegressive
Integrated Moving Average).
# Example: ARIMA model for time series forecasting
from statsmodels.tsa.arima.model import ARIMA
import numpy as np
# Sample time series data
data = np.random.normal(0, 1, 100)
# Fit ARIMA model
model = ARIMA(data, order=(1, 0, 0))  # AR(1) model
model_fit = model.fit()
print(model_fit.summary())
● Nonlinear Dynamics: When time series data shows complex, non-linear
relationships, methods like chaos theory and fractals are used.
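As a minimal illustration of nonlinear dynamics, the logistic map shows how a simple recurrence becomes chaotic as its parameter grows; the parameter values below are standard textbook choices:

```python
# Sketch: the logistic map x -> r*x*(1-x), a classic example where a
# simple nonlinear recurrence produces chaotic behavior for large r
def logistic_map(r, x0, n):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# r = 2.5 converges to the fixed point 1 - 1/r = 0.6; r = 3.9 is chaotic
stable = logistic_map(2.5, 0.2, 100)
chaotic = logistic_map(3.9, 0.2, 100)
print("Stable tail:", stable[-3:])
print("Chaotic tail:", chaotic[-3:])
```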
6. Rule Induction
Rule induction is a machine learning method that extracts useful rules from data for
classification or regression tasks. It is often used in decision trees or logic-based
systems.
● Decision Trees: A tree-like model used to make decisions based on features. It
is a key tool for rule induction.
# Example: Decision Tree Classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Train decision tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Prediction and accuracy
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
7. Neural Networks: Learning and Generalization
Neural networks are a subset of machine learning models inspired by biological neural
networks. They are used to model complex patterns and relationships in data.
● Learning: Neural networks learn by adjusting the weights through
backpropagation.
● Generalization: Neural networks should generalize well to new, unseen data,
and not just memorize the training data.
# Example: Simple Neural Network using Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# Generate random data for binary classification
X = np.random.rand(100, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
# Build a simple neural network model
model = Sequential()
model.add(Dense(10, input_dim=5, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile and fit the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=10)
8. Competitive Learning
Competitive learning refers to unsupervised learning methods where neurons or units
"compete" to represent input patterns. Self-Organizing Maps (SOM) are a common
method in competitive learning.
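A minimal, hand-rolled sketch of the competitive-learning idea behind SOMs; the unit count, learning rate, and neighborhood function below are arbitrary illustrative choices, not a full SOM implementation:

```python
# Sketch of competitive learning: a 1-D self-organizing map where the
# best-matching unit (and its neighbors) move toward each input
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 2))    # 2-D inputs in [0, 1)
weights = rng.random((5, 2))   # 5 units competing for inputs

lr = 0.5
for epoch in range(20):
    for x in data:
        # Competition: find the best-matching unit (closest weight vector)
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Adaptation: winner and its neighbors move toward x
        for j in range(len(weights)):
            influence = np.exp(-abs(j - bmu))  # simple neighborhood decay
            weights[j] += lr * influence * (x - weights[j])
    lr *= 0.9  # decay the learning rate over epochs

print("Trained unit weights:\n", weights)
```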
9. Fuzzy Logic: Extracting Fuzzy Models from Data
Fuzzy logic is a form of logic that allows partial membership rather than binary
membership (true or false). It is used to handle vague or uncertain data.
● Fuzzy Decision Trees: Decision trees can be constructed with fuzzy logic to
handle uncertainty.
# Example: Using fuzzy logic in Python (Simple Example)
import numpy as np
import skfuzzy as fuzz
import matplotlib.pyplot as plt
# Generate fuzzy membership functions
x = np.arange(0, 11, 1)
low = fuzz.trimf(x, [0, 0, 5])
medium = fuzz.trimf(x, [0, 5, 10])
high = fuzz.trimf(x, [5, 10, 10])
# Plot fuzzy sets
plt.plot(x, low, label='Low')
plt.plot(x, medium, label='Medium')
plt.plot(x, high, label='High')
plt.legend()
plt.show()
10. Stochastic Search Methods
Stochastic search methods use randomization to explore solution spaces, often
applied in optimization problems.
● Simulated Annealing: A probabilistic technique for approximating the global
optimum of a given function.
● Genetic Algorithms: Search algorithms based on the principles of natural
selection.
# Example: Skeleton of a Genetic Algorithm in Python (Pseudo-Code)
def genetic_algorithm():
    # Define population, fitness function, mutation and crossover operations
    pass
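As a runnable complement, here is a minimal simulated annealing loop minimizing f(x) = x²; the cooling schedule and neighbor step are arbitrary illustrative choices:

```python
# Sketch: simulated annealing minimizing f(x) = x^2
# (real problems need a problem-specific neighbor function and schedule)
import math
import random

random.seed(42)

def f(x):
    return x * x

x = 10.0      # start far from the optimum at 0
temp = 10.0
while temp > 1e-3:
    candidate = x + random.uniform(-1, 1)  # random neighbor
    delta = f(candidate) - f(x)
    # Accept improvements always; worse moves with probability e^(-delta/T)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = candidate
    temp *= 0.99  # geometric cooling schedule
print(f"Best x found: {x:.3f}")
```

The acceptance of occasional worse moves at high temperature is what lets the search escape local optima before the system "freezes."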
These methods and techniques represent advanced approaches in data analysis that
cater to various domains, such as predictive modeling, classification, optimization, and
time-series forecasting. Each technique has its strengths, and selecting the appropriate
method depends on the data, objectives, and problem at hand.
Chapter 3: Mining Data Streams: Concepts, Techniques, and Applications
Mining data streams is a critical aspect of data analytics where data is continuously
generated, often in large volumes and at high velocities. This type of data needs to be
processed in real-time or near-real-time, making traditional batch-processing techniques
impractical. In this section, we will explore key concepts, stream data models, sampling
techniques, real-time analytics, and case studies.
1. Introduction to Streams Concepts
A data stream refers to a sequence of data that is continuously generated over time.
Data streams are typically large in size, time-sensitive, and cannot be stored in memory
for long periods. In contrast to traditional data analysis, where data can be batched and
processed, stream mining requires methods that can handle incoming data in real-time.
Characteristics of Data Streams:
● Volume: Data arrives continuously, sometimes at a very high rate.
● Velocity: Data must be processed quickly as it arrives.
● Veracity: Data may be noisy and uncertain.
● Variety: The data may come in various formats and types, such as numerical
data, text, or multimedia.
2. Stream Data Model and Architecture
A stream data model refers to how data is structured and handled as it flows in
continuous streams. It typically includes the following components:
● Data Source: The source of the streaming data, which could be sensors, logs,
social media feeds, or IoT devices.
● Stream Processing: The core mechanism responsible for handling, filtering, and
processing data in real-time. This is where algorithms are applied to analyze and
derive insights from the incoming data.
● Windowing: Data in streams is often processed in chunks or "windows," which
define the subset of the stream being considered at any given time.
● Storage: Since it is often impractical to store all incoming data, only recent or
summary information is retained, often using specialized storage systems like
in-memory databases or distributed storage.
The typical architecture for stream processing includes:
1. Data Producers: Devices, sensors, or applications that generate data.
2. Stream Processing Engine: Processes the incoming data, applies algorithms,
and produces insights.
3. Data Storage and Analytics: Stores aggregated or processed data for further
analysis or visualization.
Popular stream processing platforms include:
● Apache Kafka: A distributed event streaming platform.
● Apache Flink: A stream processing framework for real-time analytics.
● Apache Storm: A real-time computation system.
● Google Dataflow: A managed service for stream and batch processing.
3. Stream Computing
Stream computing refers to the process of computing and analyzing data continuously
as it is received. The goal of stream computing is to extract value from incoming data
while avoiding the pitfalls of traditional batch processing. Key techniques include:
● Real-time processing: Processing data immediately as it arrives, without waiting
for it to accumulate.
● Event-driven architecture: The system reacts to incoming events (data),
triggering appropriate computations and responses.
Stream computing is often implemented using frameworks like Apache Flink or Spark
Streaming, which provide APIs for real-time analytics.
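The event-driven idea can be sketched without any framework, using a Python generator as a stand-in for a real source such as a Kafka topic; the readings and threshold below are invented:

```python
# Sketch: event-driven processing of a stream, one event at a time
def sensor_stream():
    # Hypothetical temperature readings; a real system would consume
    # these from a message broker rather than a list
    for reading in [21.5, 22.0, 35.2, 21.8, 36.0]:
        yield reading

def process(stream, threshold=30.0):
    alerts = []
    for value in stream:        # each event is handled as it arrives
        if value > threshold:   # react immediately, no batching
            alerts.append(value)
    return alerts

print("Alerts:", process(sensor_stream()))  # -> [35.2, 36.0]
```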
4. Sampling Data in a Stream
Sampling data in a stream is essential because it is often impossible or inefficient to
store all incoming data due to its high velocity. Stream sampling involves selecting a
subset of data for processing or analysis while ensuring it is representative of the entire
stream.
Reservoir Sampling is one technique used to maintain a representative sample of data
in streams:
● Reservoir Sampling Algorithm: Given a stream of N items, if you want to sample
k of them, this algorithm ensures every item ends up in the sample with equal
probability k/N, without the need to store all items.
Example in Python (Reservoir Sampling for stream data):
import random

def reservoir_sampling(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Simulating a stream of data
stream_data = range(1, 10001)
sampled_data = reservoir_sampling(stream_data, 100)
print(sampled_data)
5. Filtering Streams
Filtering in data streams is essential for removing irrelevant or noisy data. Bloom filters
are commonly used for this purpose, especially for approximate membership testing.
● Bloom Filter: A probabilistic data structure that allows you to test whether an
element is a member of a set. It has a false positive rate, meaning it may
incorrectly identify an element as being present but never incorrectly report an
element as absent.
from pybloom_live import BloomFilter
# Create a Bloom Filter
bloom = BloomFilter(capacity=10000, error_rate=0.001)
# Adding elements to the Bloom Filter
bloom.add("apple")
bloom.add("banana")
# Checking for membership
print("apple in bloom filter:", "apple" in bloom)
print("orange in bloom filter:", "orange" in bloom)
6. Counting Distinct Elements in a Stream
In streams, counting distinct elements (e.g., unique IP addresses or user IDs) is
challenging due to the large volume of data. A common technique used is the
HyperLogLog algorithm, which allows you to estimate the number of distinct elements
in a stream.
● HyperLogLog: A probabilistic algorithm for approximating the cardinality (distinct
count) of a stream.
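HyperLogLog itself is intricate, but the core trick behind it can be sketched: hash each item and track the maximum number of trailing zero bits seen; roughly 2^max estimates the distinct count. Real HyperLogLog averages many such estimators to reduce variance; this single-estimator version is only illustrative:

```python
# Sketch of the idea underlying HyperLogLog (single, high-variance estimator)
import hashlib

def trailing_zeros(n):
    # Number of trailing zero bits in the hash value
    if n == 0:
        return 32
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def estimate_distinct(stream):
    max_tz = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        max_tz = max(max_tz, trailing_zeros(h))
    return 2 ** max_tz

stream = [f"user{i % 500}" for i in range(10000)]  # 500 distinct users
estimate = estimate_distinct(stream)
print("Estimated distinct:", estimate)
```

Note that repeats of the same item hash identically, so only distinct items can raise the maximum, which is why the estimate tracks cardinality rather than stream length.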
7. Estimating Moments in a Stream
In statistics, moments describe the shape of a probability distribution. In data streams,
estimating moments (such as the mean, variance, skewness, and kurtosis) can
provide insights into the data distribution.
● First Moment (Mean): Estimate of the average of the data stream.
● Second Moment (Variance): Measure of data variability.
● Higher Moments: Higher-order statistics like skewness (asymmetry) and
kurtosis (tailedness).
Estimating these moments efficiently requires algorithms like Online Algorithms that
calculate these statistics without needing to store all the data.
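One such online algorithm is Welford's method, which updates the mean and variance a single element at a time; a minimal sketch:

```python
# Sketch: Welford's online algorithm for streaming mean and variance
class OnlineStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of the elements seen so far
        return self.m2 / self.n if self.n > 1 else 0.0

stats = OnlineStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(x)
print(f"mean={stats.mean}, variance={stats.variance}")  # mean=5.0, variance=4.0
```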
8. Counting Ones in a Window
In stream mining, we often need to count how many 1s (or occurrences of an event)
appeared within the last N elements of a stream. Since storing the whole window is
expensive, approximate algorithms such as DGIM (Datar-Gionis-Indyk-Motwani) maintain
a small number of buckets of exponentially growing sizes and estimate the count with
bounded error.
● Sliding Window: A window of fixed size that moves over the stream, updating
the count as data arrives or exits the window.
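A small sketch of exact counting over a fixed-size sliding window, using a deque so the oldest element drops out automatically; the stream items are arbitrary:

```python
# Sketch: maintaining counts over a fixed-size sliding window
from collections import deque, Counter

window_size = 3
window = deque(maxlen=window_size)  # oldest element is evicted automatically
counts = Counter()

stream = ["a", "b", "a", "c", "a", "b"]
for item in stream:
    if len(window) == window_size:
        counts[window[0]] -= 1      # element about to leave the window
    window.append(item)
    counts[item] += 1

print("Window:", list(window))      # last 3 items
print("Counts:", {k: v for k, v in counts.items() if v > 0})
```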
9. Decaying Window
A decaying window is a technique where older data in the stream is given less weight,
and the window size adjusts dynamically. This method is useful when newer data is
more relevant than older data.
For example, in stock market predictions or real-time sentiment analysis, the most
recent data has a higher influence.
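A minimal sketch of an exponentially decaying count, where each new arrival down-weights everything seen before by a constant factor; the decay constant c is an arbitrary illustrative choice:

```python
# Sketch: exponentially decaying count over a stream
def decayed_count(stream, c=0.1):
    score = 0.0
    for value in stream:
        score = score * (1 - c) + value  # older items fade geometrically
    return score

recent_heavy = [0, 0, 0, 1, 1, 1]  # activity at the end of the stream
old_heavy = [1, 1, 1, 0, 0, 0]     # activity at the start
print(decayed_count(recent_heavy))  # higher: recent data dominates
print(decayed_count(old_heavy))
```

The same total activity scores higher when it is recent, which is exactly the behavior wanted for stock prices or live sentiment.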
10. Real-Time Analytics Platform (RTAP) Applications
Real-time analytics platforms are designed to process and analyze data in real-time,
enabling immediate decision-making. They provide tools to handle high-velocity data
streams, with applications such as:
● Real-Time Fraud Detection: Identifying and stopping fraudulent transactions as
they occur.
● Real-Time Marketing: Analyzing customer behavior and offering targeted ads
immediately.
● Traffic Monitoring: Analyzing traffic data from sensors and adjusting traffic lights
in real time.
Popular RTAP Tools:
● Apache Kafka for data ingestion.
● Apache Flink and Apache Spark Streaming for stream processing.
● Elasticsearch for real-time search and analytics.
11. Case Studies
Case Study 1: Real-Time Sentiment Analysis
Real-time sentiment analysis involves processing streaming data (such as social media
posts or customer reviews) and analyzing the sentiment of text. This analysis can help
businesses respond to customer feedback or monitor public opinion.
● Approach: Use Natural Language Processing (NLP) techniques in a real-time
streaming environment. Platforms like Apache Kafka for ingesting tweets,
combined with Apache Flink for processing, and TensorFlow for sentiment
classification.
# Example: Sentiment Analysis using TextBlob
from textblob import TextBlob
# Sample text stream
stream_data = ["I love this product!", "This is terrible.", "It works well."]
sentiments = [TextBlob(text).sentiment.polarity for text in stream_data]
print(sentiments)
Case Study 2: Stock Market Predictions
Stock market prediction involves processing financial data (such as stock prices and
market indicators) in real-time to forecast trends or make trading decisions.
● Approach: Use historical stock price data combined with real-time market data.
Machine learning algorithms like LSTM (Long Short-Term Memory) are
commonly used for time series prediction in stock prices.
Conclusion
Mining data streams is a crucial task for processing large volumes of continuously
generated data. Stream data models, sampling techniques, and real-time analytics
platforms enable organizations to process and analyze data efficiently. The applications,
such as real-time sentiment analysis and stock market predictions, demonstrate the
power of stream mining in real-world scenarios, where decisions need to be made in
real-time based on rapidly arriving data.
Chapter 4: Frequent Itemsets and Clustering
1. Frequent Itemsets and Market Basket Modeling
Frequent itemsets are groups of items that appear together frequently in a
transactional database. For example, in a supermarket, bread and butter might
frequently be purchased together.
1.1 Apriori Algorithm
The Apriori Algorithm is used to find frequent itemsets and association rules. It
relies on the property that any subset of a frequent itemset must also be frequent.
Steps:
1. Generate frequent 1-itemsets.
2. Use these to generate frequent 2-itemsets, and so on.
3. Use a minimum support threshold to filter itemsets.
Python Code: Apriori Algorithm
from itertools import combinations

def apriori(transactions, min_support):
    itemsets = {}  # Dictionary to store itemsets with their support counts
    single_items = set(item for transaction in transactions for item in transaction)
    # Generate initial 1-itemsets with their support counts
    for item in single_items:
        count = sum(1 for transaction in transactions if item in transaction)
        if count >= min_support:
            itemsets[frozenset([item])] = count
    # Iteratively generate k-itemsets
    k = 2
    while True:
        candidate_itemsets = [
            frozenset(comb)
            for comb in combinations(set().union(*itemsets.keys()), k)
        ]
        new_itemsets = {}
        for candidate in candidate_itemsets:
            count = sum(1 for transaction in transactions
                        if candidate.issubset(transaction))
            if count >= min_support:
                new_itemsets[candidate] = count
        if not new_itemsets:
            break
        itemsets.update(new_itemsets)
        k += 1
    return itemsets

# Example Transactions
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"}
]

# Minimum Support
min_support = 2

# Run Apriori
frequent_itemsets = apriori(transactions, min_support)
print("Frequent Itemsets:", frequent_itemsets)
1.2 Market Basket Analysis
Market Basket Analysis applies frequent itemsets to find rules like: If a customer
buys X, they are likely to buy Y.
Example:
Rules are generated from frequent itemsets, e.g.,:
● Rule: {Milk} → {Bread}
● Confidence = Support({Milk, Bread}) / Support({Milk})
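Using the same transactions as the Apriori example above, the rule's confidence can be computed directly (sketch):

```python
# Sketch: computing support and confidence for the rule {Milk} -> {Bread}
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

support_both = support({"milk", "bread"})  # 2 of 4 transactions = 0.5
support_milk = support({"milk"})           # 3 of 4 transactions = 0.75
confidence = support_both / support_milk
print(f"Confidence(milk -> bread) = {confidence:.2f}")
```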
2. Handling Large Datasets
When dealing with large datasets:
1. In-Memory Computation: Use sparse representations to save memory.
2. Limited Pass Algorithms: Algorithms like SON (Savasere, Omiecinski, and
Navathe) divide datasets into manageable chunks.
3. Stream Counting: Use algorithms like Frequent Itemset Mining in Streams
(e.g., Misra-Gries algorithm).
Stream Processing Example:
from collections import defaultdict

def count_frequent_itemsets_stream(stream, min_support):
    item_counts = defaultdict(int)
    for transaction in stream:
        for item in transaction:
            item_counts[item] += 1
    return {item: count for item, count in item_counts.items()
            if count >= min_support}

# Stream of transactions
stream = [
    ["apple", "banana", "milk"],
    ["apple", "milk"],
    ["banana", "milk"],
    ["apple", "banana"]
]

# Minimum support threshold
min_support = 2

# Frequent items in the stream
frequent_items = count_frequent_itemsets_stream(stream, min_support)
print("Frequent Items in Stream:", frequent_items)
3. Clustering Techniques
Clustering is the process of grouping data into clusters where objects within a
cluster are similar.
3.1 K-Means Clustering
K-means is a partition-based clustering method:
1. Initialize k centroids randomly.
2. Assign each point to the nearest centroid.
3. Update centroids based on the mean of the points in each cluster.
4. Repeat until convergence.
Python Code: K-Means
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
# Example data
data = np.array([
    [1, 2], [2, 3], [3, 4],
    [8, 7], [9, 8], [10, 10]
])
# Apply K-Means
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(data)
# Results
print("Cluster Centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
# Plot
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()
3.2 Hierarchical Clustering
Hierarchical clustering builds a tree of clusters (dendrogram).
Python Code:
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import matplotlib.pyplot as plt
# Data
data = np.array([
    [1, 2], [2, 3], [3, 4],
    [8, 7], [9, 8], [10, 10]
])
# Apply Hierarchical Clustering
linked = linkage(data, method='ward')
# Dendrogram
dendrogram(linked, labels=['A', 'B', 'C', 'D', 'E', 'F'])
plt.title("Hierarchical Clustering Dendrogram")
plt.show()
3.3 CLIQUE and ProCLUS
● CLIQUE: Finds dense regions in high-dimensional space by dividing it into
subspaces.
● ProCLUS: A k-medoids-based approach for high-dimensional clustering.
3.4 Clustering Non-Euclidean Data
For non-Euclidean spaces (e.g., graphs):
1. Use similarity measures instead of distance metrics.
2. Apply algorithms like Spectral Clustering.
Spectral Clustering Example:
from sklearn.cluster import SpectralClustering
import numpy as np
# Example adjacency matrix for a graph
adj_matrix = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1]
])
# Spectral Clustering
spectral = SpectralClustering(n_clusters=2, affinity='precomputed')
labels = spectral.fit_predict(adj_matrix)
print("Cluster Labels:", labels)
3.5 Clustering for Streams
Stream clustering uses incremental approaches like CluStream or StreamKM++
for dynamic data.
Stream Clustering:
# Simulating stream clustering
from [Link] import MiniBatchKMeans
# Simulated streaming data
stream_data = [Link](1000, 2)
# MiniBatch K-Means for stream data
mb_kmeans = MiniBatchKMeans(n_clusters=3, batch_size=100)
mb_kmeans.fit(stream_data)
print("Cluster Centers:", mb_kmeans.cluster_centers_)
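Calling `fit` above processes a complete array in mini-batches; for a genuinely open-ended stream, the model can instead be updated chunk by chunk with `partial_fit`. A small sketch (the chunk size and the simulated random stream are illustrative):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
mb_kmeans = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=1)

# Update the model one chunk of the stream at a time
for _ in range(10):
    chunk = rng.random((100, 2))  # next 100 points arriving from the stream
    mb_kmeans.partial_fit(chunk)

print("Cluster Centers:", mb_kmeans.cluster_centers_)
```

Each `partial_fit` call shifts the existing centers toward the new chunk, so the model adapts as the stream evolves without storing past data.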
4. Parallelism in Clustering
Techniques like MapReduce and distributed frameworks (e.g., Apache Spark)
allow clustering for massive datasets.
Example Using Spark (Pseudocode):
from pyspark.ml.clustering import KMeans
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Clustering").getOrCreate()
# Load Data (illustrative path; K-Means expects a "features" vector column)
data = spark.read.csv("data.csv", header=True, inferSchema=True)
# K-Means in Spark
kmeans = KMeans(k=3)
model = kmeans.fit(data)
clusters = model.transform(data)
spark.stop()
Chapter 5:
1. Display Space Limitation
When visualizing large datasets, display space can limit what can be shown effectively.
If too much data is visualized at once, it can clutter the interface, making it hard to
interpret.
Example: Scatter Plot with Too Many Points
import matplotlib.pyplot as plt
import numpy as np
# Simulate a large dataset
x = np.random.rand(100000)
y = np.random.rand(100000)
# Plot without handling display space limitation
plt.figure(figsize=(10, 6))
plt.scatter(x, y, alpha=0.2)  # Scatter plot with transparency
plt.title("Scatter Plot with Display Space Limitation")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.show()
Solution: Sampling or Aggregation
You can downsample the data to reduce clutter:
# Downsample the dataset to 5000 points
sample_indices = np.random.choice(len(x), 5000, replace=False)
x_sample = x[sample_indices]
y_sample = y[sample_indices]
# Re-plot after sampling
plt.figure(figsize=(10, 6))
plt.scatter(x_sample, y_sample, alpha=0.5)
plt.title("Scatter Plot After Sampling")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.show()
2. Rendering Time Limitation
Rendering can be slow for large datasets, particularly in interactive visualizations or 3D
plots.
Example: Heatmap for Large Data
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Simulate a large dataset
data = np.random.rand(1000, 1000)
# Heatmap visualization (might take time to render for large data)
plt.figure(figsize=(12, 8))
sns.heatmap(data, cmap="viridis")
plt.title("Heatmap with Large Data")
plt.show()
Solution: Tiling or Downsampling
Reduce the resolution of the data:
# Downsample the heatmap data
data_downsampled = data[::10, ::10] # Take every 10th row and column
# Plot downsampled heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(data_downsampled, cmap="viridis")
plt.title("Downsampled Heatmap for Faster Rendering")
plt.show()
3. Navigation Links
When visualizing complex data, navigation links (e.g., drill-down, zooming, or tooltips)
improve user interaction.
Example: Interactive Plot with Plotly
import plotly.express as px
import pandas as pd
# Create sample data
df = pd.DataFrame({
    "Category": ["A", "B", "C", "D", "E"],
    "Values": [100, 200, 300, 400, 500]
})
# Create an interactive bar chart
fig = px.bar(df, x="Category", y="Values", title="Interactive Bar Chart with Navigation")
fig.update_layout(hovermode="x unified")
fig.show()
In this example, users can hover over the bar chart to see values interactively.
4. Human Vision: Space and Time Limitations
The human brain has constraints in processing too much data at once, especially in
large or complex visualizations.
Design Guidelines to Address Space and Time
1. Simplify Visualizations: Remove unnecessary elements to focus attention.
2. Use Gestalt Principles: Group related objects for easier perception.
3. Leverage Color and Size: Use perceptually uniform color gradients and relative
sizes.
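As a small illustration of guideline 3, the sketch below (with synthetic data) encodes a third variable redundantly, through both a perceptually uniform colormap ('viridis') and relative marker size:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x, y = rng.random(200), rng.random(200)
value = x + y  # third variable, encoded twice for redundancy

plt.figure(figsize=(8, 5))
# 'viridis' is perceptually uniform; marker size grows with the value
sc = plt.scatter(x, y, c=value, s=20 + 80 * value, cmap='viridis')
plt.colorbar(sc, label="Value")
plt.title("Perceptually Uniform Color and Relative Size")
plt.show()
```

Because 'viridis' changes lightness monotonically, equal steps in the data read as equal steps in color, which rainbow maps like 'jet' do not guarantee.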
5. Exploration of Complex Information Space
Complex datasets may require multidimensional visualizations or hierarchical
exploration techniques.
Example: Parallel Coordinates for Multidimensional Data
from sklearn.datasets import load_iris
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt
import pandas as pd
# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target_names[iris.target]
# Parallel Coordinates Plot
plt.figure(figsize=(12, 6))
parallel_coordinates(df, "target", colormap=plt.cm.Set2)
plt.title("Parallel Coordinates for Multidimensional Data")
plt.show()
6. Space Perception and Data in Space
Humans interpret data better in spatial contexts, such as maps or spatial scatter plots.
Example: Geographic Data Visualization
import geopandas as gpd
import matplotlib.pyplot as plt
# Load sample geographic dataset
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Plot world map with data
world.plot(column='pop_est', cmap='coolwarm', legend=True, figsize=(12, 8))
plt.title("World Population (Estimate) by Country")
plt.show()
7. Images, Narrative, and Gestures for Explanation
Combining images and narratives enhances comprehension by linking visuals to
storytelling.
Example: Adding Captions and Annotations
# Create a simple plot
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, label="Sine Wave")
plt.title("Sine Wave with Annotation")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
# Adding annotations
plt.annotate("Peak", xy=(1.57, 1), xytext=(2, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
plt.legend()
plt.show()
In this example, annotations and titles provide a narrative context for the visualization.
Summary of Techniques
● Display Space Limitation: Sampling, Aggregation (e.g., Scatter Plot, Heatmap)
● Rendering Time: Downsampling, Efficient Algorithms (e.g., Heatmap)
● Navigation and Interaction: Interactive libraries such as Plotly (e.g., Bar Chart)
● Human Vision Constraints: Simplification, Gestalt Principles (e.g., Multidimensional Data)
● Complex Data Exploration: Parallel Coordinates, Maps (e.g., Iris Dataset, World Map)