Faculty of: FCE   Program: [Link]   Class/Section: Sem V, Sec. A, B, C (AIDS)   Date:
Name of Faculty: Seema Kaloria   Name of Course: R Programming   Code: BADCCE5104
1. Flat Table Objects in R
A flat table in R is most commonly represented as a data frame. Each row holds one observation (record) and each column holds one variable, which makes the data frame R's standard structure for analyzing and manipulating tabular data.
Example of Flat Table Object:
# Create a flat table (data frame)
flat_table <- data.frame(
id = 1:5,
name = c("Alice", "Bob", "Charlie", "David", "Eva"),
age = c(25, 30, 35, 40, 45),
gender = c("F", "M", "M", "M", "F")
)
print(flat_table)
Output:
id name age gender
1 1 Alice 25 F
2 2 Bob 30 M
3 3 Charlie 35 M
4 4 David 40 M
5 5 Eva 45 F
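A few everyday operations on this data frame are sketched below using only base R: str() for the structure, $ for a single column, logical row indexing, subset() and tapply(). The block rebuilds flat_table so it runs on its own; the particular filters (age above 30, gender "F") are arbitrary illustrations.

```r
# Rebuild the example data frame so this snippet is self-contained
flat_table <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  age = c(25, 30, 35, 40, 45),
  gender = c("F", "M", "M", "M", "F")
)

str(flat_table)                          # column names and types
print(flat_table$name)                   # one column as a vector
older <- flat_table[flat_table$age > 30, ]   # rows where age is above 30
print(older)
print(subset(flat_table, gender == "F", select = c(name, age)))  # filter rows, pick columns

# Mean age for each gender
mean_age <- tapply(flat_table$age, flat_table$gender, mean)
print(mean_age)
```

Both groups happen to have a mean age of 35 in this toy data.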
2. Cross Tables (Contingency Tables)
A cross table (also known as a contingency table) is used to summarize categorical data and show the
relationship between two or more categorical variables. In R, the table() function is often used to create these
tables.
Example: Cross Tabulation of Gender and Age Group
# Adding age groups to the flat table.
# cut() builds right-closed intervals (20,30], (30,40], (40,50], so age 30 falls in "20-30"
flat_table$age_group <- cut(flat_table$age, breaks = c(20, 30, 40, 50),
                            labels = c("20-30", "30-40", "40-50"))
# Cross table (contingency table) for gender and age group
cross_table <- table(flat_table$gender, flat_table$age_group)
print(cross_table)
Session 2024-25
Output:
20-30 30-40 40-50
F 1 0 1
M 1 2 0
This cross table shows how many males and females fall into different age groups.
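Row and column totals, and proportions instead of raw counts, are often needed alongside the cross table. A minimal sketch with base R's addmargins() and prop.table(), rebuilding the same table so the snippet runs on its own:

```r
flat_table <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  age = c(25, 30, 35, 40, 45),
  gender = c("F", "M", "M", "M", "F")
)
flat_table$age_group <- cut(flat_table$age, breaks = c(20, 30, 40, 50),
                            labels = c("20-30", "30-40", "40-50"))
cross_table <- table(flat_table$gender, flat_table$age_group)

# Append a "Sum" row and a "Sum" column with the marginal totals
margins <- addmargins(cross_table)
print(margins)

# Share of each gender within each age group (every column sums to 1)
col_props <- prop.table(cross_table, margin = 2)
print(round(col_props, 2))
```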
3. Testing Cross-Tabulation
You can test whether there is a statistically significant association between two categorical variables in a cross table using Pearson's chi-squared test of independence.
Example: Chi-squared Test on Cross Table
# Perform Chi-squared test on the cross table
chisq_test <- chisq.test(cross_table)
print(chisq_test)
Output:
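Pearson's Chi-squared test
data:  cross_table
X-squared = 2.9167, df = 2, p-value = 0.2326
Warning message:
In chisq.test(cross_table) : Chi-squared approximation may be incorrect
The output reports the test statistic (X-squared), the degrees of freedom (df) and the p-value. Here p = 0.23 is above the usual 0.05 level, so the data give no evidence that gender and age group are associated. With only 5 observations every expected count is below 5, which is why R warns that the chi-squared approximation may be unreliable; for such small tables Fisher's exact test (fisher.test()) is the usual alternative.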
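To see where X-squared comes from, the statistic can be computed by hand: each expected count is row total × column total / grand total, and X-squared is the sum of (observed − expected)² / expected over all cells. The sketch below hard-codes the counts from the cross table above (rows F, M) and checks the result against chisq.test().

```r
# Observed counts from the gender x age-group cross table (filled column by column)
observed <- matrix(c(1, 1, 0, 2, 1, 0), nrow = 2,
                   dimnames = list(c("F", "M"), c("20-30", "30-40", "40-50")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)
print(c(x_squared = x_squared, df = df, p_value = p_value))

# Built-in test for comparison (its small-sample warning is suppressed here)
builtin <- suppressWarnings(chisq.test(observed))
print(builtin)
```

Every expected count here is below 5, which is exactly the condition that triggers the warning shown above.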
4. Recreating Original Data from a Contingency Table
To recreate the original data from a contingency table, you need to expand it back into its raw format (one row per observation), typically by combining rep() with expand.grid().
Example:
# Example contingency table (for simplicity)
contingency_table <- matrix(c(2, 3, 1, 4), nrow = 2,
                            dimnames = list(Gender = c("F", "M"),
                                            AgeGroup = c("20-30", "30-40")))
# One row per cell; expand.grid() varies Gender fastest, matching the
# column-major order in which c() reads the matrix
recreated_data <- expand.grid(Gender = c("F", "M"), AgeGroup = c("20-30", "30-40"))
recreated_data$Count <- c(contingency_table)
# Repeat each cell's row Count times, then drop the Count column (column 3)
recreated_data <- recreated_data[rep(seq_len(nrow(recreated_data)), recreated_data$Count), -3]
rownames(recreated_data) <- NULL  # renumber rows 1..n
print(recreated_data)
Output:
   Gender AgeGroup
1       F    20-30
2       F    20-30
3       M    20-30
4       M    20-30
5       M    20-30
6       F    30-40
7       M    30-40
8       M    30-40
9       M    30-40
10      M    30-40
This shows the expanded dataset where each row represents an observation from the contingency table.
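An alternative route, sketched here as one option rather than the method above, converts the matrix with as.table() and as.data.frame(), which already yields one row per cell plus a Freq column. Tabulating the expanded data again with table() is a quick check that the original counts are recovered.

```r
contingency_table <- matrix(c(2, 3, 1, 4), nrow = 2,
                            dimnames = list(Gender = c("F", "M"),
                                            AgeGroup = c("20-30", "30-40")))

# One row per cell: columns Gender, AgeGroup and Freq
cells <- as.data.frame(as.table(contingency_table))

# Repeat each cell's row Freq times and keep only the categorical columns
long <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Gender", "AgeGroup")]
rownames(long) <- NULL
print(long)

# Round trip: counting the expanded rows reproduces the original table
rebuilt <- table(long$Gender, long$AgeGroup)
print(rebuilt)
```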
5. More Advanced Testing and Operations
For more advanced operations, you can use the dplyr package for group summaries and reshape2 or tidyr for
reshaping data. For example:
Using dplyr for summarization:
library(dplyr)
flat_table %>% group_by(gender, age_group) %>% summarise(count = n())
Output:
# A tibble: 4 × 3
# Groups:   gender [2]
  gender age_group count
  <chr>  <fct>     <int>
1 F      20-30         1
2 F      40-50         1
3 M      20-30         1
4 M      30-40         2
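If dplyr is not available, roughly the same per-group counts can be obtained in base R with aggregate(). This is a sketch: like the dplyr result it lists only the gender/age-group combinations that actually occur, though the row order may differ.

```r
flat_table <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  age = c(25, 30, 35, 40, 45),
  gender = c("F", "M", "M", "M", "F")
)
flat_table$age_group <- cut(flat_table$age, breaks = c(20, 30, 40, 50),
                            labels = c("20-30", "30-40", "40-50"))

# Count rows per (gender, age_group) by applying length() to the id column
counts <- aggregate(id ~ gender + age_group, data = flat_table, FUN = length)
names(counts)[names(counts) == "id"] <- "count"
print(counts)
```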
Reshaping Data with tidyr:
library(tidyr)
# One column per age group holding that person's age (NA elsewhere)
pivot_table <- pivot_wider(flat_table, names_from = age_group, values_from = age)
print(pivot_table)
Output:
# A tibble: 5 × 6
     id name    gender `20-30` `30-40` `40-50`
  <int> <chr>   <chr>    <dbl>   <dbl>   <dbl>
1     1 Alice   F           25      NA      NA
2     2 Bob     M           30      NA      NA
3     3 Charlie M           NA      35      NA
4     4 David   M           NA      40      NA
5     5 Eva     F           NA      NA      45
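Base R's reshape() can produce a similar wide layout without installing tidyr. In this sketch the new columns come out named age.20-30, age.30-40 and age.40-50, because reshape() pastes v.names and each age-group value together with a dot.

```r
flat_table <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  age = c(25, 30, 35, 40, 45),
  gender = c("F", "M", "M", "M", "F")
)
flat_table$age_group <- cut(flat_table$age, breaks = c(20, 30, 40, 50),
                            labels = c("20-30", "30-40", "40-50"))

# direction = "wide": one row per id, one age column per age group
wide <- reshape(flat_table,
                idvar = "id",
                timevar = "age_group",
                v.names = "age",
                direction = "wide")
print(wide)
```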