
Ds Exp 7

The document details a coding assignment for a B.Tech student named Rahul A from PSG Institute of Technology, focusing on performing statistical analysis using Z-tests on medication effectiveness and teaching methods. It includes problem statements, coding solutions, and data in CSV format for analysis. The student successfully completed the assignment, achieving full marks.

Uploaded by

23d101
PSG Institute of Technology and Applied Research
Name: Rahul A
Email: 23d101@[Link]
Roll no: 715523243039
Phone: 9677304434
Branch: PSG iTech
Department: AI&DS
Batch: 2027
Degree: B.Tech AI&DS
2027_IV_AI&DS_Data Science Analytics Lab
NeoColab_PSGiTech_2027_Data Science Analytics_Week 7_COD

Attempt : 1
Total Mark : 40
Marks Obtained : 40

Section 1 : Coding

1. Problem Statement
Dr. Sophia, a medical researcher, is conducting a study to compare the
effectiveness of two medications, A and B, in reducing patient symptoms.
Given a CSV file containing the medication type and the corresponding
reduction in symptoms, write a program to perform a two-sample Z-test at
a 1% significance level. The program should compute the mean reduction
for each medication, Z-score, and critical value, then determine whether
there is a statistically significant difference in the effectiveness of the two
medications.

Answer
[Link]
import os
import sys

import numpy as np
import pandas as pd
from scipy.stats import norm

# Read the CSV filename from standard input
file_name = input()

# Get the directory of the script
current_directory = sys.path[0]

# Construct the full path to the CSV file
file_path = os.path.join(current_directory, file_name)

# Read CSV file into a DataFrame
df = pd.read_csv(file_path)

# Separate Medication A and B
reduction_a = df[df['Medication'] == 'A']['Reduction']
reduction_b = df[df['Medication'] == 'B']['Reduction']

# Mean reduction for each medication
a = reduction_a.mean()
b = reduction_b.mean()

# Calculate the Z-score: difference of means over its standard error
# (std(ddof=0) treats each group's spread as a population standard deviation)
z_sc_n = a - b
z_sc_d = np.sqrt((reduction_a.std(ddof=0)**2 / len(reduction_a))
                 + (reduction_b.std(ddof=0)**2 / len(reduction_b)))
z = z_sc_n / z_sc_d

# Two-tailed critical value at the 1% significance level
alpha = 0.01
c = norm.ppf(1 - alpha / 2)

# Print results
print(f"Mean Reduction A: {a}")
print(f"Mean Reduction B:{b}")
print(f"Z-score: {z}")
print(f"Critical Value: ±{c}")

# Compare Z-score with critical value
if abs(z) > c:
    print("Conclusion: There is a significant difference between the effectiveness of the two medications.")
else:
    print("Conclusion: There is no significant difference between the effectiveness of the two medications.")

[Link]
Medication,Reduction
A,15.2
A,14.8
A,16.0
A,15.1
A,14.5
A,15.7
A,14.3
A,15.4
A,16.3
A,15.9
A,15.0
A,14.7
A,14.9
A,16.1
A,15.8
A,15.3
A,15.5
A,14.6
A,14.4
A,15.6
B,12.5
B,13.2
B,12.7
B,13.5
B,12.9
B,13.8
B,12.6
B,14.0
B,13.1
B,12.8
B,13.2
B,12.5
B,13.6
B,12.8
B,13.3
B,12.4
B,13.5
B,13.1
B,12.9
B,13.0
B,12.7
B,13.4
B,13.2
B,12.6
B,13.7
B,12.9
B,13.5
B,12.8
B,13.1
B,12.7
B,13.3
B,13.0
B,12.9
B,12.6
B,13.8
B,12.5
B,13.6
B,12.4
B,13.4
B,12.9
B,13.2
B,12.8

Status : Correct Marks : 10/10
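
The calculation in the answer above can also be written as a small reusable function. The following is a minimal sketch, not the graded submission: the helper name `two_sample_z` is hypothetical, it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`, and the two short samples are simply the first five rows of each group from the CSV above. It also returns a two-tailed p-value, which the problem did not ask for.

```python
from math import sqrt
from statistics import NormalDist, fmean, pvariance


def two_sample_z(sample_a, sample_b, alpha=0.01):
    """Two-sample Z-test on raw samples (hypothetical helper for illustration)."""
    mean_a, mean_b = fmean(sample_a), fmean(sample_b)
    # pvariance divides by n, matching std(ddof=0) ** 2 in the answer above
    standard_error = sqrt(pvariance(sample_a) / len(sample_a)
                          + pvariance(sample_b) / len(sample_b))
    z = (mean_a - mean_b) / standard_error
    # Two-tailed critical value and p-value from the standard normal distribution
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# First five rows of each medication from the CSV above
group_a = [15.2, 14.8, 16.0, 15.1, 14.5]
group_b = [12.5, 13.2, 12.7, 13.5, 12.9]

z, critical, p_value, significant = two_sample_z(group_a, group_b)
print(f"Z-score: {z}")
print(f"Critical Value: ±{critical}")
print(f"P-value: {p_value}")
print(f"Significant at 1%: {significant}")
```

Returning the decision alongside the statistics keeps the printing separate from the arithmetic, which makes the same function usable at other significance levels.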

2. Problem Statement

Emma, a university researcher, wants to analyze the effectiveness of online
and offline teaching methods by comparing their mean scores. Given a
CSV file containing sample size, mean, and standard deviation for both
teaching methods, write a program to perform a two-sample Z-test
assuming equal variance. The program should compute the Z-score,
critical Z-score, and P-value, then determine whether there is a statistically
significant difference between the two teaching methods.

Answer
[Link]
import os
import sys

import pandas as pd
from scipy.stats import norm

# Read the CSV filename from standard input
file_name = input()

# Get the directory of the script
current_directory = sys.path[0]

# Construct the full path to the CSV file
file_path = os.path.join(current_directory, file_name)

# Read CSV file into a DataFrame
df = pd.read_csv(file_path)

# Assuming the first row is Offline and the second row is Online
# Each row holds: Sample_Size, Mean, Std_Dev
f = df.iloc[0, :].values
l = df.iloc[1, :].values

# Null Hypothesis = mu_1 - mu_2 = 0
# Calculate the test statistic (z-score)
z_n = l[1] - f[1]
z_d = ((f[2]**2 / f[0]) + (l[2]**2 / l[0]))**0.5
z = z_n / z_d

# Two-tailed P-value
p = 2 * (1 - norm.cdf(abs(z)))

# Critical value for a two-tailed test at the 5% significance level
cz = 1.959963984540054

# Print Z-score and critical Z-score
print(f"Z-Score: {z}")
print(f"Critical Z-Score: {cz}")

if abs(z) > cz:
    # Approach 1: Compare the test statistic with the critical value
    print("Reject the null hypothesis.\nThere is a significant difference between the online and offline classes.")
    # Approach 2: Using P-value
    print(f"P-Value: {p}")
    print("Reject the null hypothesis.\nThere is a significant difference between the online and offline classes.")
else:
    # Approach 1: Compare the test statistic with the critical value
    print("Fail to reject the null hypothesis.\nThere is no significant difference between the online and offline classes.")
    # Approach 2: Using P-value
    print(f"P-Value: {p}")
    print("Fail to reject the null hypothesis.\nThere is no significant difference between the online and offline classes.")

[Link]
Sample_Size,Mean,Std_Dev
50,75,10
60,80,12
[Link]
Sample_Size,Mean,Std_Dev
10,65,8
20,70,9
[Link]
Sample_Size,Mean,Std_Dev
20,85,11
30,88,13
[Link]
Sample_Size,Mean,Std_Dev
30,95,15
40,100,17
[Link]
Sample_Size,Mean,Std_Dev
40,55,7
50,60,8
[Link]
Sample_Size,Mean,Std_Dev
60,78,10
70,82,11
[Link]
Sample_Size,Mean,Std_Dev
80,90,12
90,95,13
[Link]
Sample_Size,Mean,Std_Dev
100,85,14
110,88,15

Status : Correct Marks : 10/10
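
The answer above reports its conclusion twice, once from the critical value and once from the P-value. For a two-tailed test these two rules always agree, because |Z| exceeds the critical value exactly when the P-value falls below alpha. The sketch below illustrates this on the first data file listed above (50,75,10 and 60,80,12). It is an illustration only: the function name `z_test_from_summary` is hypothetical, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_summary(n1, mean1, sd1, n2, mean2, sd2, alpha=0.05):
    """Two-sample Z-test from summary statistics (hypothetical helper)."""
    # Standard error of the difference in means, as computed in the answer above
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean2 - mean1) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value


ALPHA = 0.05
# Offline row, then Online row, from the first data file above
z, critical, p_value = z_test_from_summary(50, 75, 10, 60, 80, 12, alpha=ALPHA)

reject_by_critical = abs(z) > critical   # Approach 1
reject_by_p = p_value < ALPHA            # Approach 2

print(f"Z-Score: {z}")
print(f"Critical Z-Score: {critical}")
print(f"P-Value: {p_value}")
print(f"Approaches agree: {reject_by_critical == reject_by_p}")
```

Note that this sketch, like the answer, uses the unpooled standard error even though the problem statement mentions equal variance.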

3. Problem Statement

Olivia, a quality control analyst at a tech company, is testing whether the
company’s claim that their device battery lasts an average of 12 hours is
statistically valid. Given a CSV file containing battery life measurements
from sample devices and a known population standard deviation of 0.5
hours, write a program to conduct a one-sample Z-test. The program
should compute the sample mean, Z-score, and critical Z-value for a two-
tailed test at a 5% significance level, then determine whether there is
sufficient evidence to refute the company’s claim.

Answer
[Link]
import os
import sys

import pandas as pd

# Read the CSV filename from standard input
file_name = input()

# Get the directory of the script
current_directory = sys.path[0]

# Construct the full path to the CSV file
file_path = os.path.join(current_directory, file_name)

# Read CSV file into a DataFrame
df = pd.read_csv(file_path)

# Given company claim and known population standard deviation
claimed_mean = 12
sigma = 0.5

# Compute sample statistics from the dataset
btl = df['Battery Life'].mean()
print(f'Sample Mean: {btl}')
n = len(df['Battery Life'])
print(f'Sample Size: {n}')

# Calculate the Z-score
z = (btl - claimed_mean) / (sigma / (n**0.5))
print(f"Z-score: {z}")

# Critical value for a two-tailed test at the 5% significance level
c = 1.959963984540054
print(f"Critical Value: ±{c}")

# Compare Z-score with critical value to determine significance
if abs(z) > c:
    print("Conclusion: There is sufficient evidence to refute the company's claim about battery life.")
else:
    print("Conclusion: There is no sufficient evidence to refute the company's claim about battery life.")

[Link]
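
The two-tailed decision in the answer above has an equivalent confidence-interval reading: the claim is refuted exactly when the claimed mean lies outside the interval sample mean ± critical value × σ/√n. The following is a minimal sketch of that equivalence. The battery-life measurements are hypothetical values invented for illustration (no data file contents are shown in this document), and `statistics.NormalDist` from the standard library is used instead of the hard-coded critical value.

```python
from math import sqrt
from statistics import NormalDist, fmean

CLAIMED_MEAN = 12.0  # company's claim, from the problem statement
SIGMA = 0.5          # known population standard deviation, from the problem statement
ALPHA = 0.05

# Hypothetical measurements, invented for illustration only
battery_life = [11.8, 12.1, 11.9, 12.3, 11.7, 12.0, 11.6, 12.2]

sample_mean = fmean(battery_life)
standard_error = SIGMA / sqrt(len(battery_life))
z = (sample_mean - CLAIMED_MEAN) / standard_error
critical = NormalDist().inv_cdf(1 - ALPHA / 2)

# 95% confidence interval for the true mean battery life
ci_low = sample_mean - critical * standard_error
ci_high = sample_mean + critical * standard_error

refuted_by_z = abs(z) > critical
refuted_by_ci = not (ci_low <= CLAIMED_MEAN <= ci_high)

print(f"Sample Mean: {sample_mean}")
print(f"Z-score: {z}")
print(f"95% CI: ({ci_low}, {ci_high})")
print(f"Claim refuted: {refuted_by_z} (Z-test), {refuted_by_ci} (interval)")
```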
4. Problem Statement
Michael, a statistics professor, wants to determine whether his class's
average test score significantly differs from the department's historical
average of 75. Given a CSV file containing student IDs and their scores,
write a program to compute the sample mean, standard deviation, Z-score,
and P-value. Based on the significance level of 0.05, determine whether
there is a statistically significant difference between the class average and
the department’s historical average.

Answer
[Link]
import os
import sys

import pandas as pd
from scipy.stats import norm

# Read the CSV filename from standard input
file_name = input()

# Get the directory of the script
current_directory = sys.path[0]

# Construct the full path to the CSV file
file_path = os.path.join(current_directory, file_name)

# Read CSV file into a DataFrame
df = pd.read_csv(file_path)
feat = df['Score']

# Given historical department mean
hist_mean = 75

# Sample statistics (pandas std() uses ddof=1 by default)
m = feat.mean()
sstd = feat.std()

# Perform Z-test (the population std is unknown, so the sample std is used in its place)
z = (m - hist_mean) / (sstd / (len(feat)**0.5))

# Two-tailed P-value from the upper-tail probability
p = 2 * norm.sf(abs(z))

# Significance Level (alpha) and two-tailed critical value
alpha = 0.05
c = norm.ppf(1 - alpha / 2)

# Print results
print(f"Sample Mean: {m}")
print(f"Sample Standard Deviation: {sstd}")
print(f"Z-score: {z}")
print(f"P-value: {p}")

if abs(z) > c:
    print("Conclusion: There is a significant difference between the class average and the department's historical average.")
else:
    print("Conclusion: There is no significant difference between the class average and the department's historical average.")

[Link]
Student_ID,Score
1,80
2,70
3,75
4,78
5,74
[Link]
Student_ID,Score
19,81
20,79
21,74
22,75
23,76
24,77
25,78
26,73
27,72
28,80
29,81
30,79
31,74
32,75
33,76
34,77
35,78
36,73
99,81
98,80
97,72
96,73
95,78
94,77
93,76
92,75
91,74
90,79
89,81
88,80
87,72
86,73
85,78
84,77
83,76
82,75
81,74
80,79
79,81
78,80
77,72
76,73
75,78
74,77
73,76
100,79

Status : Correct Marks : 10/10
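
As a worked check of the one-sample calculation in problem 4, the sketch below runs the same arithmetic on the five scores of the first data file shown above (80, 70, 75, 78, 74). It is an illustration under stated assumptions rather than the graded code: it uses only the standard library, where `statistics.stdev` divides by n - 1 just as the pandas `std()` default does, and `statistics.NormalDist` replaces `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

HISTORICAL_MEAN = 75  # department's historical average, from the problem statement
ALPHA = 0.05

# Scores from the first data file shown above
scores = [80, 70, 75, 78, 74]

sample_mean = fmean(scores)
sample_std = stdev(scores)  # n - 1 in the denominator
z = (sample_mean - HISTORICAL_MEAN) / (sample_std / sqrt(len(scores)))

# Two-tailed P-value: twice the upper-tail probability beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
significant = p_value < ALPHA

print(f"Sample Mean: {sample_mean}")
print(f"Sample Standard Deviation: {sample_std}")
print(f"Z-score: {z}")
print(f"P-value: {p_value}")
print(f"Significant at 5%: {significant}")
```

With only five observations and an estimated standard deviation, the normal reference distribution is a rough approximation. A t-test, which the comment in the original answer alludes to, would ordinarily be preferred for a sample this small.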