
Neural Network Functions and Techniques

This document is an assignment consisting of 10 questions related to large language models and machine learning concepts. Each question includes a statement or query with multiple-choice answers, along with the correct answer and a brief explanation. Topics covered include the Perceptron learning algorithm, backpropagation, activation functions, regularization techniques, and the purpose of hidden layers in multi-layer perceptrons.

Introduction to Large Language Models

Assignment- 3

Number of questions: 10 | Total marks: 10 × 1 = 10


_________________________________________________________________________

QUESTION 1:

State whether the following statement is True/False.


The Perceptron learning algorithm can solve problems with non-linearly separable data.
a. True
b. False
Correct Answer: b
Solution: The Perceptron algorithm can only handle linearly separable problems.
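This can be demonstrated with a short Python sketch (illustrative, not part of the assignment): the classic perceptron update rule, w += lr·(y − ŷ)·x, converges on linearly separable data such as AND, but cycles forever on XOR, which is not linearly separable.

```python
# Sketch: the perceptron update rule converges on AND (separable)
# but never converges on XOR (non-separable).
def train_perceptron(data, epochs=25):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            yhat = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            if yhat != y:
                w[0] += (y - yhat) * x[0]
                w[1] += (y - yhat) * x[1]
                b += (y - yhat)
                errors += 1
        if errors == 0:      # converged: a separating line was found
            return True
    return False             # never converged within the epoch budget

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND), train_perceptron(XOR))  # True False
```

The perceptron convergence theorem guarantees the first result; no epoch budget, however large, would make the second one succeed.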

_________________________________________________________________________

QUESTION 2:

In backpropagation, which method is used to compute the gradients?


a. Gradient descent
b. Chain rule of derivatives
c. Matrix factorization
d. Linear regression

Correct Answer: b
Solution: Backpropagation uses the chain rule of derivatives to calculate the gradients layer
by layer.
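A minimal Python sketch of the chain rule in action (hypothetical values, not part of the assignment): for the loss L = (σ(wx) − y)², the chain rule gives dL/dw = 2(a − y)·a(1 − a)·x with a = σ(wx), which we can verify against a numerical derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    return (sigmoid(w * x) - y) ** 2

w, x, y = 0.5, 2.0, 1.0
a = sigmoid(w * x)
analytic = 2 * (a - y) * a * (1 - a) * x      # chain rule, factor by factor
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)          # the two gradients agree
```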

_________________________________________________________________________

QUESTION 3:

Which activation function outputs values in the range [−1,1]?

a. ReLU
b. Tanh
c. Sigmoid
d. Linear

Correct Answer: b
Solution: The tanh function maps input values to the range [−1,1].
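A quick sanity check in Python (illustrative only): tanh stays inside [−1, 1] for any input, and it is a shifted, rescaled sigmoid, tanh(x) = 2σ(2x) − 1.

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
for x in (-20.0, -1.0, 0.0, 1.0, 20.0):
    t = math.tanh(x)
    assert -1.0 <= t <= 1.0                          # range [-1, 1]
    assert abs(t - (2 * sigmoid(2 * x) - 1)) < 1e-12  # scaled sigmoid
print(math.tanh(-20.0), math.tanh(0.0), math.tanh(20.0))  # ≈ -1.0, 0.0, ≈ 1.0
```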

_________________________________________________________________________
QUESTION 4:

What is the primary goal of regularization in machine learning?


a. To improve the computational efficiency of the model
b. To reduce overfitting
c. To increase the number of layers in a network
d. To minimize the loss function directly

Correct Answer: b
Solution: Regularization adds a penalty on model complexity, which discourages fitting noise in the training data and thus reduces overfitting.
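A minimal Python sketch of the idea (hypothetical values, using L2 as a concrete example): the total loss is the data loss plus a complexity penalty, so of two models that fit the data equally well, the one with larger weights incurs a higher total loss.

```python
# Sketch: total loss = data loss (MSE) + L2 penalty lam * sum(w_i^2).
def regularized_loss(preds, targets, weights, lam=0.1):
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty

preds, targets = [1.1, 1.9], [1.0, 2.0]
small_w, big_w = [0.1, 0.2], [3.0, 4.0]
# Same fit to the data, but larger weights are penalized more heavily,
# nudging optimization toward simpler models that generalize better.
print(regularized_loss(preds, targets, small_w)
      < regularized_loss(preds, targets, big_w))   # True
```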

_________________________________________________________________________

QUESTION 5:

Which of the following is a regularization technique where we randomly deactivate
neurons during training?
a. Early stopping
b. L1 regularization
c. Dropout
d. Weight decay

Correct Answer: c
Solution: Dropout randomly deactivates (zeroes out) a fraction of neurons during training, which prevents co-adaptation of neurons and reduces overfitting.
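A minimal sketch of (inverted) dropout in Python, assuming the common formulation where each activation is zeroed with probability p and survivors are scaled by 1/(1 − p) so the expected activation is unchanged; at inference the layer is the identity.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    if not training:
        return list(activations)       # identity at inference time
    # zero each unit with prob. p; scale survivors to preserve the mean
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

random.seed(0)
acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, p=0.5))            # some entries zeroed, rest doubled
print(dropout(acts, training=False))   # unchanged at inference time
```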

_________________________________________________________________________

QUESTION 6:

Which activation function has the vanishing gradient problem for large positive or negative
inputs?
a. ReLU
b. Sigmoid
c. GELU
d. Swish
Correct Answer: b
Solution: The sigmoid function saturates at extreme input values (large positive or negative
inputs).
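The saturation is easy to see numerically (illustrative sketch): the sigmoid's derivative is s·(1 − s), which peaks at 0.25 at x = 0 and collapses toward zero for large |x|, which is the mechanism behind the vanishing gradient problem.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)     # derivative of the sigmoid

print(sigmoid_grad(0.0))     # 0.25, the maximum
print(sigmoid_grad(10.0))    # ~4.5e-5: almost no gradient flows back
print(sigmoid_grad(-10.0))   # same by symmetry
```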
_________________________________________________________________________

QUESTION 7:

Which activation function is defined as: f(x)=x⋅σ(x), where σ(x) is the sigmoid function?

a. Swish
b. ReLU
c. GELU
d. SwiGLU

Correct Answer: a
Solution: Swish is defined as f(x) = x⋅σ(x), i.e., the input scaled by its own sigmoid.
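A one-function Python sketch of Swish as defined in the question (illustrative values): unlike ReLU, it is smooth and lets small negative values through instead of clamping them to zero.

```python
import math

def swish(x):
    # f(x) = x * sigmoid(x), written in one step
    return x / (1.0 + math.exp(-x))

print(swish(0.0))              # 0.0
print(round(swish(-1.0), 4))   # small negative value, not clipped to 0
print(round(swish(5.0), 4))    # ≈ x for large positive inputs
```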

_________________________________________________________________________

QUESTION 8:

What does the backpropagation algorithm compute in a neural network?


a. Loss function value at each epoch
b. Gradients of the loss function with respect to weights of the network
c. Activation values of the output layer
d. Output of each neuron

Correct Answer: b
Solution: Backpropagation computes the gradients of the loss function with respect to each weight of the network.
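A sketch on a hypothetical two-weight network y = w₂·tanh(w₁x) with squared-error loss: backpropagation yields the gradient of the loss with respect to every weight, propagating layer by layer, and a numerical derivative confirms it.

```python
import math

def forward_backward(w1, w2, x, t):
    h = math.tanh(w1 * x)          # hidden activation
    y = w2 * h                     # network output
    loss = 0.5 * (y - t) ** 2
    dy = y - t                     # dL/dy
    dw2 = dy * h                   # dL/dw2 (output-layer weight)
    dh = dy * w2                   # propagate back through the output layer
    dw1 = dh * (1 - h * h) * x     # dL/dw1 via tanh'(z) = 1 - tanh(z)^2
    return loss, dw1, dw2

loss, dw1, dw2 = forward_backward(0.5, -0.3, 1.0, 1.0)
# Check dw1 against a numerical derivative (central difference):
eps = 1e-6
lp = forward_backward(0.5 + eps, -0.3, 1.0, 1.0)[0]
lm = forward_backward(0.5 - eps, -0.3, 1.0, 1.0)[0]
print(abs(dw1 - (lp - lm) / (2 * eps)) < 1e-8)   # True
```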

_________________________________________________________________________

QUESTION 9:
Which type of regularization encourages sparsity in the weights?
a. L1 regularization
b. L2 regularization
c. Dropout
d. Early stopping

Correct Answer: a
Solution: L1 regularization encourages sparsity in the weights.
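A sketch of why (hypothetical step sizes, using the proximal soft-thresholding form of the L1 update): one L1 step shrinks every weight toward zero by a fixed amount, clamping small weights at exactly zero, while L2 only shrinks weights proportionally and rarely reaches exact zero.

```python
# L1 step (soft-thresholding): shrink toward zero by lr*lam, clamp at zero.
def l1_step(w, lr=0.1, lam=1.0):
    shift = lr * lam
    return 0.0 if abs(w) <= shift else w - shift * (1 if w > 0 else -1)

# L2 step: multiplicative decay, never exactly zero for nonzero w.
def l2_step(w, lr=0.1, lam=1.0):
    return w * (1 - lr * lam)

weights = [0.05, -0.08, 0.9]
print([l1_step(w) for w in weights])   # small weights become exactly 0.0
print([l2_step(w) for w in weights])   # all weights shrink but stay nonzero
```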

_________________________________________________________________________

QUESTION 10:

What is the main purpose of using hidden layers in an MLP?


a. Helps to make the network bigger
b. Enables us to handle linearly separable data
c. Learn complex and nonlinear relationships in the data
d. Minimize the computational complexity
Correct Answer: c
Solution: Hidden layers enable MLPs to learn complex and nonlinear relationships that a
single-layer perceptron cannot model.
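A sketch with hand-chosen weights (illustrative, not learned): a tiny MLP with two inputs, two hidden step units, and one output unit computes XOR, the textbook function no single-layer perceptron can represent.

```python
def step(z):
    return 1 if z > 0 else 0

def mlp_xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)       # OR-like hidden unit
    h2 = step(x1 + x2 - 1.5)       # AND-like hidden unit
    return step(h1 - h2 - 0.5)     # OR AND (NOT AND) = XOR

print([mlp_xor(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 1, 1, 0]
```

The hidden layer carves the input space with two lines instead of one, which is exactly the extra expressive power the solution describes.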

_________________________________________________________________________

Common questions


The tanh activation function is a scaled version of the sigmoid function that symmetrically maps inputs into the range [−1, 1]. It computes the hyperbolic tangent of the input, producing a smooth curve that approaches −1 for large negative inputs and 1 for large positive inputs, and is zero-centered, outputting values near zero for inputs near zero.

The chain rule of derivatives enables backpropagation by systematically applying the derivative of composite functions. In neural networks, the total loss is a composition of several functions representing the layers' operations. By applying the chain rule, backpropagation computes the gradient of the loss function with respect to each weight by breaking it down into the gradients of the loss with respect to the output of each layer, and then using these to compute gradients further back in the network.

Regularization reduces overfitting by imposing a penalty on more complex models, which discourages the model from fitting noise in the training data. Overfitting is a major problem because it produces high-variance models that perform well on training data but poorly on unseen data. By enforcing simplicity, regularization enhances the generalization capability of a model, making it more robust in practical applications.

The Swish activation function is defined as f(x) = x⋅σ(x), where σ(x) is the sigmoid function. Unlike ReLU, which completely shuts off neurons with inputs below zero, Swish is smooth and non-monotonic, allowing small negative values through, which can help neurons continue to propagate error signals even when they are weakly activated, potentially leading to improved training dynamics and better performance in some scenarios.

Hidden layers in an MLP allow the network to model and learn complex, nonlinear relationships within the data by applying multiple layers of nonlinear transformations to the input. Each layer learns an increasingly abstract representation of the data, enabling the network to capture intricate patterns and dependencies that simple linear models cannot.

Dropout regularization helps prevent overfitting by randomly deactivating a subset of neurons during training, which prevents the model from relying too heavily on any single neuron. This stochasticity forces the network to learn more robust features, distributes the representation across neurons, and acts as a form of model averaging, leading to better generalization on unseen data.

The Perceptron learning algorithm is designed to find a hyperplane that separates data into two distinct classes. It adjusts weights based on whether the current decision boundary correctly classifies the data points. However, when data is not linearly separable, no such hyperplane exists, so the perceptron cannot correctly classify all the data and fails to converge in these situations.

Backpropagation uses the chain rule of derivatives to compute the gradients needed for updating weights. This method applies the chain rule to propagate the error back through the layers of the network. It is preferred because it efficiently computes the necessary gradients for deep networks, making it feasible to train modern neural architectures that consist of numerous layers and a massive number of parameters.

The sigmoid function outputs values between 0 and 1, and its gradient is small for large positive or negative inputs, effectively saturating. When the network weights are updated through backpropagation, these small gradients cause weight updates to become increasingly smaller, slowing down the learning process. This can halt learning entirely in deep networks, a phenomenon known as the vanishing gradient problem.

L1 regularization adds the absolute value of the coefficients as a penalty to the loss function. This encourages sparsity in the weight vectors, meaning many weights are driven to zero. As a result, it can lead to models that are simpler and more interpretable, since they use fewer features, but it requires careful tuning to ensure that predictive power is not lost by overly penalizing model complexity.
