Offline Phishing Detection Model for Websites Using Recurrent and Convolutional Neural Networks

Description

This project implements an offline phishing website detection model, capable of operating without an internet connection by leveraging the knowledge acquired during its training phase.

The architecture combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to exploit the strengths of both. CNN layers act as feature extractors, scanning URL components to detect local patterns such as character groupings, token distributions, and structural anomalies. In parallel, the LSTM layer captures sequential dependencies across the URL string, retaining contextual information about recurring patterns that may indicate phishing behavior. This hybrid design lets the model identify both fine-grained lexical cues and long-range dependencies, making its classification of web addresses more robust and accurate.
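A minimal Keras sketch of such a hybrid follows. It assumes each sample is a vector of n_features numeric URL attributes, reshaped to a sequence of shape (n_features, 1). The function name build_hybrid_model, the layer sizes, and the dropout rate are illustrative assumptions, not the notebook's actual configuration. The two branches read the same input in parallel, and their outputs are concatenated before the sigmoid output.

```python
from tensorflow import keras


def build_hybrid_model(n_features: int) -> keras.Model:
    """Hypothetical CNN + LSTM classifier over a (n_features, 1) input sequence."""
    inputs = keras.Input(shape=(n_features, 1))

    # CNN branch: 1-D convolutions slide over neighbouring positions to detect local patterns.
    cnn = keras.layers.Conv1D(filters=32, kernel_size=3, padding="same", activation="relu")(inputs)
    cnn = keras.layers.GlobalMaxPooling1D()(cnn)

    # LSTM branch: reads the same sequence in order and summarises it in its final state.
    rnn = keras.layers.LSTM(64)(inputs)

    # Merge both views, regularise, and output a single phishing probability.
    merged = keras.layers.Concatenate()([cnn, rnn])
    merged = keras.layers.Dropout(0.3)(merged)
    outputs = keras.layers.Dense(1, activation="sigmoid")(merged)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")],
    )
    return model
```

For a 2-D feature matrix X of shape (n_samples, n_features), the input would first need a channel axis (X[..., None]). The notebook's own model may instead stack the layers, for example Conv1D feeding into the LSTM.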

The system analyzes multiple lexical, structural, and heuristic attributes of web addresses and classifies each address as legitimate or potentially malicious (phishing). The measured accuracy is reported under Model Results.
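To illustrate what lexical and structural attributes look like, the sketch below computes a few of them directly from a URL string using only the standard library. The function extract_url_features and its feature names are hypothetical. The trained model relies on the feature columns of dataset_phishing.csv, which this sketch does not reproduce.

```python
import re
from urllib.parse import urlparse

# Characters often counted as "special" in lexical URL features (illustrative choice).
SPECIAL_CHARS = "@-_?=&%.~+"
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_url_features(url: str) -> dict:
    """Compute a few hypothetical lexical/structural features for one URL."""
    # urlparse only recognises a host after "//", so prepend a scheme if the URL has none.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    is_ip = bool(IPV4_PATTERN.fullmatch(host))
    labels = host.split(".") if host else []
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Naive count of labels before "domain.tld" (miscounts suffixes like "co.uk").
        "n_subdomains": 0 if is_ip else max(len(labels) - 2, 0),
        "n_special_chars": sum(url.count(ch) for ch in SPECIAL_CHARS),
        "n_digits": sum(ch.isdigit() for ch in url),
        "has_ip_host": int(is_ip),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }
```

For example, "https://login.secure.example.com/path?id=1" yields two subdomains and uses_https = 1. A raw IP host such as "http://192.168.0.1/@admin" sets has_ip_host and has_at_symbol.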

Key Features

Hybrid CNN + LSTM architecture that combines local pattern detection with sequential context
Offline execution: no internet connection required for URL analysis
Advanced feature extraction: length, subdomains, special characters, suspicious patterns, and more
Interactive analysis interface for user-provided URLs
Warning messages and recommendations to enhance user security
Modular, well-documented code for easy adaptation and integration into other systems
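The warning and recommendation step can be sketched as a small function that turns the model's phishing probability into a message. The function name describe_prediction, the 0.5 threshold, and the wording of the messages are illustrative assumptions.

```python
def describe_prediction(url: str, phishing_probability: float, threshold: float = 0.5) -> str:
    """Turn a phishing probability into a user-facing warning or reassurance (illustrative wording)."""
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("phishing_probability must be between 0 and 1")
    score = f"score {phishing_probability:.2f}"
    if phishing_probability >= threshold:
        return (
            f"WARNING: {url} may be a phishing site ({score}). "
            "Do not enter passwords or payment details; open the service "
            "by typing its official address instead."
        )
    return (
        f"{url} appears legitimate ({score}). "
        "Still check the domain spelling before entering personal data."
    )
```

Raising the threshold trades fewer false alarms for more missed phishing sites, and lowering it does the opposite.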

Model Results

Training accuracy: ~97.7%
Test accuracy: ~88.9%
AUC (Area Under ROC Curve): 0.89
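Metrics like these can be recomputed with scikit-learn from the true labels and the model's predicted probabilities. The helper summarize_metrics and the 0.5 threshold are assumptions. Accuracy depends on the chosen threshold, whereas AUC is computed from the raw scores and does not.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score


def summarize_metrics(y_true, y_score, threshold: float = 0.5) -> dict:
    """Accuracy at a fixed threshold plus threshold-free ROC AUC."""
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score).ravel()  # flattens Keras-style (n, 1) outputs
    y_pred = (y_score >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }
```

With the repository files, summarize_metrics(np.load("Y_test.npy"), model.predict(np.load("X_test.npy"))) is one way to re-check the test figures. This assumes the saved arrays are the same ones the notebook evaluated.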

Repository Structure

```
/phishing-detection-rnn-cnn
│
├── dataset_phishing.csv             # Dataset used for training
├── URL_Phishing_Detection.ipynb     # Main notebook with the complete workflow
├── my_model.keras                   # Trained model ready for offline use
├── X_train.npy / X_test.npy         # Preprocessed datasets
├── Y_train.npy / Y_test.npy
├── history.npy                      # Training history
├── docs/images/
├── screencapture-colab-proyect.pdf
├── LICENSE                          # MIT License
└── README.md
```
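The saved arrays can be reloaded with NumPy. The helper load_artifacts is hypothetical. It assumes history.npy was written with np.save from a Python dict (such as Keras's History.history), which is why allow_pickle=True and .item() are needed. Adjust this if the notebook saved the history differently.

```python
from pathlib import Path

import numpy as np

ARRAY_NAMES = ("X_train", "X_test", "Y_train", "Y_test")


def load_artifacts(folder: str = "."):
    """Load the preprocessed splits and the training history from one folder."""
    base = Path(folder)
    arrays = {name: np.load(base / f"{name}.npy") for name in ARRAY_NAMES}
    # Assumption: history.npy is a dict saved via np.save(path, history.history).
    # Only enable allow_pickle for files you trust, since unpickling can run code.
    history = np.load(base / "history.npy", allow_pickle=True).item()
    return arrays, history
```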

Running the Project with Google Drive and Colab

To run this project seamlessly in Google Colab while keeping all files organized and persistent:

In your Google Drive, create a new folder named:
```
phishing-detection-rnn-cnn
```
Upload all project files into this folder (.ipynb, .npy, .csv, .keras, etc.).

In Colab, mount your Google Drive:

```
from google.colab import drive
drive.mount('/content/drive')
```

Navigate to the project folder in Drive:

```
%cd /content/drive/MyDrive/phishing-detection-rnn-cnn
```

Run the notebook URL_Phishing_Detection.ipynb — it will directly access the files from your Drive folder, ensuring smooth execution without manual file uploads each time.

This setup keeps your workspace clean, ensures file persistence, and allows you to resume work instantly from any device.
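Before opening the notebook, it helps to confirm that the uploads landed in the right folder, which avoids a confusing FileNotFoundError later. The helper missing_project_files is a hypothetical convenience. The file names come from the repository structure listed earlier.

```python
from pathlib import Path

# File names taken from the repository structure; adjust if the layout changes.
REQUIRED_FILES = [
    "dataset_phishing.csv",
    "URL_Phishing_Detection.ipynb",
    "my_model.keras",
    "X_train.npy",
    "X_test.npy",
    "Y_train.npy",
    "Y_test.npy",
    "history.npy",
]


def missing_project_files(folder: str) -> list[str]:
    """Return the required files that are not present in the given folder."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

In a Colab cell, after mounting Drive, run print(missing_project_files("/content/drive/MyDrive/phishing-detection-rnn-cnn")). An empty list means every file was found.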

Requirements

Python 3.10+
TensorFlow
Pandas, NumPy, Matplotlib, Seaborn
scikit-learn

Quick installation:

```
pip install tensorflow pandas numpy matplotlib seaborn scikit-learn
```
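The following sketch checks that the interpreter and the packages listed above are available. The helper check_environment is an illustrative assumption. It only tests whether each module can be found, not whether its version is compatible with the saved model.

```python
import importlib.util
import sys

# Import names for the packages above (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["tensorflow", "pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def check_environment(min_python: tuple = (3, 10)) -> list[str]:
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems
```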

Usage

Clone the repository:

```
git clone https://github.com/frangelbarrera/phishing-detection-rnn-cnn.git
cd phishing-detection-rnn-cnn
```

Load the trained model:

```
from tensorflow import keras

# Relative path: run this from the repository root (or the Drive folder in Colab).
model = keras.models.load_model("my_model.keras")
```
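Once loaded, the model can score preprocessed inputs. The helper predict_labels is a sketch that assumes the model outputs a single sigmoid probability per sample, with 1 meaning phishing. The 0.5 threshold is also an assumption. The helper accepts any object with a Keras-style predict method.

```python
import numpy as np


def predict_labels(model, X, threshold: float = 0.5):
    """Return (probabilities, labels) for a batch; label 1 = phishing (assumed encoding)."""
    probs = np.asarray(model.predict(X, verbose=0)).ravel()
    labels = (probs >= threshold).astype(int)
    return probs, labels
```

For example, run probs, labels = predict_labels(model, np.load("X_test.npy")) after loading the model as shown above.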

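Run the interactive analysis by opening the notebook in Jupyter (or in Colab, as described above) and executing its cells in order. A notebook is not a Python script, so it cannot be run with the python command:

```
pip install notebook
jupyter notebook URL_Phishing_Detection.ipynb
```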

Notes and Warnings

The model may misclassify some well-known legitimate sites due to limitations in the current implementation.
For greater accuracy, it is recommended to supplement its use with other security tools.
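One simple way to supplement the model, and to limit the misclassification of well-known legitimate sites mentioned above, is to consult a user-maintained allowlist before accepting the model's verdict. This is a hypothetical add-on, not part of the repository. TRUSTED_DOMAINS, is_trusted, and final_verdict are illustrative names, and an allowlist is only as reliable as its contents.

```python
from urllib.parse import urlparse

# Hypothetical user-maintained allowlist; not shipped with the repository.
TRUSTED_DOMAINS = {"google.com", "github.com", "wikipedia.org"}


def is_trusted(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = urlparse(url if "://" in url else "http://" + url).hostname or ""
    # Match "github.com" and "www.github.com", but not "github.com.evil.io".
    return any(host == d or host.endswith("." + d) for d in trusted)


def final_verdict(url: str, phishing_probability: float, threshold: float = 0.5,
                  trusted=TRUSTED_DOMAINS) -> str:
    """Allowlisted hosts override the model; otherwise apply the probability threshold."""
    if is_trusted(url, trusted):
        return "legitimate (allowlisted)"
    return "phishing" if phishing_probability >= threshold else "legitimate"
```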

Screenshots

🔎 URL Analysis Input Interface

Example of URL Classification Result

License

This project is licensed under the MIT License - see the LICENSE file for details.
