This project implements an offline phishing website detection model, capable of operating without an internet connection by leveraging the knowledge acquired during its training phase.
The architecture combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, leveraging the strengths of both paradigms. CNN layers act as feature extractors, scanning URL components to detect local spatial patterns such as character groupings, token distributions, and structural anomalies. In parallel, the LSTM layer captures sequential dependencies across the URL string, retaining contextual information about recurring patterns that may indicate phishing behavior. This hybrid design allows the model to simultaneously identify fine-grained lexical cues and long-range dependencies, resulting in a more robust and accurate classification of web addresses.
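The hybrid design described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the network stored in my_model.keras: the input encoding (URLs as fixed-length integer sequences), the constants `VOCAB_SIZE` and `MAX_LEN`, the layer sizes, and the function name `build_model` are all assumptions made for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 128  # assumed: number of distinct character/token ids
MAX_LEN = 200     # assumed: URLs padded or truncated to this length


def build_model(vocab_size=VOCAB_SIZE, max_len=MAX_LEN):
    """Build a small two-branch CNN + LSTM binary classifier (illustrative)."""
    inputs = keras.Input(shape=(max_len,), dtype="int32")
    embedded = layers.Embedding(vocab_size, 32)(inputs)

    # CNN branch: detects local patterns such as character groupings.
    conv = layers.Conv1D(64, 5, activation="relu", padding="same")(embedded)
    conv = layers.GlobalMaxPooling1D()(conv)

    # LSTM branch: summarizes sequential dependencies across the whole URL.
    seq = layers.LSTM(64)(embedded)

    # Merge both views and classify as legitimate (0) or phishing (1).
    merged = layers.Concatenate()([conv, seq])
    hidden = layers.Dense(32, activation="relu")(merged)
    outputs = layers.Dense(1, activation="sigmoid")(hidden)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")],
    )
    return model
```

Running the two branches side by side and concatenating their outputs is one way to read "in parallel"; stacking the LSTM on top of the convolutional features is an equally common alternative.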
The system analyzes multiple lexical, structural, and heuristic attributes of web addresses and classifies them as legitimate or potentially malicious (phishing), reaching roughly 88.9% accuracy on the test set.
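To make the idea of lexical and structural attributes concrete, here is a small sketch of URL feature extraction that uses only the Python standard library. The feature names, the `SPECIAL_CHARS` set, and the `SUSPICIOUS_TOKENS` list are hypothetical choices for illustration and are not taken from the notebook.

```python
import ipaddress
from urllib.parse import urlparse

SPECIAL_CHARS = "@-_?=&%"  # hypothetical set of characters to count
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "update")  # hypothetical


def _is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Compute a few simple lexical and structural features of a URL."""
    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    labels = [part for part in host.split(".") if part]
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Naive count: treats the last two labels as the registered domain,
        # so multi-part suffixes such as "co.uk" are over-counted.
        "num_subdomains": max(len(labels) - 2, 0),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": sum(url.count(ch) for ch in SPECIAL_CHARS),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ip": _is_ip(host),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }
```

For example, `extract_features("http://secure-login.example.com.evil.net/verify?id=1")` reports three subdomains and three suspicious tokens, the kind of signal a classifier can learn from.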
- Hybrid CNN + LSTM architecture for superior detection performance
- Offline execution: no internet connection required for URL analysis
- Advanced feature extraction: length, subdomains, special characters, suspicious patterns, and more
- Interactive analysis interface for user-provided URLs
- Warning messages and recommendations to enhance user security
- Modular, well-documented code for easy adaptation and integration into other systems
- Training accuracy: ~97.7%
- Test accuracy: ~88.9%
- AUC (Area Under ROC Curve): 0.89
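For readers unfamiliar with these metrics, the sketch below shows how accuracy and AUC can be computed from true labels and predicted scores in plain Python. The labels and scores are made-up toy values, not results from this project, and the 0.5 decision threshold and the convention that label 1 means phishing are assumptions. In practice, scikit-learn's `accuracy_score` and `roc_auc_score` provide the same quantities.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (1 = phishing, 0 = legitimate).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_accuracy = accuracy(toy_labels, toy_scores)  # 0.75
toy_auc = roc_auc(toy_labels, toy_scores)        # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but too slow for large test sets.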
/phishing-detection-rnn-cnn
│
├── dataset_phishing.csv # Dataset used for training
├── URL_Phishing_Detection.ipynb # Main notebook with the complete workflow
├── my_model.keras # Trained model ready for offline use
├── X_train.npy / X_test.npy # Preprocessed datasets
├── Y_train.npy / Y_test.npy
├── history.npy # Training history
└── README.md
To run this project in Google Colab while keeping all files organized and persistent:
- In your Google Drive, create a new folder named phishing-detection-rnn-cnn.
- Upload all project files into this folder (.ipynb, .npy, .csv, .keras, etc.).
- In Colab, mount your Google Drive:
    from google.colab import drive
    drive.mount('/content/drive')
- Navigate to the project folder in Drive:
    %cd /content/drive/MyDrive/phishing-detection-rnn-cnn
- Run the notebook URL_Phishing_Detection.ipynb. It reads the files directly from your Drive folder, so you do not need to upload them manually each time.
This setup keeps your workspace clean, ensures file persistence, and allows you to resume work instantly from any device.
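Before running the notebook, it can help to confirm that the upload to Drive is complete. The following is a small sketch; the helper name `missing_files` is hypothetical, and the file list simply mirrors the project tree shown earlier.

```python
from pathlib import Path

# File names taken from the project tree in this README.
EXPECTED_FILES = [
    "dataset_phishing.csv",
    "URL_Phishing_Detection.ipynb",
    "my_model.keras",
    "X_train.npy",
    "X_test.npy",
    "Y_train.npy",
    "Y_test.npy",
    "history.npy",
]


def missing_files(project_dir) -> list[str]:
    """Return the expected project files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

In Colab, calling `missing_files("/content/drive/MyDrive/phishing-detection-rnn-cnn")` after mounting Drive should return an empty list when everything is in place.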
- Python 3.10+
- TensorFlow
- Pandas, NumPy, Matplotlib, Seaborn
- scikit-learn
Quick installation:
    pip install tensorflow pandas numpy matplotlib seaborn scikit-learn
- Clone the repository:
    git clone https://github.com/frangelbarrera/phishing-detection-rnn-cnn.git
- Load the trained model:
    from tensorflow import keras
    model = keras.models.load_model("my_model.keras")
- Run the interactive analysis by opening the notebook in Jupyter (which must be installed separately) or in Colab and executing its cells:
    jupyter notebook URL_Phishing_Detection.ipynb
- The model may misclassify some well-known legitimate sites due to limitations in the current implementation.
- For greater accuracy, it is recommended to supplement its use with other security tools.
🔎 URL Analysis Input Interface

This project is licensed under the MIT License - see the LICENSE file for details.


