Pivot Point Data Scraping Script
Using lightweight libraries in PHP or Python for web scraping provides several advantages, particularly for system performance. These libraries carry less overhead than comprehensive frameworks and consume fewer system resources, which is ideal for scripts that run repeatedly as cron jobs on a server. This helps maintain server performance without significant slowdowns. They also load faster, which reduces the risk of timeouts and improves the reliability of frequent, recurring scraping tasks. Lightweight libraries are also generally easier to troubleshoot and maintain, which benefits long-term application stability.
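As a minimal sketch of the lightweight approach, the parser below uses only Python's standard-library html.parser rather than a full scraping framework. The table markup and cell contents are hypothetical; in a real run the HTML would be fetched from the source site.

```python
from html.parser import HTMLParser  # stdlib only: no heavy framework needed

class PivotTableParser(HTMLParser):
    """Collects the text of every <td> cell from a pivot-point table."""
    def __init__(self):
        super().__init__()
        self._in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cells.append(data.strip())

# A static snippet keeps the sketch self-contained; in production this
# string would come from the source site (e.g. via urllib.request).
html = "<table><tr><td>EUR/USD</td><td>1.0850</td></tr></table>"
parser = PivotTableParser()
parser.feed(html)
print(parser.cells)  # → ['EUR/USD', '1.0850']
```

Because the parser depends only on the standard library, the cron environment needs no extra packages installed.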
When using cron jobs on a Linux server to automate data scraping, several challenges can arise. Managing script dependencies and ensuring environmental consistency is crucial, since the scripts run unattended. Network or source-website availability issues can cause failed scraping attempts, so implementing error handling and logging is essential for troubleshooting. Source data formats may also change over time, requiring updates to the parsing logic. To manage these issues, keep the code modular and well documented, schedule jobs with server load in mind to avoid overloads, and include notifications or logging for ongoing monitoring and debugging.
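The error-handling and logging advice above can be sketched as a small retry wrapper suitable for an unattended cron job. The flaky fetch function is a stand-in for a real network call and is purely illustrative.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

def fetch_with_retries(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times, logging each failure.

    Returns the first successful result, or re-raises the last error so
    the cron job exits non-zero and the failure shows up in the logs.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # network errors, HTTP errors, etc.
            last_error = exc
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            time.sleep(delay)
    raise last_error

# Simulated flaky source: fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return "<html>pivot data</html>"

result = fetch_with_retries(flaky_fetch)
print(result)  # → <html>pivot data</html>
```

Letting the final failure propagate is deliberate: cron captures the non-zero exit and the logged warnings give the history needed for debugging.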
Foreign key references in a scraped dataset should be handled meticulously to ensure data integrity and consistency. Each entity, such as a currency pair, should be associated with a unique identifier in a centralized reference table. This requires a mapping layer that replaces raw text values with the corresponding 'id' from the reference table. For instance, currency pairs should be mapped to their specific 'id' values after normalizing the text, such as replacing '/' with '_', so the identifiers integrate cleanly into database queries. This approach reduces redundancy, speeds up data retrieval, and keeps datasets from multiple sources aligned.
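A minimal sketch of that mapping layer, assuming a hypothetical in-memory copy of the reference table:

```python
# Hypothetical reference table: normalized pair text -> numeric id.
PAIR_IDS = {"EUR_USD": 1, "GBP_USD": 2, "USD_JPY": 3}

def pair_to_id(raw_pair):
    """Normalize a scraped pair like 'EUR/USD' and look up its id.

    Raises KeyError for unknown pairs so bad data is caught at import
    time instead of silently polluting the database.
    """
    key = raw_pair.strip().upper().replace("/", "_")
    return PAIR_IDS[key]

print(pair_to_id("eur/usd"))  # → 1
```

In practice the dictionary would be loaded from the reference table at startup, so the mapping stays in one place.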
Replacing text with numeric IDs in database systems improves efficiency by reducing storage requirements and streamlining query processing. Numeric IDs, being fixed in size and smaller than text strings, enable faster indexing and retrieval operations. This practice is particularly effective when dealing with large datasets from multiple web scraping sources, where normalization can improve performance significantly. It provides consistent, unified reference points across datasets, facilitating integrative analyses and reducing the complexity of text handling, which leads to cleaner and more scalable database designs.
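The normalization described above can be demonstrated with sqlite3 from the standard library. The table and column names here are assumptions for illustration, not the document's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE currency_pairs (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("""CREATE TABLE quotes (
    pair_id INTEGER REFERENCES currency_pairs(id),
    pivot REAL)""")

def get_or_create_pair_id(conn, name):
    """Return the numeric id for a pair name, inserting it if new."""
    row = conn.execute(
        "SELECT id FROM currency_pairs WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO currency_pairs (name) VALUES (?)", (name,))
    return cur.lastrowid

# Rows from different sources all resolve to the same compact integer id.
for name, pivot in [("EUR_USD", 1.0850), ("EUR_USD", 1.0862), ("GBP_USD", 1.2700)]:
    conn.execute("INSERT INTO quotes VALUES (?, ?)",
                 (get_or_create_pair_id(conn, name), pivot))

distinct_ids = conn.execute(
    "SELECT COUNT(DISTINCT pair_id) FROM quotes").fetchone()[0]
print(distinct_ids)  # → 2 (two ids cover all three rows)
```

Each pair name is stored once in the reference table; the fact table carries only small integers, which index and join efficiently.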
When deciding not to scrape or store certain data types, such as lengthy text values, the considerations should include the impact on database performance, query efficiency, and system resource usage. Storing long text increases storage requirements and slows query response times. As an alternative, foreign key relationships referencing data stored in separate tables can minimize storage needs and optimize querying by avoiding excessive duplication. Moreover, text-heavy data is not necessary for every application, and dropping it in favor of compact, precise references can streamline data processing and improve overall system efficiency.
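One way to apply this policy is to filter each scraped record down to its storable columns before insertion. The field names here are hypothetical examples:

```python
# Compact columns and foreign-key ids we keep; anything else is dropped.
STORED_FIELDS = {"pair_id", "timeframe_id", "pivot", "scraped_at"}

def to_storable(record):
    """Strip text-heavy fields from a scraped record before insertion."""
    return {k: v for k, v in record.items() if k in STORED_FIELDS}

raw = {
    "pair_id": 1,
    "timeframe_id": 3,
    "pivot": 1.0850,
    "scraped_at": "2024-01-02T00:00:00Z",
    "analysis": "Several paragraphs of commentary the application never queries...",
}
print(to_storable(raw))  # the 'analysis' blob never reaches the database
```

Filtering at this boundary keeps the decision about what not to store in one place rather than scattered across insert statements.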
Designing a system for scraping data from multiple web sources and storing it in a relational database involves several considerations. Primary among these is avoiding duplicate data: check whether the data already exists and, if so, only update the 'end_date' of the last entry while keeping the 'start_date' unchanged. Insert a new row only when there is no exact match across all columns. Rather than inserting lengthy text data directly, use references such as an 'id' from another table to represent text values, which keeps queries efficient. Additionally, aligning scraped data with predefined identifiers for attributes like currency pairs ensures consistency and compatibility with existing database queries.
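The exact-match-then-update rule can be sketched with sqlite3; the table layout is an assumed example, not the document's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pivots (
    pair_id INTEGER, timeframe_id INTEGER, pivot REAL,
    start_date TEXT, end_date TEXT)""")

def upsert_pivot(conn, pair_id, timeframe_id, pivot, date):
    """Extend end_date if an identical row exists; otherwise insert.

    'Identical' means an exact match on every value column, per the
    no-duplicates rule: start_date is never touched on update.
    """
    row = conn.execute(
        "SELECT rowid FROM pivots WHERE pair_id=? AND timeframe_id=? AND pivot=?",
        (pair_id, timeframe_id, pivot)).fetchone()
    if row:
        conn.execute("UPDATE pivots SET end_date=? WHERE rowid=?", (date, row[0]))
    else:
        conn.execute("INSERT INTO pivots VALUES (?,?,?,?,?)",
                     (pair_id, timeframe_id, pivot, date, date))

upsert_pivot(conn, 1, 3, 1.0850, "2024-01-01")
upsert_pivot(conn, 1, 3, 1.0850, "2024-01-02")  # same value: end_date extended
upsert_pivot(conn, 1, 3, 1.0900, "2024-01-02")  # new value: new row

rows = conn.execute(
    "SELECT pivot, start_date, end_date FROM pivots ORDER BY start_date").fetchall()
print(rows)
```

Re-scraping the same value therefore widens the validity window of the existing row instead of creating a duplicate.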
Modular code design in data scraping applications enforces separation of concerns, making the codebase more maintainable and flexible. In PHP or Python, this approach groups tasks into classes and functions, enabling reuse and reducing redundancy. It ensures that updates or bug fixes in one part of the code do not affect other parts, enhancing stability. Modular design also pairs well with lightweight libraries and fewer dependencies, which is crucial for smooth operation in environments like cron jobs on Linux servers where minimal resource consumption is desired.
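A toy illustration of that separation, with each stage stubbed out so the example is self-contained (the class names and HTML are hypothetical):

```python
class Fetcher:
    """Retrieves raw HTML (stubbed here with a fixed string)."""
    def fetch(self, url):
        return "<td>1.0850</td>"

class Parser:
    """Extracts numeric pivot values from raw HTML."""
    def parse(self, html):
        cells = html.replace("<td>", " ").replace("</td>", " ").split()
        return [float(v) for v in cells]

class Pipeline:
    """Wires fetcher and parser together; swapping either is one line."""
    def __init__(self, fetcher, parser):
        self.fetcher = fetcher
        self.parser = parser
    def run(self, url):
        return self.parser.parse(self.fetcher.fetch(url))

values = Pipeline(Fetcher(), Parser()).run("https://example.com/pivots")
print(values)  # → [1.085]
```

If a source changes its markup, only Parser needs an update; the fetching and storage code is untouched, which is the stability benefit described above.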
Accurate time series data collection from web sources requires strategies such as checking whether the web source provides all required timeframes and ensuring consistent time-interval representation, for example by converting non-standard intervals to standard ones. Storing timeframes as integers in a reference table allows for consistent querying and manipulation. Each data entry should be timestamped during insertion to maintain historical accuracy. Additionally, selecting only predefined timeframes from each source, and skipping any timeframes the data requirement specification does not call for, ensures that irrelevant or redundant data is not collected.
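A sketch of timeframe normalization, assuming a hypothetical set of labels and a specification that allows only certain intervals:

```python
# Hypothetical mapping: source timeframe label -> integer minutes.
TIMEFRAME_MINUTES = {"5min": 5, "15min": 15, "1h": 60, "4h": 240, "1d": 1440}
ALLOWED = {5, 15, 60, 240, 1440}  # per the data requirement specification

def normalize_timeframe(label):
    """Map a source's timeframe label to the integer stored in the DB.

    Returns None for timeframes the specification does not require,
    so callers can skip collecting them.
    """
    minutes = TIMEFRAME_MINUTES.get(label.lower())
    return minutes if minutes in ALLOWED else None

print(normalize_timeframe("4H"))     # → 240
print(normalize_timeframe("30min"))  # → None (not collected)
```

Storing the integer rather than the label means "4h" from one source and "240" from another land in the same column with the same value.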
Failing to match currency pair identifiers correctly between scraped data and existing database entries can severely disrupt data integrity and analysis. Incorrect matching leads to misaligned data entries, making accurate filtering and querying difficult and potentially producing misleading analysis results. The database's validity could be compromised, leading to system failures when the data feeds critical applications. Mismatching can also escalate into broader integration issues across the interconnected systems that rely on accurate currency pair matching, compounding data consistency problems across platforms.
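One defensive pattern against these risks is to fail fast on unrecognized pairs rather than inserting misaligned rows. The reference data here is hypothetical:

```python
KNOWN_PAIR_IDS = {"EUR_USD": 1, "GBP_USD": 2}  # hypothetical reference table

def require_pair_id(raw_pair):
    """Reject unknown pairs before any row reaches the database."""
    key = raw_pair.strip().upper().replace("/", "_")
    if key not in KNOWN_PAIR_IDS:
        raise ValueError(f"unknown currency pair: {raw_pair!r}")
    return KNOWN_PAIR_IDS[key]

try:
    require_pair_id("XAU/USD")
except ValueError as exc:
    print(exc)  # logged and skipped instead of silently mis-stored
```

An exception at scrape time is cheap to investigate; a mismatched id discovered months later in analysis results is not.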
Applying relational database concepts to avoid inserting duplicate data improves database management by maintaining integrity and consistency. In a web scraping context, this practice reduces redundant entries, which would otherwise bloat the database and degrade query performance. By checking for existing data before insertion and updating specific fields, such as the 'end_date', this approach conserves storage space and speeds up data retrieval. It also simplifies maintenance by reducing the need for extensive data cleaning and keeps the stored data accurate and relevant over time.