🐛 Bug Description
If the input CSV files passed to dump_fix cover only some of the dates in the calendar, the database generated by dump_fix is accessible only for the dates present in the input CSV, even though all other dates are still in the calendar.
To Reproduce
Steps to reproduce the behavior:
Use the following script to check which data is available in the database:
import pandas as pd
import qlib
from qlib.data import D

# Point qlib at the database under test, with the dataset and expression caches disabled
qlib.init(provider_uri="qlib_data/day", dataset_cache=None, custom_ops=[], expression_cache=None, region=qlib.config.REG_US)
# Query one month of daily closes for SPY
close = D.features(instruments=["SPY"], fields=["$close"], start_time="2022-07-01", end_time="2022-07-31", freq="day")
# Flatten the (instrument, datetime) index into a datetime-indexed frame
df = pd.DataFrame(index=close.reset_index().datetime)
df["close"] = close["$close"].values
print(df.tail(5))
Build the database from CSV files covering the full calendar (say, from 2000-01-01 to 2022-07-25), then run dump_fix on that database with a CSV containing only one day (say, only 2022-07-25). The resulting database has data only for 2022-07-25.
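To make the symptom easier to quantify, the frame returned by the test script can be compared against the calendar. The snippet below is only a sketch in plain pandas: `missing_dates` is a hypothetical helper name, and the toy series with made-up prices stands in for the `$close` column queried above.

```python
import numpy as np
import pandas as pd


def missing_dates(close: pd.Series, calendar: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the calendar dates for which `close` has no value (absent or NaN)."""
    aligned = close.reindex(calendar)
    return calendar[aligned.isna().to_numpy()]


# Toy stand-in for the queried data (values are made up):
# every date is in the calendar, but only the last day has a price.
calendar = pd.bdate_range("2022-07-18", "2022-07-25")
close = pd.Series([np.nan] * 5 + [395.0], index=calendar)

missing = missing_dates(close, calendar)
print(f"{len(missing)} of {len(calendar)} calendar dates have no data")
```

With a real database, `calendar` could be taken from the index of the queried frame itself, since the missing dates still appear there with empty values.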
Expected Behavior
dump_fix should not affect data for dates that are not in the input CSV.
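The expected semantics can be illustrated with a small pandas sketch. This is not Qlib's implementation; `merge_fix` is a hypothetical function and the prices are made up. It only shows the behavior this report asks for: dates present in the fix input are overwritten, and every other date keeps its existing value.

```python
import pandas as pd


def merge_fix(existing: pd.Series, fix: pd.Series, calendar: pd.DatetimeIndex) -> pd.Series:
    """Overwrite only the dates present in `fix`; leave all other dates untouched."""
    merged = existing.reindex(calendar)
    # Ignore fix rows that fall outside the calendar.
    fix = fix[fix.index.isin(calendar)]
    merged.loc[fix.index] = fix
    return merged


calendar = pd.bdate_range("2022-07-21", "2022-07-25")  # 3 business days
existing = pd.Series([1.0, 2.0, 3.0], index=calendar)  # made-up values
fix = pd.Series([9.0], index=pd.DatetimeIndex(["2022-07-25"]))

merged = merge_fix(existing, fix, calendar)
print(merged)
```

Here the first two dates retain their original values and only 2022-07-25 changes.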
Screenshot
Environment
Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.
Linux
x86_64
Linux-4.18.0-147.el8.x86_64-x86_64-with-glibc2.2.5
#1 SMP Wed Dec 4 21:51:45 UTC 2019
Python version: 3.8.6 (default, Oct 22 2020, 17:03:03) [GCC 9.3.0]
Qlib version: 0.8.6.99
numpy==1.23.1
pandas==1.3.5
scipy==1.8.1
requests==2.28.1
sacred==0.8.2
python-socketio==5.7.1
redis==4.3.4
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.1
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.9.0
matplotlib==3.5.2
tables==3.7.0
pyyaml==6.0
mlflow==1.27.0
tqdm==4.64.0
loguru==0.6.0
lightgbm==3.3.2
tornado==6.2
joblib==1.1.0
fire==0.4.0
ruamel.yaml==0.17.21