
docs: add Google-style docstrings to dspy/datasets/dataloader.py #9458

Open
saivedant169 wants to merge 3 commits into stanfordnlp:main from saivedant169:docstrings-dataloader

Conversation

@saivedant169

Resolves #9457
Part of #8926

Description

Adds comprehensive Google-style docstrings with Args, Returns, Raises, and Example sections to all 9 public APIs in DataLoader:

  • DataLoader class docstring
  • from_huggingface() — includes split behavior (dict vs flat list)
  • from_csv()
  • from_pandas()
  • from_json()
  • from_parquet()
  • from_rm()
  • sample()
  • train_test_split() — documents float vs int size semantics

Each docstring includes a runnable code example showing typical usage.
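For readers unfamiliar with the layout, here is a minimal sketch of a Google-style docstring with Args, Returns, Raises, and Example sections. The function `split_records` is a hypothetical stand-in written for illustration, not the DSPy `train_test_split` implementation; the float-versus-int handling shown is an assumption about what "float vs int size semantics" could mean.

```python
import random


def split_records(dataset, train_size=0.75, random_state=None):
    """Split a list of records into train and test subsets.

    Hypothetical illustration only; not the DSPy implementation.

    Args:
        dataset: List of records to split.
        train_size: A float in (0, 1) is read as a fraction of the dataset;
            an int is read as an absolute number of records.
        random_state: Optional seed for a reproducible shuffle.

    Returns:
        A (train, test) tuple of lists.

    Raises:
        ValueError: If train_size is neither a float in (0, 1) nor an int
            between 0 and len(dataset).

    Example:
        >>> train, test = split_records(list(range(10)), train_size=0.8, random_state=0)
        >>> len(train), len(test)
        (8, 2)
        >>> train, test = split_records(list(range(10)), train_size=3, random_state=0)
        >>> len(train), len(test)
        (3, 7)
    """
    # bool is a subclass of int, so reject it before the int branch.
    if isinstance(train_size, bool):
        raise ValueError("train_size must be a float or an int, not a bool")
    if isinstance(train_size, float) and 0 < train_size < 1:
        n_train = int(len(dataset) * train_size)  # fraction of the dataset
    elif isinstance(train_size, int) and 0 <= train_size <= len(dataset):
        n_train = train_size  # absolute number of records
    else:
        raise ValueError("train_size must be a float in (0, 1) or a valid int count")

    shuffled = list(dataset)  # copy so the caller's list is left untouched
    random.Random(random_state).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Writing the two cases (fraction and count) as separate `>>> ` examples keeps the distinction visible in the rendered page as well as in the Args text.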

ruff check and ruff format pass clean.

Add comprehensive docstrings with Args, Returns, Raises, and Example
sections to all 9 public APIs in DataLoader:
- DataLoader class
- from_huggingface, from_csv, from_pandas, from_json, from_parquet
- from_rm, sample, train_test_split

Resolves stanfordnlp#9457
@MaximeRivest
Collaborator

Before reviewing a docstrings PR, we now ask that you add screenshots of your PR's changes as we would see them on dspy.ai after the PR is merged.

See #8926 (comment)

@saivedant169
Author

@MaximeRivest I tried building the docs locally with mkdocs serve but the build crashes on a nbconvert / Python 3.14 incompatibility when processing Jupyter notebook pages (ValueError: No template sub-directory with name 'lab'). This is unrelated to my docstring changes — it fails before reaching the API reference pages.

I verified that all 9 docstrings parse correctly through griffe (the handler mkdocstrings uses). Here's the output from griffe.load('dspy.datasets.dataloader'):

DataLoader class:

Utility for loading datasets from various sources into DSPy Examples.

DataLoader provides methods to load data from Hugging Face Hub, CSV, JSON,
Parquet files, Pandas DataFrames, and retrieval modules, converting each row
into a dspy.Example with the specified input keys.

Methods documented (all 9/9):

  • from_huggingface() — Args, Returns, Raises, Example
  • from_csv() — Args, Returns, Example
  • from_pandas() — Args, Returns, Example
  • from_json() — Args, Returns, Example
  • from_parquet() — Args, Returns, Example
  • from_rm() — Args, Returns, Raises
  • sample() — Args, Returns, Raises, Example
  • train_test_split() — Args, Returns, Raises, Example

Every docstring follows Google style and includes runnable >>> examples. If someone with Python 3.12/3.13 can confirm the full mkdocs render, happy to add screenshots from their build.
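A rough way to produce a per-method section listing like the one above is sketched below. It deliberately does not use griffe: it is a standard-library substitute that scans docstrings for Google-style section headers with `inspect` and a regular expression, so it checks less than a real parser would. `DemoLoader` and both helper functions are hypothetical names introduced only for this illustration.

```python
import inspect
import re

# Google-style section headers sit on their own line and end with a colon.
SECTION_RE = re.compile(r"^(Args|Returns|Raises|Examples|Example):\s*$", re.MULTILINE)


def docstring_sections(obj):
    """Return the section headers found in obj's docstring, in order."""
    doc = inspect.getdoc(obj) or ""  # getdoc strips the common indentation
    return SECTION_RE.findall(doc)


def report(cls):
    """Map each public method name of cls to its docstring sections."""
    result = {}
    for name, member in inspect.getmembers(cls, predicate=inspect.isfunction):
        if not name.startswith("_"):
            result[name] = docstring_sections(member)
    return result


class DemoLoader:
    """Hypothetical class used only to exercise the checker."""

    def from_csv(self, path):
        """Load rows from a CSV file.

        Args:
            path: Location of the file.

        Returns:
            A list of rows.
        """
        return []

    def sample(self, data, n):
        """Pick the first n items.

        Args:
            data: Items to pick from.
            n: Number of items.

        Returns:
            The selected items.

        Raises:
            ValueError: If n is negative.
        """
        if n < 0:
            raise ValueError("n must be non-negative")
        return data[:n]


if __name__ == "__main__":
    for method, sections in report(DemoLoader).items():
        print(f"{method}(): {', '.join(sections)}")
```

A check like this only confirms that the headers are present; it says nothing about how the page renders, which is why the rendered screenshots are still needed.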

@MaximeRivest
Collaborator

MaximeRivest commented Mar 16, 2026

Please push through; it does build. Once you see the docs locally, you will notice that you need to change some elements in your docstrings to respect the formats.

See #9445 and #9444 for examples of the format and the depth of change we expect.

@saivedant169
Author

@MaximeRivest Here are the rendered docs screenshots from local mkdocs build:
[8 screenshots of the rendered docs, taken 2026-03-16 between 3:50 PM and 3:51 PM]

@MaximeRivest
Collaborator

Please fix the formatting before we start reviewing the content.

This is one example of a formatting issue:

image

@saivedant169
Author

Screenshot 2026-03-18 at 3 05 40 PM

@saivedant169
Author

Hey @MaximeRivest, just checking in on this one. Let me know if anything needs changing or if you'd rather handle it differently.

@MaximeRivest
Collaborator

Hello @saivedant169, do you mind providing proof that all your examples run?
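One way such proof could be produced, assuming the examples are written as `>>> ` doctests, is to run them with the standard-library `doctest` module and report the counts. The sketch below is illustrative only: `first_n` and `run_examples` are hypothetical names, and this is not necessarily how the examples in this PR were checked.

```python
import doctest


def first_n(items, n):
    """Return the first n items of a list.

    Example:
        >>> first_n([1, 2, 3], 2)
        [1, 2]
    """
    return items[:n]


def run_examples(obj, name):
    """Run the >>> examples in obj's docstring.

    Returns:
        A (failed, attempted) tuple of example counts.
    """
    finder = doctest.DocTestFinder(recurse=False)
    runner = doctest.DocTestRunner(verbose=False)
    # Expose the object under its own name so the examples can call it.
    for test in finder.find(obj, name, globs={name: obj}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = run_examples(first_n, "first_n")
    print(f"{attempted} example(s) run, {failed} failed")
```

Examples that need network access or large downloads would need to be skipped or stubbed before a run like this is meaningful.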

@saivedant169
Author

yes for sure

@saivedant169
Author

Screenshot 2026-03-27 at 1 18 49 PM

@saivedant169
Author

is this good @MaximeRivest ?

@MaximeRivest
Collaborator

thank you! good job on running all those checks and tests. I will now review the text in the coming days.


Development

Successfully merging this pull request may close these issues.

[Docs] Add docstrings to dspy/datasets/dataloader.py
