Data Science Project Dependencies

Convert a data science project's requirements.txt with NumPy, Pandas, scikit-learn, and Jupyter to pyproject.toml format with proper version constraints.

Real-World Projects

Detailed Explanation

Data science projects often have complex dependency trees, and many of the packages involved impose strict version-compatibility constraints on one another. Here's a typical data science project conversion.

Data science requirements.txt:

numpy>=1.24,<2.0
pandas>=2.0
scikit-learn>=1.3
matplotlib>=3.8
seaborn>=0.13
scipy>=1.11
statsmodels>=0.14

# Deep Learning
torch>=2.1 ; sys_platform != "win32" or python_version >= "3.9"
torchvision>=0.16

# Data Processing
polars>=0.20
pyarrow>=14.0
openpyxl>=3.1

# Jupyter
jupyterlab>=4.0
ipywidgets>=8.1

Converted to pyproject.toml:

[project]
name = "my-ml-project"
version = "0.1.0"
requires-python = ">=3.10"

dependencies = [
    "numpy>=1.24,<2.0",
    "pandas>=2.0",
    "scikit-learn>=1.3",
    "matplotlib>=3.8",
    "seaborn>=0.13",
    "scipy>=1.11",
    "statsmodels>=0.14",
    "polars>=0.20",
    "pyarrow>=14.0",
    "openpyxl>=3.1",
]

[project.optional-dependencies]
gpu = [
    "torch>=2.1",
    "torchvision>=0.16",
]
notebook = [
    "jupyterlab>=4.0",
    "ipywidgets>=8.1",
]
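One detail worth noting: the original requirements.txt attached a PEP 508 environment marker to torch, and the extras split above drops it. If that conditional install still matters, the marker can be carried into the optional group unchanged; a TOML literal (single-quoted) string avoids escaping the marker's inner double quotes. This is an illustrative fragment, not part of the conversion shown above:

```
[project.optional-dependencies]
gpu = [
    # Literal (single-quoted) TOML string, so the PEP 508 marker's
    # double quotes need no escaping.
    'torch>=2.1 ; sys_platform != "win32" or python_version >= "3.9"',
    "torchvision>=0.16",
]
```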

Data science packaging considerations:

  1. NumPy 2.0 boundary: Much of the ecosystem needed time to add NumPy 2.0 support, so <2.0 upper bounds are common
  2. GPU packages: PyTorch and CUDA dependencies are often split into optional groups since they're large and platform-specific
  3. Jupyter in optional groups: Notebook tools are development aids, not runtime dependencies
  4. Platform-specific packages: Some packages have different wheels for different platforms and Python versions
  5. Large packages: Consider whether all dependencies are needed in production or just for development/experimentation
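The mechanical part of the conversion, copying requirement lines into a dependencies array, can be automated. Here is a minimal sketch, assuming simple requirement lines (names with version specifiers and optional markers); it is a hypothetical helper, not a full PEP 508 or requirements-file parser, and it ignores pip options such as -r includes:

```python
def requirements_to_dependencies(text):
    """Collect requirement lines, dropping comments and blanks."""
    deps = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments/blanks
        if line:
            deps.append(line)
    return deps

def format_dependencies(deps):
    """Emit the list in pyproject.toml dependencies-array syntax."""
    lines = ["dependencies = ["]
    lines += [f'    "{d}",' for d in deps]
    lines.append("]")
    return "\n".join(lines)

reqs = """\
numpy>=1.24,<2.0
pandas>=2.0

# Jupyter
jupyterlab>=4.0
"""
print(format_dependencies(requirements_to_dependencies(reqs)))
```

Lines destined for optional groups (the GPU and Jupyter sections here) would still need to be sorted into [project.optional-dependencies] by hand, since that split is a design decision rather than a mechanical one.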

The pyproject.toml format makes it natural to separate core analysis libraries from optional GPU and notebook dependencies, reducing installation time for production deployments.

Use Case

Converting a machine learning project's flat requirements.txt into a structured pyproject.toml that separates core, GPU, and notebook dependencies for different deployment scenarios.
