Python and Data Science .gitattributes
Configure .gitattributes for Python projects including Jupyter notebooks, data files, trained models, and scientific computing artifacts.
Detailed Explanation
Python / Data Science .gitattributes
Python data science projects combine source code, Jupyter notebooks, large datasets, and trained model files. Each category needs different Git handling.
Recommended Configuration
# Auto detect
* text=auto
# Python source
*.py text diff=python
*.pyx text diff=python
*.pxd text diff=python
*.pyi text diff=python
# Config / packaging
*.cfg text
*.ini text
*.toml text
*.yaml text
*.yml text
setup.py text diff=python
pyproject.toml text
# Jupyter notebooks
*.ipynb text -diff
# Lock files
poetry.lock text -diff
Pipfile.lock text -diff
requirements.txt text
# Data files (text-based)
*.csv text
*.tsv text
*.json text
*.jsonl text
*.xml text
# Data files (binary)
*.pkl binary
*.pickle binary
*.parquet binary
*.feather binary
*.hdf5 binary
*.h5 binary
*.npy binary
*.npz binary
*.arrow binary
# Trained models
*.pt binary
*.pth binary
*.onnx binary
*.pb binary
*.tflite binary
*.joblib binary
# Python compiled
*.pyc binary
*.pyd binary
*.so binary
*.egg binary
*.whl binary
# Images / plots
*.png binary
*.jpg binary
*.jpeg binary
*.svg text
# Shell scripts
*.sh text eol=lf
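To see how these rules combine, here is a simplified Python sketch of attribute resolution. It is not Git's actual matcher: it assumes bare filenames with no directory patterns, uses `fnmatchcase` as a stand-in for Git's pattern matching, ignores the `!attr` form, and the names `resolve` and `parse_attr` are hypothetical. It illustrates two documented gitattributes behaviors: for each attribute the last matching line wins, and `binary` is a macro for `-diff -merge -text`.

```python
from fnmatch import fnmatchcase

# "binary" is a built-in macro attribute; Git expands it to -diff -merge -text.
MACROS = {"binary": ["-diff", "-merge", "-text"]}


def parse_attr(token):
    """Turn one attribute token into a (name, value) pair."""
    if token.startswith("-"):
        return token[1:], False      # attribute is unset
    if "=" in token:
        name, value = token.split("=", 1)
        return name, value           # attribute is set to a string value
    return token, True               # attribute is set


def resolve(gitattributes, filename):
    """Simplified resolution: later matching lines override earlier ones."""
    attrs = {}
    for line in gitattributes.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *tokens = line.split()
        if not fnmatchcase(filename, pattern):
            continue
        for token in tokens:
            # Record the token itself, then anything it expands to.
            for expanded in [token] + MACROS.get(token, []):
                name, value = parse_attr(expanded)
                attrs[name] = value
    return attrs


RULES = """\
* text=auto
*.py text diff=python
*.pkl binary
*.ipynb text -diff
"""

for name in ("train.py", "model.pkl", "analysis.ipynb", "README"):
    print(name, resolve(RULES, name))
```

In a real repository, `git check-attr -a <path>` reports the attributes Git itself resolves for a path, which is the authoritative check.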
Jupyter Notebooks and -diff
Jupyter .ipynb files are JSON with embedded outputs (base64-encoded images, HTML tables, and so on), so standard line diffs of notebooks are nearly unreadable. Marking them text -diff keeps line-ending normalization while telling Git not to generate textual diffs: git diff then reports only that the file changed, as it does for binary files.
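The effect of embedded outputs can be seen in a small sketch. It builds a minimal notebook-shaped dictionary following the nbformat 4 layout, with placeholder bytes standing in for a real PNG. The helper names `make_notebook` and `longest_line` are hypothetical, and the 3000-byte payload is an arbitrary assumption.

```python
import base64
import json


def make_notebook(image_bytes):
    """Build a minimal nbformat-4-style notebook with one image output."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "source": ["plot(data)"],
                "outputs": [
                    {
                        "output_type": "display_data",
                        "metadata": {},
                        "data": {"image/png": encoded},
                    }
                ],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def longest_line(notebook):
    """Length of the longest line in the serialized notebook."""
    text = json.dumps(notebook, indent=1)
    return max(len(line) for line in text.splitlines())


# Placeholder payloads, not real PNG data: two "re-renders" of the same plot.
before = make_notebook(b"\x00" * 3000)
after = make_notebook(b"\x01" * 3000)

print("longest line before:", longest_line(before))
print("serialized notebooks differ:", json.dumps(before) != json.dumps(after))
```

The source cell is identical in both notebooks, yet a line diff would consist of one very long base64 line replaced by another, which is the noise that -diff suppresses.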
For readable notebook diffs, consider a notebook-aware tool such as nbdime, which plugs into Git as a custom diff and merge driver:
# In .gitconfig
[diff "jupyternotebook"]
command = git-nbdiffdriver diff
[merge "jupyternotebook"]
driver = git-nbmergedriver merge %O %A %B %L %P
# In .gitattributes (with nbdime)
*.ipynb text diff=jupyternotebook merge=jupyternotebook
Model and Data Files
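Trained models (.pt, .onnx, .pb) and data files (.parquet, .hdf5) are binary. The binary macro (shorthand for -diff -merge -text) stops Git from converting their line endings or attempting textual diffs and merges on them. For large models, consider Git LFS, which keeps a small pointer file in the repository and stores the file contents separately. For the patterns you track with LFS, these lines take the place of the corresponding binary entries: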
*.pt filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
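To decide which files are worth moving to LFS, a short script can list model files above a size threshold. This is a minimal sketch: the helper name `lfs_candidates` and the 50 MiB default are assumptions chosen for illustration, not Git LFS recommendations, and the suffix list simply mirrors the model extensions in the configuration above plus .h5.

```python
from pathlib import Path

# Suffixes taken from the model entries in the configuration above, plus .h5.
MODEL_SUFFIXES = {".pt", ".pth", ".onnx", ".pb", ".tflite", ".joblib", ".h5"}


def lfs_candidates(root, min_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for model files of at least min_bytes.

    Results are sorted from largest to smallest. Anything inside a .git
    directory under root is skipped.
    """
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES:
            size = path.stat().st_size
            if size >= min_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `lfs_candidates(".")` from the repository root returns the files to review. Patterns for the ones you choose can then be added to .gitattributes in the form shown above.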
Use Case
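Machine learning and data science teams working with Python, Jupyter notebooks, and large datasets need these attributes to handle the variety of file formats in their workflows. Proper configuration cuts down on noisy notebook diffs and, with a notebook-aware merge driver, on spurious notebook merge conflicts. It also keeps line-ending conversion from corrupting binary model and data files.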