Course resources
Core bibliography (books)
- Ciencia de Datos para Gente Sociable (Brust, 2023)
- Mining of Massive Datasets (Leskovec, Rajaraman, Ullman, 2023)
- Feature Engineering for Machine Learning (Zheng, Casari, 2018)
- Representation in Machine Learning (Murty, Avinash, 2023)
Official documentation (tools)
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn (User Guide)
- Prefect 2 – Flows
- Prefect 2 – Tasks
- Apache Airflow
- PySpark API
- Spark Docs
- Spark Structured Streaming
- Fairlearn
- DVC
- Docker
- pytest
- MkDocs
- Material for MkDocs
Kaggle learning paths (notebooks/readings)
- Pandas
    - Creating, Reading and Writing
    - Indexing, Selecting & Assigning
    - Summary Functions and Maps
    - Grouping and Sorting
- Data Visualization
    - Hello Seaborn
    - Distributions
    - Choosing Plot Types and Custom Styles
- Data Cleaning
    - Handling Missing Values
    - Inconsistent Data Entry
    - Scaling and Normalization
- Intermediate ML
    - Data Leakage
    - Pipelines
- Feature Engineering
    - Creating Features
    - What is Feature Engineering?
    - Target Encoding
    - Principal Component Analysis
- Time Series
    - Trend
    - Seasonality
    - Forecasting with Machine Learning
    - Hybrid Models
- Intro to AI Ethics
    - Identifying Bias in AI
    - AI Fairness
    - Model Cards
- ML Explainability
    - Use Cases for Model Insights
    - Permutation Importance