You are an ML engineer. Design and implement a production-ready machine learning pipeline using {{framework}}. Cover the full lifecycle: data preparation, training, evaluation, and deployment with monitoring.

## ML Framework: {{framework}}

### Pipeline Architecture

A production ML pipeline has these stages:

1. **Data ingestion**: sources, versioning, schema validation
2. **Data preprocessing**: cleaning, feature engineering, normalization
3. **Training**: model architecture, hyperparameter search, experiment tracking
4. **Evaluation**: metrics, baselines, statistical validation
5. **Registry**: model versioning, metadata storage
6. **Serving**: inference API, batching, latency SLOs
7. **Monitoring**: data drift, model degradation, retraining triggers

### Data Preparation

- Data versioning with DVC or Delta Lake — reproducible datasets
- Schema validation: fail fast on unexpected data shapes (Great Expectations / Pydantic)
- Train/validation/test split: 70/15/15; stratified for imbalanced classes
- Avoid data leakage: ensure no test-set statistics influence preprocessing
- Feature store: cache expensive features; ensure consistency between training and serving
- Handle missing values explicitly: document the strategy (impute / drop / sentinel value)
- Data augmentation: apply only to the training set, never to validation or test

### PyTorch (when framework = pytorch)

```python
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        return self.transform(sample) if self.transform else sample
```

- Use DataLoader with `num_workers` for parallel preprocessing
- Mixed precision training: `torch.cuda.amp.autocast()` + GradScaler
- Gradient clipping: `torch.nn.utils.clip_grad_norm_()` for stable training
- Checkpointing: save `model.state_dict()` + optimizer state every N epochs
- TorchServe or ONNX export for production serving
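The mixed-precision, gradient-clipping, and checkpointing bullets above can be combined into one training-loop sketch. This is illustrative, not a fixed API: the `train` function, its arguments, and the checkpoint dictionary layout are assumptions for the example; the model and loss are toy placeholders.

```python
# Sketch: autocast + GradScaler, gradient clipping, periodic checkpointing.
# Falls back gracefully to full precision on CPU (enabled=False).
import torch
from torch import nn

def train(model, loader, epochs=2, checkpoint_every=1, path="ckpt.pt"):
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    use_cuda = torch.cuda.is_available()
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
    loss = torch.tensor(0.0)
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast(enabled=use_cuda):
                loss = nn.functional.mse_loss(model(x), y)
            scaler.scale(loss).backward()
            scaler.unscale_(opt)  # so clipping sees the true gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(opt)
            scaler.update()
        if (epoch + 1) % checkpoint_every == 0:
            torch.save({"epoch": epoch,
                        "model": model.state_dict(),
                        "optimizer": opt.state_dict()}, path)
    return loss.item()
```

Note the ordering: `scaler.unscale_(opt)` before `clip_grad_norm_` makes clipping operate on the real gradient magnitudes rather than the loss-scaled ones.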
### TensorFlow/Keras (when framework = tensorflow)

- `tf.data.Dataset` with prefetch and parallel map for efficient pipelines
- `@tf.function` for graph execution and performance
- Mixed precision: `tf.keras.mixed_precision.set_global_policy('mixed_float16')`
- SavedModel format for portable, framework-agnostic serving
- TFX for production pipeline orchestration; TF Serving for inference

### Scikit-learn (when framework = sklearn)

- Pipelines: `sklearn.pipeline.Pipeline` to chain preprocessors and estimators
- Column transformers: different preprocessing per feature type
- GridSearchCV / RandomizedSearchCV / Optuna for hyperparameter tuning
- MLflow for experiment tracking: `mlflow.sklearn.autolog()`
- Joblib for model serialization; ONNX for cross-framework serving

### Hugging Face (when framework = huggingface)

- `AutoModel`, `AutoTokenizer` for framework-agnostic loading
- `Trainer` API: handles the training loop, evaluation, checkpointing, and logging
- `datasets` library: streaming for large datasets; `map()` for preprocessing
- PEFT / LoRA for efficient fine-tuning of large models
- Model Hub: push trained models with model cards

### Experiment Tracking

- Log every run: hyperparameters, dataset version, git commit hash, metrics
- Compare runs: identify which changes improved performance
- Reproducibility: set all random seeds; log the environment (requirements.txt, Python version)
- Tools: MLflow, Weights & Biases, Neptune, CometML

### Evaluation Framework

- Define metrics before training (no post-hoc metric selection)
- Baseline model: simple rule-based or majority class — beat this before anything else
- Statistical significance: use bootstrap confidence intervals for metric comparisons
- Slice analysis: evaluate performance per subgroup (demographics, edge cases)
- Error analysis: inspect the worst predictions to find systematic failure modes

### Production Serving

- Inference API: FastAPI wrapper with input validation
- Batching: dynamic batching for throughput; the latency SLO determines the batch window
- Caching: cache predictions for repeated inputs
- Model A/B testing: shadow mode → 10% canary → full rollout
- Latency target: p99 < 200 ms for real-time; no hard limit for batch

### Monitoring & Retraining

- Input data distribution monitoring: detect covariate shift
- Prediction distribution monitoring: detect concept drift
- Ground truth comparison: if labels become available, track model accuracy over time
- Retraining triggers: scheduled (e.g., weekly) or metric-based (accuracy drops >5%)

Provide: complete pipeline code for each stage, an experiment tracking setup, a serving API implementation, and a model card template.
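The covariate-shift bullet above can be sketched as a Population Stability Index (PSI) check. This is a minimal NumPy sketch: the `population_stability_index` helper is a hypothetical name for this example, and the 0.1 / 0.25 alert thresholds are common heuristics, not a standard.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """Compare two 1-D samples, binning both on the reference's quantiles.

    PSI near 0 means no shift; heuristic thresholds: >0.1 investigate,
    >0.25 significant shift (candidate retraining trigger).
    """
    # Bin edges from the reference distribution's quantiles
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)
    # Small smoothing term avoids division by zero and log(0)
    ref_pct = (ref_counts + 1e-6) / (ref_counts.sum() + 1e-6 * bins)
    cur_pct = (cur_counts + 1e-6) / (cur_counts.sum() + 1e-6 * bins)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

In production, `reference` would be the feature distribution from the last training run; computing PSI per feature on a rolling window of live inputs gives a cheap covariate-shift signal to feed the retraining trigger.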
| ID | Label | Default | Options |
|---|---|---|---|
| framework | ML framework | pytorch | pytorch, tensorflow, sklearn, huggingface |
```shell
npx mindaxis apply ml-pipeline --target cursor --scope project
```