AI-Driven Software Engineering for Data Science
A fully self-paced, practice-first course that bridges the gap between Junior and Mid-level Data Scientists by simulating real company work, without requiring any other humans. You will build and ship a production-style ML service while applying AI‑DSE, in which AI is integrated into every SDLC stage as a productive co-executor and you remain the Accountable decision-maker.
A key mechanism is the Independent AI Auditor: a separate verification step that checks artifacts produced by both you and the Working AI for inconsistencies, hallucinations, vulnerabilities, and missing tests. Each week you ship a production increment, as a real team would.
Who this course is for
- Junior Data Scientists who can train models but lack production and team workflow experience
- Early-career ML practitioners targeting ML Engineer / Applied Scientist responsibilities
- Analysts transitioning into production ML work
Literature and methodological foundation
This course is informed by foundational software engineering literature, including:
- SWEBOK Guide (Software Engineering Body of Knowledge), which provides a structured view of generally accepted software engineering knowledge across process, quality, testing, configuration management, and professional practice.
- Software Development Lifecycle Models, which provides context on major SDLC approaches such as waterfall, spiral, V-model, RAD, and incremental development.
These references support the course perspective that a DS/ML repository should be treated not merely as an experimentation space, but as a controlled software-engineering system with traceability, quality gates, reproducibility, and auditability.
Prerequisites
- Python: functions, classes, pandas, NumPy
- Basic ML: train/test split, overfitting, common metrics
- Basic Git: clone / commit / push
What you will build
- Reproducible training pipeline + experiment tracking
- Your own dataset and project theme — chosen in Week 1 and used throughout the entire course as the foundation for every artifact you build
- Data validation tests + leakage checks
- Architecture diagram (C4-style) of the ML service — designed by you, generated with Working AI assistance, verified by the Auditor
- Inference API (FastAPI) + Docker
- CI pipeline (tests / lint / build)
- Monitoring hooks + drift signals
- AI‑DSE audit trail (auditor reports + decision logs)
- Promptbook — a living collection of approved prompt patterns, refined throughout the course based on feedback from production, incidents, and audits
- Post-release artifact: tech debt backlog + product evolution plan
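As one illustration of the data validation and leakage checks listed above, the following minimal sketch flags two common leakage symptoms: record IDs shared between the train and test splits, and feature columns that are an exact copy of the target. The function names, the list-of-dicts row format, and the toy data are assumptions made for this example, not course-provided code.

```python
from typing import Any, Hashable, Iterable, Mapping, Sequence


def overlapping_ids(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> set:
    """Return IDs present in both splits; a non-empty result suggests leakage."""
    return set(train_ids) & set(test_ids)


def columns_copying_target(rows: Sequence[Mapping[str, Any]], target: str) -> list:
    """Return feature names whose value equals the target in every row."""
    if not rows:
        return []
    features = [name for name in rows[0] if name != target]
    return [
        name for name in features
        if all(row[name] == row[target] for row in rows)
    ]


# Hypothetical toy data, for illustration only.
train_ids = [101, 102, 103]
test_ids = [103, 104]
rows = [
    {"age": 31, "churned": 1, "churn_flag_copy": 1},
    {"age": 45, "churned": 0, "churn_flag_copy": 0},
]

leaked_ids = overlapping_ids(train_ids, test_ids)
suspicious_columns = columns_copying_target(rows, target="churned")
```

Checks like these are cheap enough to run as ordinary unit tests on every commit. A real project would likely add schema, range, and time-ordering checks on top of them.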
Learning outcomes
- Translate vague objectives into testable requirements and acceptance criteria
- Design an evaluation plan with metrics, slicing, and rollback conditions
- Build reproducible pipelines for data prep, training, and inference
- Track experiments and justify model choice with a decision memo
- Design the architecture of an ML service: choose the component structure, document trade-offs, and have the result audited for compliance with NFRs
- Serve a model via a stable API contract, containerize it, and ship with CI
- Add monitoring + drift detection and respond to incidents with a runbook
- Use AI copilots effectively while enforcing Independent AI Audits and quality gates
- Maintain an auditable trail of decisions, risks, and evidence
- Apply AI-RACI in practice: correctly assign Accountable / Responsible / Consulted / Informed roles across all key SDLC activities
- Manage the post-release lifecycle: identify tech debt, assess change impact, and plan product evolution with AI assistance
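The drift-detection outcome above can be made concrete with a simple drift signal. The sketch below computes the Population Stability Index (PSI) between a reference sample and a live sample of one numeric feature. PSI is used here as one possible drift measure; the course text does not prescribe it. The equal-width binning, the `eps` floor for empty bins, and the function name are assumptions of this example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), using bin proportions."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples: the live data is shifted toward higher values.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. The alerting threshold is a project decision that belongs in the evaluation plan and runbook, not a constant of the method.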
Format & time budget
- 8 weeks × 10 h/week: 6 h mandatory core + up to 4 h optional stretch
- Each week: Sprint Planning → Guided Lab → Build & PR → Audit & Gate
- Gates: Requirements, Model Readiness, Merge, Release
- Every PR must include: CI evidence, audit report, decision log update
- Assessment: pass/fail gates + rubric-scored artifacts
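The per-PR evidence rule above lends itself to an automated gate. The following minimal sketch reports which required evidence items are missing from the files changed in a PR. The file paths and the function name are hypothetical placeholders chosen for this example; the course does not specify where these artifacts live.

```python
from typing import Iterable

# Hypothetical locations for the three required evidence items.
REQUIRED_EVIDENCE = {
    "CI evidence": "evidence/ci_report.json",
    "audit report": "evidence/audit_report.md",
    "decision log update": "docs/decision_log.md",
}


def missing_evidence(changed_files: Iterable[str]) -> list:
    """Return the names of required evidence items absent from a PR's changed files."""
    changed = set(changed_files)
    return [name for name, path in REQUIRED_EVIDENCE.items() if path not in changed]


# Example: a PR that touches code and CI evidence only.
pr_files = ["src/model.py", "evidence/ci_report.json"]
gaps = missing_evidence(pr_files)
gate_passed = not gaps
```

In a CI job, a non-empty `gaps` list would typically fail the Merge gate and print the missing items, so the author knows what to add before requesting the audit.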