3 credits
Spring 2026, Lecture, Upper Division
This course introduces essential computational methods in modern data science, focusing on simulation, resampling, Bayesian data analysis, and the utilization of large language models (LLMs) in data science workflows. Students will learn foundational simulation techniques, such as random variable generation through the inverse cumulative distribution function and rejection sampling. The course provides an overview of Frequentist and Bayesian inference, highlighting their theoretical foundations and practical applications. Key resampling methods, including bootstrapping and cross-validation, will be explored as tools for assessing variability, constructing confidence intervals, and validating predictive models. The course also emphasizes the practical and responsible use of LLMs in data science pipelines, including tasks such as text preprocessing, feature extraction, and leveraging pre-trained models to enhance data analysis and annotation. The course culminates in a capstone project where students will synthesize their learning by designing and implementing comprehensive solutions to real-world data science problems, demonstrating both theoretical understanding and practical implementation skills.
Learning Outcomes
1. Apply simulation techniques, including Monte Carlo methods, transformation approaches, and rejection sampling, to analyze probabilistic behavior in data science applications. Students will compare and evaluate Frequentist and Bayesian inference paradigms by examining their theoretical foundations, identify their strengths and limitations, and explain their roles in statistical modeling and decision-making. Students will design, implement, and assess resampling methods, focusing on both nonparametric and parametric forms of the bootstrap, to estimate variability, construct confidence intervals, and improve statistical estimates through bias correction techniques. They will learn the principles of cross-validation, analyze its role in model assessment, and apply it using Python libraries to compute model performance metrics, detect overfitting and underfitting, and select models with reliable predictive accuracy.
2. Construct and interpret posterior distributions and credible intervals, apply Markov chain Monte Carlo methods to approximate posteriors, and evaluate the role of prior distributions in Bayesian inference. In addition, students will use large language models in creative and practical ways within data science workflows, such as contextual data augmentation, feature engineering, and integrating structured and unstructured data to enhance predictive models, while addressing challenges such as privacy and reliability. The course will culminate in a capstone project where students will synthesize selected course topics to design, develop, and present robust solutions to real-world data science challenges. By integrating computational methods, statistical principles, and real-world datasets, students will demonstrate their ability to create effective, data-driven solutions that showcase both theoretical understanding and applied expertise.