EN.685 (Data Science) | Johns Hopkins University Academic Catalogue

Courses

EN.685.603. Foundations of Algorithm Analysis. 3 Credits.

This course equips students with the core mathematical tools needed to design, analyze, and implement data science algorithms. Topics are carefully selected to support the computational, statistical, and analytical foundations of modern algorithmic methods. Key focus areas include linear algebra, discrete mathematics, matrix decompositions, optimization theory, and computational complexity. Through hands-on modeling and problem-solving, students will explore real-world applications, uncover algorithmic insights using matrix and graph theory, and build strong mathematical intuition for data-driven problem solving.

EN.685.621. Algorithms for Data Science. 3 Credits.

This course offers an in-depth journey through the algorithmic concepts vital for mastering the intricacies of data science. It begins with an intensive examination of algorithm analysis, with a special focus on understanding the runtime complexities essential for addressing real-world data problems. The curriculum encompasses thorough training in data preprocessing, along with foundational knowledge in probability and statistics, equipping students to proficiently clean and interpret data. The course introduces key mathematical transformations such as eigendecomposition, the FFT, the DCT, and wavelets. These tools are crucial for unearthing underlying patterns in data by creating innovative feature spaces. Students will explore a seamless blend of diverse algorithm types, including intelligent algorithms, statistical algorithms, optimization algorithms, graph algorithms, and learning algorithms. This comprehensive approach, enriched with optimization techniques, forms a holistic toolkit for the contemporary data scientist. Moving beyond theoretical concepts, the course delves into practical aspects of analysis, visualization, and understanding of complexity classes. Occasional forays into algorithmic proofs enhance the theoretical grounding of students, bridging theory with practical application. The course culminates in modules focused on data modeling and visualization, enabling students to adeptly apply algorithmic techniques to produce insightful and meaningful data representations. Upon completing this course, students will be thoroughly equipped with both practical and theoretical algorithmic strategies, preparing them to confidently address a wide array of challenges in the data science field. Students can only earn credit for one of EN.605.620, EN.605.621, or EN.685.621.

EN.685.640. Mathematical Reasoning and Structure for Data Science. 3 Credits.

This course provides a rigorous mathematical foundation for the statistical and algorithmic reasoning involved in modern data science. It is designed to prepare students to approach data modeling, simulation, and evaluation with mathematical precision and clarity. Students will explore logic, set theory, combinatorics, linear models from first principles, and essential probability theory with a computational focus. Emphasis is placed on the conceptual structure behind methods such as regression, classification, and clustering, enabling students to understand not only how to use them but also why they work.

EN.685.648. Data Science. 3 Credits.

This course will cover the core concepts and skills in the interdisciplinary field of data science. These include problem identification and communication, probability, statistical inference, visualization, extract/transform/load (ETL), exploratory data analysis (EDA), linear and logistic regression, model evaluation and various machine learning algorithms such as random forests, k-means clustering, and association rules. The course recognizes that although data science uses machine learning techniques, it is not synonymous with machine learning. The course emphasizes an understanding of both data (through the use of systems theory, probability, and simulation) and algorithms (through the use of synthetic and real data sets). The guiding principles throughout are communication and reproducibility. The course is geared towards giving students direct experience in solving the programming and analytical challenges associated with data science. The assignments weight conceptual (assessments) and practical (labs, problem sets) understanding equally. Prerequisite(s): A working knowledge of Python scripting and SQL is assumed as all assignments are completed in Python.

Prerequisite(s): EN.685.652 Data Engineering Principles and Practice or equivalent course.

EN.685.652. Data Engineering Principles and Practice. 3 Credits.

Data Engineering is the ingestion, transformation, storage and serving of data in ways that enable data scientists or applications to use and derive insights from data. In this course, we will look at various file-based data formats, data collection, data cleansing, data transformation, and data modeling for both relational and NoSQL databases. The course will also cover movement of data into data warehouses and/or data lakes using pipelines and workflow automation. Finally, we will discuss data security, governance, and compliance. The format of this course will be a mix of lectures, hands-on demos, and labs. Upon completing this course, students will have a deeper understanding of what a data engineer does and the various technologies that make up data engineering, along with hands-on experience working with various tools and processes.

EN.685.662. Data Patterns and Representations. 3 Credits.

This course will explore the practical application of data visualization and representation, employing lenses such as personas to understand the different purposes of visualizations. Data visualization plays a crucial role in the entire data science process, serving multiple purposes such as communicating results and insights in a clear and understandable way, facilitating preliminary data exploration, and analyzing outcomes from physics-based or machine learning models and simulations. The course will introduce various tools and equip students with the knowledge to choose the most suitable tool for a given problem. These tools include Microsoft Excel, Python plotting libraries like matplotlib and plotly, Python graphical interfacing libraries such as streamlit, and Tableau, among others. As a data scientist, you will often need to collaborate in cross-functional teams with varying levels of technical expertise and role-specific requirements. To prepare you for a well-rounded career in data science, the course project will focus on connecting stakeholders with appropriate visualization methods and techniques, with the aim of enhancing your ability to communicate insights to diverse audiences.

EN.685.701. Data Science: Modeling and Analytics. 3 Credits.

This course advances the design of data modeling as it applies to the field of data science while leveraging key concepts from AI, machine learning, and statistics. Data modeling combines several fields to process varied data types and to represent the data in an expressive way that shows the relationships between data points and their intrinsic patterns. The course will show how to identify, design, and implement the modeling process by outlining the framework, determining the appropriate model type, evaluating the model, and representing the outputs in an explainable way. The models used will be based on intelligent algorithms (reasoning, optimization, and pattern recognition), machine learning algorithms (supervised and unsupervised), and statistical methods (descriptive statistics, inferential statistics, multivariate analysis, and regression). The focus will be on developing and applying models using Python-based frameworks to datasets from online resources such as Kaggle, Data.gov, and open-source repositories.

EN.685.748. Advanced Data Science. 3 Credits.

This course delves deeper into the computational analytics of Data Science by introducing foundational and advanced methods in Bayesian analysis and causal inference through lectures and hands-on exercises. Topics include probabilistic modeling, regression techniques, propensity scores, difference-in-differences, conditional treatment effects, and advanced methods such as panel data analysis, metalearners, and Gaussian processes. Students will learn to construct, evaluate, and diagnose Bayesian and causal models using tools like PyMC3, Bambi, and metalearners. The course emphasizes practical applications, including addressing biases, leveraging panel data, and extending causal analyses to real-world decision-making. Hands-on exercises reinforce critical concepts, and students will synthesize methods to solve complex problems.

Prerequisite(s): EN.685.648 Data Science

EN.685.795. Capstone Project in Data Science. 3 Credits.

This course permits graduate students in data science to work with a faculty mentor to explore a topic in depth or conduct research in selected areas. Requirements for completion include submission of a significant paper or project. Prerequisite(s): Seven data science graduate courses including two courses numbered 605.7xx or 625.7xx or admission to the post-master’s certificate program. Students must also have permission of a faculty mentor, the student’s academic advisor, and the program chair.

EN.685.801. Independent Study in Data Science I. 3 Credits.

This course permits graduate students in data science to work with a faculty mentor to explore a topic in depth or conduct research in selected areas. Requirements for completion include submission of a significant paper suitable to be submitted for publication. Prerequisite(s): Seven data science graduate courses including two courses numbered 605.7xx or 625.7xx or admission to the post-master’s certificate program. Students must also have permission of a faculty mentor, the student’s academic advisor, and the program chair.

EN.685.802. Independent Study in Data Science II. 3 Credits.

Students wishing to take a second independent study in data science should sign up for this course. Prerequisite(s): EN.685.801 Independent Study in Data Science I and permission of a faculty mentor, the student’s academic advisor, and the program chair. Course Note(s): Students may not receive credit for both EN.685.795 Capstone Project in Data Science and EN.685.802.