Applied Data Science 3+2 Pathway

New College students who complete the second year of their studies in any area of concentration are encouraged to follow the accelerated 3+2 curriculum provided below, if they are interested in completing both undergraduate and graduate programs in five years. Students who are interested in this option will be eligible only after entering the New College undergraduate program and showing strong academic performance. These applicants must satisfy the following minimum conditions before they can be admitted via the 3+2 pathway:

  • Complete 2 years of study with Satisfactory evaluations in all academic undertakings.
  • Complete the prerequisite courses (see below).
  • Be recommended by a faculty member for the 3+2 program.

Prerequisite Courses

The following courses must be completed during the first two years of undergraduate study:

  • MATH 2400 - Calculus I
  • MATH 3250 - Calculus II
  • CSCI 2200 - Introduction to Programming in Python
  • CSCI 3250 – Intermediate Python or CSCI 2400 – Object Oriented Programming
  • MATH 2200 - Probability I (Mod 1)
  • MATH 4550 - Probability II (Mod 2)
  • MATH 2320 - Linear Algebra

These courses also count towards satisfying the IDC 5100 Introduction to Data Science Bootcamp course in the graduate program.

IDC 5204 - Applied Statistics I: A statistics course focusing on descriptive and inferential statistics, with topics including linear regression, confidence intervals, and hypothesis testing, as well as probability theory and modern approaches such as resampling. All methods are illustrated in R, with a focus on methods relevant for data science using industrial datasets.

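IDC 5110 - Data Munging and Exploratory Data Analysis: A course on practical methods for reshaping, restructuring, and summarizing relationships in data through exploratory analysis. Principles and methods of preprocessing, normalization, and data validation, with an emphasis on collaborative and reproducible research.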

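IDC 5120 - Algorithms for Data Science: Fundamentals of algorithms and measures of their performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data, graph algorithms and combinatorial optimization, dynamic programming, and randomized and approximation algorithms.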

IDC 5130 - Databases for Data Science: Fundamentals of traditional database design and management. Various types and comparison of databases including SQL databases (e.g., PostgreSQL, SQLite), NoSQL databases, column-oriented databases (e.g., HBase), and document-oriented databases (e.g., MongoDB). Consistency, availability, scalability, efficiency, and performance in data retrieval and storage.

IDC 5296 - Industrial Seminar Series I: The first offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers topics and applications to diverse problems in business via applications of various data science techniques.

IDC 5295 - Industrial Workshop: This course offers content modules complementary to the regular coursework of the graduate program in applied data science. Examples include, but are not limited to, ethics, emerging or trending technologies in data science, domain-specific applications, industrial software platforms or tools, and professional certification modules and exams widely acknowledged in the industry.

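IDC 5205 - Applied Statistics II: A course in statistical modeling, including multiple linear regression and logistic regression and, more broadly, generalized linear models. Emphasis is placed on model formulation, building, assumptions, interpretation, prediction, and evaluation, with implementation carried out in R and a focus on methods and models relevant for data science using industrial datasets.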

IDC 5112 - Data Visualization: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Over the course of the semester, students develop a visual application in their area of interest with data collected from an industrial application or project.

IDC 5210 - Applied Machine Learning: Project-based course covering supervised and unsupervised learning, with an emphasis on working with real industrial data. Bayesian analysis and other specific learning paradigms including regression, clustering, random forests, support vector machines, kernel methods, and neural networks.

IDC 5131 - Distributed Computing: Fundamentals concerning the design and maintenance of massively parallel data sets. Non-relational databases and their management. Algorithms for parallel architectures and associated software tools including the MapReduce/Hadoop framework and BigTable.

IDC 5297 - Industrial Seminar Series II: The second offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers topics and applications to diverse problems in business via applications of various data science techniques.

IDC 6293 - Industrial Practicum I: Intended as a summer internship or interterm applied project, this course is the first extensive real industry experience opportunity offered to students who would like to put their data science knowledge and skills to practical use. Must be completed with an industrial partner of the program or a company/organization the student chooses to work with, under the supervision of Data Science faculty.

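IDC 6200 - Advanced Applied Statistics: A second course in statistical modeling, with a mix of topics such as generalized additive models, models for longitudinal responses, time series models, survival analysis, statistical learning, or Bayesian statistics, and a focus on models relevant to data science. Taught with a project-based focus using real industrial data in an applied business context.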

IDC 6215 - Advanced Applied Computing: Advanced topics in computing, such as image processing and object detection, text mining, natural language processing, recurrent neural networks, and reinforcement learning. Taught with a project-based focus using real industrial data in an applied business context.

IDC 6250 - Practical Data Science: Analysis of data and creation of a data science pipeline and deliverable for industry. Working in groups, students analyze an industry-submitted data set starting with exploratory analysis, followed by statistical or machine learning-based model building, and the construction and presentation of a data product to an industry partner.

IDC 6298 - Industrial Seminar Series III: The third and final offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers topics and applications to diverse problems in business via applications of various data science techniques.

IDC 6294 - Industrial Practicum II: A full semester working in industry as part of a data science team, under the weekly supervision of a Data Science faculty member, to whom the student submits reports. This is the second and final stage of the industrial practicum, in which the student works in an industrial partner company or organization or in a company of their choice. Performance is assessed both by a faculty advisor and a company supervisor.