Did you know that being a data scientist is ranked as the top career in America? Learn more about Data Science from Columbia University’s Machine Learning for Data Science and Analytics course team.
So much depends on algorithms, the recipe-like instructions that underlie modern car engines, navigation tools, music streaming services and more. Professors at Columbia University’s Data Science Institute will explain and explore the fundamental role of algorithms in data science and modern life in part two of Columbia’s XSeries on Data Science and Analytics, Machine Learning for Data Science and Analytics.
The course begins with computer science and industrial engineering professor Cliff Stein, co-author of the best-selling textbook Introduction to Algorithms. Professor Stein will explain how to match the right algorithm to each problem and how to evaluate an algorithm’s speed and accuracy. He will cover commonly used techniques such as sorting and searching, and introduce greedy algorithms and dynamic programming, using graphs, networks and large bodies of text as examples. He will also show how popular scheduling and mapping tools make use of these techniques.
In the third module, computer science professor Mihalis Yannakakis will explore hashing and search trees, data structures that represent sets of objects and support basic operations such as insertion, deletion and search. You will learn how dynamic and linear programming can be used to model and solve optimization problems in many fields. The module ends with a discussion of so-called NP-complete problems, for which no efficient algorithm is known, so large instances can’t be solved in a realistic amount of time.
In the fourth module, computer science professor Itsik Pe’er will show how algorithms are being applied to massive amounts of genomic data, moving us closer to a healthcare model where prevention and treatment are tailored to an individual’s unique genome. Pe’er will cover the computational challenges of processing the billions of snippets of DNA contained in one person’s genome and linking genetic variations to disease in individuals and groups. Personal case studies will be reviewed.
The second half of the course will focus on machine learning techniques and the kinds of prediction problems that can and can’t be solved with algorithms. Statistics professor Peter Orbanz will cover basic machine learning principles and commonly used methods such as model selection, cross-validation and classification, including linear classifiers and random forests.
Computer science and statistics professor David Blei, who pioneered a popular text-mining technique called topic modeling, will explain how probabilistic models can uncover hidden themes in large bodies of text. Probabilistic modeling can summarize texts and form predictions, providing customized data analysis useful in science, industry and government.
Computer science lecturer Ansaf Salleb-Aouissi will end the course with a case study from her own research showing that machine learning methods can dramatically improve doctors’ ability to identify mothers at risk of preterm birth, a $26 billion public health problem. Salleb-Aouissi will summarize efforts to clean and analyze data tied to 3,000 pregnancies, emphasizing the importance of understanding the data as it is prepared for analysis. The module will introduce support vector machines and their application in this research.
Learn about data science and enroll in Machine Learning for Data Science and Analytics today!
17 Aug 2017