Class: Meeting from 1:00-2:15 pm on Monday and Wednesday, in Bass 305. 

Discussion section: Monday 2:15-3:15pm (Bass 405); Monday 4:00-5:00pm (Bass 405 or Bass 205). 

Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Different headings for this class (4 variants)

CB&B752/CPSC752 - grad. w/ programming
This graduate-level version of the course consists of lectures, programming assignments, and a final programming project.

MB&B452/MCDB452 - undergrad. 
This undergraduate version of the course consists of lectures, written problem sets, and a final project (a semi-computational section and a literature survey).

MB&B752/MCDB752 - grad. w/o programming 
This graduate-level version of the course consists of lectures, written problem sets, and a final project (a semi-computational section and a literature survey). Unlike CB&B752, there is no programming required.

MB&B 753a3/MB&B 754a4 - Modules
For graduate students, the course can be broken up into two "modules" (each counting 0.5 credit towards the MB&B course requirement):
753 - Bioinformatics: Practical Application of Data Mining (1st half of term)
754 - Bioinformatics: Practical Application of Simulation (2nd half of term)
Each module consists of lectures, written problem sets, and a final, graduate-level written project that is half the length of the full course's final project.

This is allowed but we'd prefer if you would register for the class. 


The course is geared towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about the types of large-scale quantitative analyses that whole-genome sequencing will make possible. It would also be suitable for students from other fields, such as computer science or physics, who want to learn about an important new biological application for computation.

Students should have:
(1) A basic knowledge of biochemistry and molecular biology. 
(2) A knowledge of basic quantitative concepts, such as single-variable calculus and basic probability and statistics, along with basic programming skills.

These can be fulfilled by: MBB 200 and Mathematics 115 or permission of the instructor.

We realize that students with diverse backgrounds will be taking the course, and we are willing to recommend supplementary reading and/or MOOCs to help with the background to specific topics.

Class Requirements
Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

Bioinformatics quizzes

There will be four short quizzes (25 minutes) in class comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.

Answer keys to Quizzes 1-4 in the fall of 2012 can be found here

Programming Assignments (CBB and CS) and Programming issues

There will be several short programming assignments required for CBB and CS students taking this course. Acceptable languages and submission requirements will be discussed prior to the first assignment. These assignments are NOT required for students not taking the CBB or CS sections of the course.

These are the programming languages that we permit in the programming assignments and final project: Perl, Python, C, C++, MATLAB and R. If you really feel more comfortable with other languages, please email the TFs to discuss. Also, packages such as BioPerl and BioPython are not allowed in the assignments and final project. If in doubt, please consult the TFs.

We recommend the use of Perl for most of the programming. A useful resource is the following book: Programming Perl, 3rd Edition, in the O'Reilly series, by Larry Wall, Tom Christiansen, and Jon Orwant. The Yale Library also has older editions, which would work too. We would also recommend the following online resources: and Otherwise, Google is your best friend.

Pages from previous years

2015 is the 18th time Bioinformatics has been taught at Yale. Pages for the 17 previous iterations of the class are available. Look at how things evolve! 2014 spring, 2013 fall, 2012 spring, 2011 spring, 2010, 2009 and earlier (12 years of classes, starting in '98). (Note that the pre-2010 course was Genomics & Bioinformatics; after 2010, the course contains all of the "Bioinformatics" of previous years and then more (!) with less "Genomics".)

  • Practice Quiz 3: Hi Everyone, Here is a practice quiz 3 from Prof O'Hern. The quiz will be administered in the last ~30 min of class on Wednesday. Happy studying! Michael
    Posted Apr 21, 2015, 6:35 AM by Michael Schoenberg