Requisites Documentation

This document describes the ways in which subject prerequisites and corequisites (known together as requisites) are represented in the CIM Courses product, the Container/Template Subject Structure (CTSS) database, and in the mit-subjects API.

Document contents:

Requisites Documentation

Background
Requisites in the Container/Template Subject Structure (CTSS)

The Tree Structure
Database Structure

Requisite Processing

CIM Inbound Feed
Get Requisites API Endpoint
Update Requisites API Endpoint
Flatten Requisites Flow

Requisite Display

General Handling
Simple Requisite Display

Freetext Requisites
GIR Requisites
Permission Requisites
Subject Requisites

Composite Requisite Display

Top-Level with distinct corequisite
Top-Level without distinct corequisite
Other levels

Notes and Known Issues

Background

Historically, the MIT course catalog or subject listings have presented a subject’s requisites as a single text field written in English, with some conventions to indicate requisite characteristics. For example:

Subject	Requisites	Requisite Meaning
8.01 (Physics I)	None	There are no pre- or co-requisites
8.022 (Physics II)	Physics I (GIR); Coreq: Calculus II (GIR)	To take 8.022, you must have completed the Physics I (GIR); you must also be taking the Calculus II (GIR) (i.e. Calculus II is a corequisite).
8.05 (Quantum Physics II)	8.04	To take 8.05, you must have completed 8.04
8.07 (Electromagnetism II)	8.03 and 18.03	To take 8.07, you must have completed both 8.03 and 18.03
8.18 (Research Problems)	Permission of instructor	To take 8.18 you need only the instructor’s permission
8.224 (Exploring Black Holes)	8.033 or 8.20	To take 8.224, you must have completed either 8.033 or 8.20
8.226 (43 Orders of Magnitude)	(8.04 and 8.044) or permission of instructor	To take 8.226, you must either have the instructor’s permission, or you must have completed both 8.04 and 8.044
8.241 (Intro to Biological Physics)	Physics II (GIR) and (5.60 or 8.044)	To take 8.241, you must have completed the Physics II GIR and you must also have completed either 5.60 or 8.044

As can be seen in the table above, parentheses, “ands”, and “ors” are used to express logical combinations of prerequisites and corequisites for a subject.
This text format is generally comprehensible to a human reader, but its formatting is often inconsistent, which makes it impossible for a software program to interpret the requisite rules reliably. For example, “and” is sometimes represented by a semicolon, as for subject 8.022 above.

Requisites in the Container/Template Subject Structure (CTSS)

One of the goals of the CIM Courses implementation was to build a new database structure for subjects. This structure was designed to better represent the usage of subject data at MIT. A central aspect of this data model redesign was the representation of subjects as Containers and Templates, hence the general name CTSS for the new database. Another goal of this redesign was to represent requisites in a structured way that could improve the consistency of data entry, and could enable software systems to interpret and analyze requisites correctly.

A subject’s requisites are now represented as a tree structure in the Java code. In the database, the tree structure is stored as a set of rows, each row representing a node in the tree.

The Tree Structure

The tree structure (see https://en.wikipedia.org/wiki/Tree_(data_structure)) that we use in the code to represent the requisites stores the operators and operands of the requisites descriptor as a set of linked tree nodes. For two of the examples in the table above, the trees look like this:

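8.03 and 18.03: an AND node whose two children are the leaf nodes 8.03 and 18.03.

(8.04 and 8.044) or permission of instructor: an OR node whose children are an AND node (with leaf children 8.04 and 8.044) and the leaf node "permission of instructor".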
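To make the shape of these trees concrete, here is a minimal Java sketch. SketchRequisiteNode is a hypothetical, simplified stand-in for the real RequisiteNode class: it keeps only an operator (for AND/OR nodes) or a value (for leaves), plus the parent and child links described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for RequisiteNode (not the real class):
// composite nodes hold an operator, leaves hold a value, and every node
// links to its parent and children.
final class SketchRequisiteNode {
    final String operation;          // "AND" or "OR" for composite nodes, null for leaves
    final String value;              // operand text for leaves, null for composites
    SketchRequisiteNode parent;      // null for the root node
    final List<SketchRequisiteNode> children = new ArrayList<>();

    private SketchRequisiteNode(String operation, String value) {
        this.operation = operation;
        this.value = value;
    }

    static SketchRequisiteNode leaf(String value) {
        return new SketchRequisiteNode(null, value);
    }

    static SketchRequisiteNode composite(String operation, SketchRequisiteNode... kids) {
        SketchRequisiteNode node = new SketchRequisiteNode(operation, null);
        for (SketchRequisiteNode kid : kids) {
            kid.parent = node;       // link each child back to its parent
            node.children.add(kid);
        }
        return node;
    }

    // Renders the subtree as an indented outline, one node per line.
    String outline() {
        StringBuilder sb = new StringBuilder();
        appendOutline(sb, 0);
        return sb.toString();
    }

    private void appendOutline(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth))                  // String.repeat needs Java 11+
          .append(operation != null ? operation : value)
          .append('\n');
        for (SketchRequisiteNode child : children) {
            child.appendOutline(sb, depth + 1);
        }
    }
}
```

Building the second example with composite("OR", composite("AND", leaf("8.04"), leaf("8.044")), leaf("permission of instructor")) and calling outline() prints OR at the top, AND and "permission of instructor" one level down, and 8.04 and 8.044 under AND.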

For more thorough information on tree representations of requisite data, see https://wikis.mit.edu/confluence/display/EduSys/Requisite+Tree+Tutorial

Database Structure

The requisite tree structure is stored in the database as a set of rows, one row for each node in the tree. Each row contains a reference to its parent node (PARENT_REQ_ID) so that the data can be read back into a Java tree structure when needed.

The database table holding this information is:

SUBJECT_TMPL_REQUISITE


Name                      Null     Type
------------------------- -------- ------------------
SUBJECT_TMPL_REQUISITE_ID NOT NULL VARCHAR2(32)
SUBJECT_TEMPLATE_ID       NOT NULL VARCHAR2(32)
REQUISITE_TIMING                   VARCHAR2(1)
REQUISITE_TYPE_CODE                NUMBER(4)
REQUISITE_VALUE                    VARCHAR2(150 CHAR)
COMPOSITE_REQ_OPERATION            VARCHAR2(3 CHAR)
PARENT_REQ_ID                      VARCHAR2(32)
CREATE_BY                 NOT NULL VARCHAR2(20 CHAR)
CREATE_DATE               NOT NULL TIMESTAMP(6)
MODIFY_BY                          VARCHAR2(20 CHAR)
MODIFY_DATE                        TIMESTAMP(6)

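The relationship between a subject template and its requisites is one-to-many: a template can have many requisite rows, and each row belongs to exactly one template, identified by SUBJECT_TEMPLATE_ID.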

The columns in the table and their meanings are:

Column	Description
SUBJECT_TMPL_REQUISITE_ID	Primary Key: unique identifier for the row.
SUBJECT_TEMPLATE_ID	Foreign key: identifies the subject template that the requisite belongs to.
REQUISITE_TIMING	Indicates whether the requisite is a Prerequisite or Corequisite. Values: P, C, or null. Rows that represent an AND or OR operator must have null in the Timing column; all other rows must have P or C.
REQUISITE_TYPE_CODE	Describes the type of requisite. Values: 1001 (Subject): represents a subject requisite; 1002 (GIR): a GIR requisite; 1003 (Freetext): an arbitrary text requisite; 1004 (Permission): represents the special "permission of instructor" requisite; 1005 (Composite): an AND or OR operator.
REQUISITE_VALUE	Value depends on type: for type 1001 (subject), a container ID; for type 1002 (GIR), a code describing the GIR (e.g. PHY1, BIOL, CHEM); for type 1003 (free text), a text value entered by a user; for type 1004 (permission), "permission of instructor"; for type 1005 (composite), null.
COMPOSITE_REQ_OPERATION	If the row is a Composite requisite (represents an AND or OR operator), this column has value "AND" or "OR", otherwise it is null.
PARENT_REQ_ID	Primary key of the row that represents this row's parent node in the tree structure. This column is null if the row is the root node (top level) of the tree. The column must not be null if the row is not a root node.
CREATE_BY	ID of the user or process that created the row
CREATE_DATE	Date/time when the row was created
MODIFY_BY	ID of the user or process that last modified the row
MODIFY_DATE	Date/time when the row was last modified
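Reading the rows back into a tree amounts to indexing them by primary key and attaching each row to the row named in its PARENT_REQ_ID. The sketch below assumes simplified, hypothetical Row and Node classes (audit columns omitted); it is not the mit-subjects implementation. Because the table has no ordering column, sibling order here simply follows input order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rebuilds a requisite tree from the SUBJECT_TMPL_REQUISITE rows of one
// template. Row and Node are hypothetical stand-ins, not the real classes.
final class RequisiteRowAssembler {

    static final class Row {
        final String id;         // SUBJECT_TMPL_REQUISITE_ID
        final String parentId;   // PARENT_REQ_ID (null for the root)
        final String timing;     // REQUISITE_TIMING: "P", "C", or null
        final int typeCode;      // REQUISITE_TYPE_CODE: 1001..1005
        final String value;      // REQUISITE_VALUE
        final String operation;  // COMPOSITE_REQ_OPERATION: "AND", "OR", or null

        Row(String id, String parentId, String timing, int typeCode, String value, String operation) {
            this.id = id;
            this.parentId = parentId;
            this.timing = timing;
            this.typeCode = typeCode;
            this.value = value;
            this.operation = operation;
        }
    }

    static final class Node {
        final Row row;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(Row row) {
            this.row = row;
        }
    }

    // Returns the root node, or null if the template has no requisite rows.
    static Node assemble(List<Row> rows) {
        Map<String, Node> byId = new HashMap<>();
        for (Row row : rows) {
            byId.put(row.id, new Node(row));
        }
        Node root = null;
        for (Row row : rows) {                 // rows may arrive in any order
            Node node = byId.get(row.id);
            if (row.parentId == null) {
                if (root != null) {
                    throw new IllegalStateException("more than one root row");
                }
                root = node;
            } else {
                Node parent = byId.get(row.parentId);
                if (parent == null) {
                    throw new IllegalStateException("unknown parent " + row.parentId);
                }
                node.parent = parent;
                parent.children.add(node);     // sibling order follows input order
            }
        }
        if (!rows.isEmpty() && root == null) {
            throw new IllegalStateException("no root row");
        }
        return root;
    }
}
```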

Requisite Processing

Now that we understand the data, we can look at how requisite data moves around in the mit-subjects API. The primary Java data structure used for requisite data in the API is the RequisiteNode class. Each object of this class represents a single "node" in the requisites tree (for a given subject), and contains references to the other linked nodes in the tree: the parent node and the child nodes. Much of the code involves translating from an input data format to this RequisiteNode tree, and from the RequisiteNode tree to an output format.

The main processes that send requisites data back and forth are:

The CIM Inbound feed: this is data moving from CIM Courses into the CTSS database structures.
The "get requisites" API endpoint: retrieves data from CTSS in JSON format through an HTTP API.
The "update requisites" API endpoint: receives requisite data in JSON format through an HTTP API and stores the data in CTSS.
The "flatten requisites" flow: returns a simple, standard English text representation of the requisites stored in CTSS (e.g. "Physics II (GIR) and (5.60 or 8.044)").
The backfill process: synchronizes the CTSS data with the legacy MITSIS subject data so that older applications using legacy subject tables can see current data.

CIM Inbound Feed

The mit-subjects API periodically runs a scheduled process that retrieves a file of subject data from Courseleaf (using SFTP) and stores the data in the CTSS tables. This process is known as the "CIM Inbound Feed". The file retrieved from Courseleaf typically contains data for a collection of subjects, and is in XML format. The structure of the data is defined by the CIM Courses product. Our inbound feed translates this XML format into a format suitable for storing in CTSS database tables.

For each subject's requisites, the data is contained in an XML section named "mit_prereqs". This section contains a collection of "row" elements, each "row" taken from a line in the CIM Courses UI. The data is logically in "infix" format (see https://en.wikipedia.org/wiki/Infix_notation).

(The XML also includes a text representation of the requisites in the "prereq_example" element; this element is informational only and is not processed by the inbound feed.) This is an XML snippet showing an example of one subject's requisites as they might appear in the XML data:

The inbound feed processing is complex and the flows are contained in the XML file "cim-inbound-feed-main.xml". In production, the feed is handled by the flow "CIM-inbound-feed_poll", which uses SFTP to pull an XML data file from the Courseleaf server. For testing purposes, there is also an HTTP endpoint "POST /test/courseleafFeed" which allows an XML file to be posted and processed from Postman or a similar client.

An early part of the inbound feed process converts the XML data to Java (Map) format. The rest of this section focuses only on the part of the inbound feed that processes the subject requisites sections of the converted Courseleaf data. The inbound feed parses the requisites data and converts it from the Courseleaf (infix) format to "prefix" format. "Prefix" notation separates a subject requisites descriptor into operators (ANDs and ORs) and operands (subject numbers, free-text pieces like “Permission of Instructor”). By convention, each operator is placed before its operands. See https://en.wikipedia.org/wiki/Polish_notation for more details.

Here are some examples from the above table. Prefix notation does not usually include parentheses, but they are shown here as they make the structure easier to visualize:

Standard English Format	Prefix Format
8.03 and 18.03	AND 8.03 18.03
(8.04 and 8.044) or permission of instructor	OR (AND 8.04 8.044) permission of instructor
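The infix-to-prefix step can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not the inbound feed's code: it assumes the requisites have already been split into tokens (with multi-word operands such as "permission of instructor" as a single token), and that AND binds more tightly than OR when parentheses are absent. A run of the same operator becomes one n-ary node, and nested composites are printed in parentheses as in the table above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: converts tokenized infix requisites to prefix notation.
// ASSUMPTIONS: tokens are pre-split, multi-word operands are single tokens,
// and AND binds more tightly than OR.
final class InfixToPrefix {

    // Parsed expression: an operand (op == null) or an n-ary AND/OR node.
    private static final class Expr {
        final String op;
        final String operand;
        final List<Expr> args;

        Expr(String op, String operand, List<Expr> args) {
            this.op = op;
            this.operand = operand;
            this.args = args;
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    private InfixToPrefix(List<String> tokens) {
        this.tokens = tokens;
    }

    static String convert(List<String> tokens) {
        InfixToPrefix parser = new InfixToPrefix(tokens);
        Expr expr = parser.parseChain("or");
        if (parser.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + tokens.get(parser.pos));
        }
        return render(expr, true);
    }

    // Parses: operand (keyword operand)*; a run of one operator becomes one n-ary node.
    private Expr parseChain(String keyword) {
        List<Expr> args = new ArrayList<>();
        args.add(parseOperand(keyword));
        while (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(keyword)) {
            pos++;
            args.add(parseOperand(keyword));
        }
        return args.size() == 1 ? args.get(0) : new Expr(keyword.toUpperCase(), null, args);
    }

    // OR operands are AND chains; AND operands are atoms or parenthesized groups.
    private Expr parseOperand(String keyword) {
        if (keyword.equals("or")) {
            return parseChain("and");
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            Expr inner = parseChain("or");
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        if (token.equals(")") || token.equalsIgnoreCase("and") || token.equalsIgnoreCase("or")) {
            throw new IllegalArgumentException("operand expected, found: " + token);
        }
        return new Expr(null, token, null);
    }

    // The top-level node is printed bare; nested composites get parentheses.
    private static String render(Expr expr, boolean topLevel) {
        if (expr.op == null) {
            return expr.operand;
        }
        StringBuilder sb = new StringBuilder(expr.op);
        for (Expr arg : expr.args) {
            sb.append(' ').append(render(arg, false));
        }
        return topLevel ? sb.toString() : "(" + sb + ")";
    }
}
```

For example, convert(List.of("(", "8.04", "and", "8.044", ")", "or", "permission of instructor")) returns "OR (AND 8.04 8.044) permission of instructor", matching the second row of the table.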

The prefix data is then converted to the "tree" format, in the form of RequisiteNode objects; this is done by the RequisitesParser class. The requisite data is then transformed to JSON before being passed to the main requisites processing flow.

The requisites are further processed by calling the flow "updateRequisitesApiEndpointFlow" in the XML file "requisites.xml" (see the "Update Requisites API Endpoint" section below for details).

Get Requisites API Endpoint

This endpoint retrieves the requisites in JSON format for a given container/template combination. The interface is described in the RAML and the generated API documentation for mit-subjects.

The endpoint URI is: GET /subjects/containers/{containerId}/templates/{templateId}/requisites

The flow that handles this endpoint is "getTemplateRequisitesApiEndpointFlow" in "requisites.xml".
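As an illustration only, a Java client could build a request for this endpoint with the JDK's java.net.http API (Java 11+). The base URL below is a placeholder, and any authentication the deployed API requires is not shown; the RAML remains the authoritative interface description.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: builds (but does not send) a GET request for one container/template's
// requisites. baseUrl is a placeholder; real hosts and auth headers are not shown.
final class GetRequisitesRequest {
    static HttpRequest build(String baseUrl, String containerId, String templateId) {
        URI uri = URI.create(baseUrl + "/subjects/containers/" + containerId
                + "/templates/" + templateId + "/requisites");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")  // the endpoint returns JSON
                .GET()
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).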

Update Requisites API Endpoint

This endpoint accepts an HTTP POST of requisites in JSON format, and updates or creates requisites for a given container. The interface is described in the RAML and the generated API documentation for mit-subjects.

The endpoint URI is: POST /subjects/containers/{containerId}/requisites

The flow that handles this endpoint is "updateRequisitesApiEndpointFlow" in "requisites.xml". Note that this is the same flow used by the CIM Inbound feed (described above) to create/update requisites.

The logic flow is roughly:

Validate the requisites JSON data against the JSON schema
Convert the JSON data to "tree" format in the form of RequisiteNode objects
Validate the requisites tree
Prepare the requisites as a list of RequisiteNode objects ready for inserting into the SUBJECT_TMPL_REQUISITE table in the database.
Store the requisites either on a new template or an existing template, depending on the supplied "effective from" term code. Note that when storing requisites on an existing template, we delete the current requisites for the template and add all the incoming requisites as new records for the template.
Store the data necessary for the "backfill" process, which will synchronize the changes in requisite data with the legacy MITSIS subject tables.
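The "validate the requisites tree" step can be pictured as a recursive walk that checks each node against the column rules in the SUBJECT_TMPL_REQUISITE table description. The sketch below encodes only those documented rules, plus two structural assumptions of its own (leaves have no children, composites have at least one child); the actual validation in mit-subjects may check more or differently.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tree validation against the documented column rules.
// Node is a hypothetical stand-in for RequisiteNode.
final class RequisiteTreeValidator {
    static final int COMPOSITE = 1005;

    static final class Node {
        final int typeCode;
        final String timing;     // "P", "C", or null
        final String operation;  // "AND", "OR", or null
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(int typeCode, String timing, String operation, String value, Node... kids) {
            this.typeCode = typeCode;
            this.timing = timing;
            this.operation = operation;
            this.value = value;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Returns the problems found; an empty list means the tree passed these checks.
    static List<String> validate(Node root) {
        List<String> errors = new ArrayList<>();
        check(root, "root", errors);
        return errors;
    }

    private static void check(Node n, String path, List<String> errors) {
        if (n.typeCode == COMPOSITE) {
            if (n.timing != null) errors.add(path + ": composite rows must have null timing");
            if (!"AND".equals(n.operation) && !"OR".equals(n.operation)) errors.add(path + ": operation must be AND or OR");
            if (n.value != null) errors.add(path + ": composite rows must have null value");
            if (n.children.isEmpty()) errors.add(path + ": composite has no children"); // assumption
        } else {
            if (n.typeCode < 1001 || n.typeCode > 1004) errors.add(path + ": unknown type code " + n.typeCode);
            if (!"P".equals(n.timing) && !"C".equals(n.timing)) errors.add(path + ": timing must be P or C");
            if (n.operation != null) errors.add(path + ": only composite rows may have an operation");
            if (!n.children.isEmpty()) errors.add(path + ": only composite rows may have children"); // assumption
        }
        for (int i = 0; i < n.children.size(); i++) {
            check(n.children.get(i), path + "/" + i, errors);
        }
    }
}
```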

Flatten Requisites Flow

This flow retrieves the requisites for a given template and converts them to a readable text format, e.g. "(8.04 and 8.044) or permission of instructor". (See Requisite Display below for more details.) The flow is used in various places in the mit-subjects application (inbound feed and backfill) and is also accessible via an HTTP URL for testing (URL is GET /test/flattenedRequisites/{templateId}). This URL is not part of the official consumer API.

The flattened text version of the requisites is not stored in the database, so any part of the application that needs requisites in this format calls this flow. These include subject searches and the creation of backfill queue records.

Requisite Display

The structured tree format is converted (aka flattened) to a display format to be used in the Bulletin, Online Subject Listing, etc. A strict set of rules governs the order of the elements, the punctuation used, and the selection of subject number to display for subject-type requisites. The rules were chosen to maximize the clarity and readability of the display format.

General Handling

If there are no requisites, the display value is "None".

Multiple coreqs in sequence at the same level of nesting will be surrounded by [].

Normally, the first letter of the display value will be capitalized. If the second letter is already capitalized, we assume that the first word doesn't conform to standard capitalization rules and leave its capitalization as entered.
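The "None" and capitalization rules can be sketched as a final pass over the flattened string. This is an illustrative sketch, not the flatten flow's code, and it assumes that "first letter" means the first character of the display value.

```java
// Sketch of the general-handling rules applied to an already-flattened value.
// ASSUMPTION: "first letter" is taken to be the first character of the string.
final class DisplayFinisher {
    static String finish(String flattened) {
        if (flattened == null || flattened.isEmpty()) {
            return "None";                       // no requisites
        }
        if (flattened.length() > 1 && Character.isUpperCase(flattened.charAt(1))) {
            return flattened;                    // second letter already capitalized: keep as entered
        }
        return Character.toUpperCase(flattened.charAt(0)) + flattened.substring(1);
    }
}
```

For example, "permission of instructor" becomes "Permission of instructor", while a value starting with "iOS" is left unchanged.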

Simple Requisite Display

Freetext Requisites

Freetext requisites display as entered.

GIR Requisites

GIRs display as "GIR:<value>", e.g. "GIR:CHEM2".

Permission Requisites

Permission displays as "permission of instructor".
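The freetext, GIR, and permission rules reduce to a switch on REQUISITE_TYPE_CODE. In this hypothetical sketch, subject requisites (1001) are deliberately rejected because their display requires the template lookup described under Subject Requisites.

```java
// Sketch: display text for simple (non-composite) requisites, keyed by
// REQUISITE_TYPE_CODE. Subject requisites need a template lookup instead.
final class SimpleRequisiteDisplay {
    static String display(int typeCode, String value) {
        switch (typeCode) {
            case 1002:
                return "GIR:" + value;               // GIR, e.g. GIR:CHEM2
            case 1003:
                return value;                        // freetext: displayed as entered
            case 1004:
                return "permission of instructor";   // permission
            case 1001:
                throw new IllegalArgumentException("subject requisites require a template lookup");
            default:
                throw new IllegalArgumentException("not a simple requisite type: " + typeCode);
        }
    }
}
```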

Subject Requisites

The main subject number of one of the required container's templates is displayed. The template used is the one whose effective_from_term is closest to the requiring template's effective_thru_term without going over. This will be the template that is in effect during the effective_thru_term of the requiring template, or the most recent one before then if the required container was inactive during the requiring template's effective_thru_term. In the event that there is no such template, the requirement will be displayed as "/<containerId>/". (This would only happen if a container were listed as a requisite before its earliest template.)

Example 1

Subject A

Template ID*	Effective From Term	Effective Thru Term	Main Subject Number
A1	2015FA	2017SU	1.111
A2	2018FA	2019SU	2.222
A3	2020FA	999999	3.333

Subject B

Template ID*	Effective From Term	Effective Thru Term	Main Subject Number
B1	2019FA	999999	4.444

Subject A is a requisite of template B1. The main number from template A3 will be used because its effective-from term, 2020FA, is the closest to B1's effective-thru term (999999) without going over. Therefore, 3.333 will be displayed.

Example 2

Subject C

Template ID*	Effective From Term	Effective Thru Term	Main Subject Number
C1	2015FA	2017SU	1.111
C2	2020FA	999999	3.333

Subject D

Template ID*	Effective From Term	Effective Thru Term	Main Subject Number
D1	2018FA	2019SU	4.444

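Subject C is a requisite of template D1. The main number from template C1 will be used because its effective-from term, 2015FA, is the closest to D1's effective-thru term (2019SU) without going over (C2's effective-from term, 2020FA, is later than 2019SU). Therefore, 1.111 will be displayed.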

* Template IDs simplified for illustrative purposes. Actual template IDs are GUIDs.
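The template-selection rule amounts to picking the maximum effective-from term that does not exceed the requiring template's effective-thru term. The sketch below is illustrative: the Template class is hypothetical, and it assumes that term codes such as 2015FA, 2019SU, and 999999 sort chronologically when compared as plain strings.

```java
import java.util.List;

// Sketch of subject-requisite number selection.
// ASSUMPTION: term codes compare chronologically as strings.
final class RequisiteTemplateSelector {
    static final class Template {
        final String effectiveFrom;
        final String effectiveThru;
        final String mainSubjectNumber;

        Template(String effectiveFrom, String effectiveThru, String mainSubjectNumber) {
            this.effectiveFrom = effectiveFrom;
            this.effectiveThru = effectiveThru;
            this.mainSubjectNumber = mainSubjectNumber;
        }
    }

    // Chooses the required container's template whose effective-from term is the
    // latest one not after the requiring template's effective-thru term.
    static String displayNumber(String containerId, List<Template> requiredTemplates, String requiringThruTerm) {
        Template best = null;
        for (Template t : requiredTemplates) {
            boolean notAfter = t.effectiveFrom.compareTo(requiringThruTerm) <= 0;
            if (notAfter && (best == null || t.effectiveFrom.compareTo(best.effectiveFrom) > 0)) {
                best = t;
            }
        }
        // No qualifying template: fall back to the "/<containerId>/" form.
        return best != null ? best.mainSubjectNumber : "/" + containerId + "/";
    }
}
```

Applied to the examples above, subject A's templates with B1's effective-thru term 999999 yield 3.333, and subject C's templates with D1's effective-thru term 2019SU yield 1.111.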

Composite Requisite Display

Top-Level with distinct corequisite

A composite requisite at the root of the tree (top level of nesting) that contains either a simple corequisite or a composite requisite consisting only of corequisites is separated into three parts:

non-permission prerequisites and mixed prereq/coreq compound requisites
the simple corequisites or the corequisite-only composites
permission prerequisites

The three parts are flattened independently according to the rules for "Other levels" below, except that no outer parentheses are used. If the top-level composite is an "AND", the parts are combined with ";". If it is an "OR", they are combined with "; or".

Examples:

12.810; or [12.843]
[7.492 or 7.493]; permission of instructor
1.050; or [GIR:CHEM]; or permission of instructor
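Once the three parts have been flattened, combining them is a simple join. This sketch assumes the parts arrive already flattened (coreqs already in brackets) and that an empty part is skipped; both are assumptions about how the real flattener handles these cases.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: joins the independently flattened parts of a top-level composite that
// has a distinct corequisite part. ASSUMPTION: empty parts are skipped.
final class TopLevelJoiner {
    static String join(String operation, String prereqPart, String coreqPart, String permissionPart) {
        String separator = "OR".equals(operation) ? "; or " : "; ";
        List<String> parts = new ArrayList<>();
        for (String part : new String[] {prereqPart, coreqPart, permissionPart}) {
            if (part != null && !part.isEmpty()) {
                parts.add(part);   // order: prereq/mixed, coreq, permission
            }
        }
        return String.join(separator, parts);
    }
}
```

With the parts of the third example, join("OR", "1.050", "[GIR:CHEM]", "permission of instructor") reproduces "1.050; or [GIR:CHEM]; or permission of instructor".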

Top-Level without distinct corequisite

A composite requisite at the root of the tree (top level of nesting) that does not contain a simple corequisite or corequisite-only composite is handled like any other level, except that there are no outer parentheses.

Other levels

Composite requisites are surrounded by ().

When the composite contains two elements, the elements are separated by the operation. When there are three or more, they are separated by commas, with the operation preceding the last element.

Examples:

6.033 and 6.042
1.010, 1.011, and 1.036
18.745 or 21M.100
8.282, 12.409, or 18.181
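The two-element and three-or-more-element rules can be sketched as follows. The single-element case is not covered by the rules above; returning the lone element unchanged is an assumption of this sketch, as is the nested flag that controls the surrounding parentheses.

```java
import java.util.List;
import java.util.Locale;

// Sketch: joins the already-flattened elements of a composite. Two elements are
// separated by the operation; three or more use commas, with the operation
// before the last element. Nested composites are wrapped in parentheses.
final class CompositeJoiner {
    static String join(String operation, List<String> elements, boolean nested) {
        String word = operation.toLowerCase(Locale.ROOT);   // "and" / "or"
        String body;
        if (elements.size() == 1) {
            body = elements.get(0);                          // assumption: not covered by the rules
        } else if (elements.size() == 2) {
            body = elements.get(0) + " " + word + " " + elements.get(1);
        } else {
            String head = String.join(", ", elements.subList(0, elements.size() - 1));
            body = head + ", " + word + " " + elements.get(elements.size() - 1);
        }
        return nested ? "(" + body + ")" : body;
    }
}
```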

The sort order for elements at the same level of nesting is:

GIR sorted alphabetically
SUBJECTS sorted alphabetically*
FREETEXT sorted alphabetically
COMPOSITE
1. sorted first by number of elements
2. then by number of leaves
3. then by first atomic elements
PERMISSION
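The sort order can be expressed as a Comparator. In this sketch the Node class is hypothetical, and "first atomic elements" is interpreted as the display text of each composite's first leaf (assuming its children are already sorted); the real flattener may interpret that tie-breaker differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the sibling sort order. ASSUMPTION: "first atomic elements" means
// the display text of a composite's first leaf (children already sorted).
final class RequisiteSortOrder {
    enum Kind { GIR, SUBJECT, FREETEXT, COMPOSITE, PERMISSION }  // declaration order = sort order

    static final class Node {
        final Kind kind;
        final String text;   // display text for leaves; null for composites
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String text, Node... kids) {
            this.kind = kind;
            this.text = text;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        int leafCount() {
            if (kind != Kind.COMPOSITE) {
                return 1;
            }
            int count = 0;
            for (Node child : children) {
                count += child.leafCount();
            }
            return count;
        }

        String firstLeafText() {
            return kind == Kind.COMPOSITE ? children.get(0).firstLeafText() : text;
        }
    }

    static int compare(Node a, Node b) {
        int byKind = a.kind.compareTo(b.kind);
        if (byKind != 0) {
            return byKind;
        }
        if (a.kind == Kind.COMPOSITE) {
            int bySize = Integer.compare(a.children.size(), b.children.size());   // 1. number of elements
            if (bySize != 0) {
                return bySize;
            }
            int byLeaves = Integer.compare(a.leafCount(), b.leafCount());         // 2. number of leaves
            if (byLeaves != 0) {
                return byLeaves;
            }
        }
        return a.firstLeafText().compareTo(b.firstLeafText());                   // alphabetical / 3. first atomic element
    }

    static final Comparator<Node> ORDER = RequisiteSortOrder::compare;
}
```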

Some diagrams of trees sorted in the above order can be seen in the attached file: RequisiteTreeSortExamples.png. [Source Gliffy image at https://wikis.mit.edu/confluence/display/EduSys/Requisite+Sorted+Trees]

Notes and Known Issues

1. The flattened display requisites aren't stored. They are calculated on demand. This is a non-trivial performance hit. Note that if the flattened display requisites were stored, they would need to be updated, not only when the requisites changed, but also when a required container's data changed, since the relevant main number may have changed.

2. As described above, the number displayed for each subject requisite will be the same for a given requiring template, regardless of the term for which the requisites are displayed. A more sophisticated approach would be to calculate the display based on a reference term that could be specified when the display data is requested.

3. It's unclear why the top-level handling is different when there is a distinct coreq than when there isn't. (The earliest drafts of proposed display formats show prereq/mixed and permission with the semi-colon.) However, the display has been generated as described above since the project went live, and there have been no problems or concerns expressed by users.

4. Top-level permission entered as a coreq will not be separated by a semicolon. Example: "[1.456 or permission of instructor]" is generated rather than "[1.456]; or permission of instructor".