Requisites Documentation
This document describes the ways in which subject prerequisites and corequisites (known together as requisites) are represented in the CIM Courses product, the Container/Template Subject Structure (CTSS) database, and in the mit-subjects API.
Background
Historically, the MIT course catalog or subject listings have presented a subject’s requisites as a single text field written in English, with some conventions to indicate requisite characteristics. For example:
Subject | Requisites | Requisite Meaning |
---|---|---|
8.01 (Physics I) | None | There are no pre- or co-requisites |
8.022 (Physics II) | Physics I (GIR); Coreq: Calculus II (GIR) | To take 8.022, you must have completed Physics I (GIR); you must also be taking Calculus II (GIR) at the same time (i.e. Calculus II is a corequisite) |
8.05 (Quantum Physics II) | 8.04 | To take 8.05, you must have completed 8.04 |
8.07 (Electromagnetism II) | 8.03 and 18.03 | To take 8.07, you must have completed both 8.03 and 18.03 |
8.18 (Research Problems) | Permission of instructor | To take 8.18 you need only the instructor’s permission |
8.224 (Exploring Black Holes) | 8.033 or 8.20 | To take 8.224, you must have completed either 8.033 or 8.20 |
8.226 (43 Orders of Magnitude) | (8.04 and 8.044) or permission of instructor | To take 8.226, you must either have the instructor’s permission, or you must have completed both 8.04 and 8.044 |
8.241 (Intro to Biological Physics) | Physics II (GIR) and (5.60 or 8.044) | To take 8.241, you must have completed the Physics II GIR and you must also have completed either 5.60 or 8.044 |
As the table above shows, parentheses, “ands”, and “ors” are used to express the logical combinations of prerequisites and corequisites for a subject.
This text format is generally comprehensible to a human being, but there are often inconsistencies in formatting which make it impossible for a software program to interpret the requisite rules. For example, “and” is sometimes represented by a semicolon, as for subject 8.022 above.
Requisites in the Container/Template Subject Structure (CTSS)
One of the goals of the CIM Courses implementation was to build a new database structure for subjects. This structure was designed to better represent the usage of subject data at MIT. A central aspect of this data model redesign was the representation of subjects as Containers and Templates, hence the general name CTSS for the new database. Another goal of this redesign was to represent requisites in a structured way that could improve the consistency of data entry, and could enable software systems to interpret and analyze requisites correctly.
A subject’s requisites are now represented as a tree structure in the Java code. In the database, the tree structure is stored as a set of rows, each row representing a node in the tree.
The Tree Structure
The tree structure (see https://en.wikipedia.org/wiki/Tree_(data_structure)) that we use in the code to represent requisites stores the operators and operands of the requisite description as a set of linked tree nodes. For two of the examples in the Background table above, the trees look like this:
8.03 and 18.03
(8.04 and 8.044) or permission of instructor
For more thorough information on tree representations of requisite data, see https://wikis.mit.edu/confluence/display/EduSys/Requisite+Tree+Tutorial
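As a rough illustration of the idea, the two example trees can be built from operator nodes and leaf nodes. This is only a sketch: `SketchNode`, its fields, and `toText` are hypothetical names invented for this example and are not the actual `RequisiteNode` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for a requisite tree node. The real
// RequisiteNode class in mit-subjects is not shown in this document.
class SketchNode {
    final String operator;  // "AND" or "OR" for operator nodes, null for leaves
    final String value;     // operand text for leaves, null for operator nodes
    final List<SketchNode> children = new ArrayList<>();

    private SketchNode(String operator, String value) {
        this.operator = operator;
        this.value = value;
    }

    static SketchNode leaf(String value) {
        return new SketchNode(null, value);
    }

    static SketchNode op(String operator, SketchNode... operands) {
        SketchNode node = new SketchNode(operator, null);
        for (SketchNode operand : operands) {
            node.children.add(operand);
        }
        return node;
    }

    // Renders the tree as text, parenthesizing nested operator nodes.
    String toText() {
        if (operator == null) {
            return value;
        }
        List<String> parts = new ArrayList<>();
        for (SketchNode child : children) {
            parts.add(child.operator == null ? child.toText() : "(" + child.toText() + ")");
        }
        return String.join(" " + operator.toLowerCase(Locale.ROOT) + " ", parts);
    }

    public static void main(String[] args) {
        SketchNode simple = op("AND", leaf("8.03"), leaf("18.03"));
        SketchNode nested = op("OR",
                op("AND", leaf("8.04"), leaf("8.044")),
                leaf("permission of instructor"));
        System.out.println(simple.toText());
        System.out.println(nested.toText());
    }
}
```

The operator sits at the parent node and its operands are the children, so nesting in the text form corresponds directly to depth in the tree.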
Database Structure
The requisite tree structure is stored in the database as a set of rows, one row for each node in the tree. Each row contains references to its parent node so that the data can be read into a Java tree structure when needed.
The database table holding this information is:
SUBJECT_TMPL_REQUISITE
Name | Null | Type |
---|---|---|
SUBJECT_TMPL_REQUISITE_ID | NOT NULL | VARCHAR2(32) |
SUBJECT_TEMPLATE_ID | NOT NULL | VARCHAR2(32) |
REQUISITE_TIMING | | VARCHAR2(1) |
REQUISITE_TYPE_CODE | | NUMBER(4) |
REQUISITE_VALUE | | VARCHAR2(150 CHAR) |
COMPOSITE_REQ_OPERATION | | VARCHAR2(3 CHAR) |
PARENT_REQ_ID | | VARCHAR2(32) |
CREATE_BY | NOT NULL | VARCHAR2(20 CHAR) |
CREATE_DATE | NOT NULL | TIMESTAMP(6) |
MODIFY_BY | | VARCHAR2(20 CHAR) |
MODIFY_DATE | | TIMESTAMP(6) |
The relationship between a subject template and its requisites is one-to-many:
The columns in the table and their meanings are:
Column | Description |
---|---|
SUBJECT_TMPL_REQUISITE_ID | Primary Key: unique identifier for the row. |
SUBJECT_TEMPLATE_ID | Foreign key: identifies the subject template that the requisite belongs to. |
REQUISITE_TIMING | Indicates whether the requisite is a Prerequisite or Corequisite. Value (P, C, or null). Rows that represent an AND or OR operator must have null in the Timing column. All other rows must have P or C. |
REQUISITE_TYPE_CODE | Describes the type of requisite. Values: |
REQUISITE_VALUE | Value depends on type: |
COMPOSITE_REQ_OPERATION | If the row is a Composite requisite (represents an AND or OR operator), this column has value "AND" or "OR", otherwise it is null. |
PARENT_REQ_ID | Primary key of the row that represents this row's parent node in the tree structure. This column is null if the row is the root node (top level) of the tree. The column must not be null if the row is not a root node. |
CREATE_BY | ID of the user or process that created the row |
CREATE_DATE | Date/time when the row was created |
MODIFY_BY | ID of the user or process that last modified the row |
MODIFY_DATE | Date/time when the row was last modified |
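To make the row-to-tree relationship concrete, here is a minimal sketch of how rows linked by PARENT_REQ_ID could be reassembled into a tree. `ReqRow` and `RowTreeBuilder` are hypothetical names, the row IDs are invented, and the REQUISITE_VALUE contents are shown as display text purely for readability; the actual loading code in mit-subjects may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory copy of a SUBJECT_TMPL_REQUISITE row. Only the
// columns needed to rebuild the tree are included.
class ReqRow {
    final String id;         // SUBJECT_TMPL_REQUISITE_ID
    final String parentId;   // PARENT_REQ_ID, null for the root row
    final String operation;  // COMPOSITE_REQ_OPERATION: "AND", "OR", or null
    final String value;      // REQUISITE_VALUE (illustrative text here)
    final List<ReqRow> children = new ArrayList<>();

    ReqRow(String id, String parentId, String operation, String value) {
        this.id = id;
        this.parentId = parentId;
        this.operation = operation;
        this.value = value;
    }
}

class RowTreeBuilder {
    // Links every row to its parent and returns the single root row.
    static ReqRow build(List<ReqRow> rows) {
        Map<String, ReqRow> byId = new HashMap<>();
        for (ReqRow row : rows) {
            byId.put(row.id, row);
        }
        ReqRow root = null;
        for (ReqRow row : rows) {
            if (row.parentId == null) {
                if (root != null) {
                    throw new IllegalArgumentException("More than one root row");
                }
                root = row;
            } else {
                ReqRow parent = byId.get(row.parentId);
                if (parent == null) {
                    throw new IllegalArgumentException("Unknown parent: " + row.parentId);
                }
                parent.children.add(row);
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("No root row");
        }
        return root;
    }

    public static void main(String[] args) {
        // Rows for "(8.04 and 8.044) or permission of instructor".
        List<ReqRow> rows = Arrays.asList(
                new ReqRow("r1", null, "OR", null),
                new ReqRow("r2", "r1", "AND", null),
                new ReqRow("r3", "r2", null, "8.04"),
                new ReqRow("r4", "r2", null, "8.044"),
                new ReqRow("r5", "r1", null, "permission of instructor"));
        ReqRow root = build(rows);
        System.out.println(root.operation + " with " + root.children.size() + " children");
    }
}
```

Because each row only points upward to its parent, the rows can be read in any order; the map lookup resolves the links in a second pass.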
Requisite Processing
Now that we understand the data, we can look at how requisite data moves around in the mit-subjects API. The primary Java data structure used for requisite data in the API is the RequisiteNode class. Each object of this class represents a single "node" in the requisites tree (for a given subject), and contains references to the other linked nodes in the tree: the parent node and the child nodes. Much of the code involves translating from an input data format to this RequisiteNode tree, and from the RequisiteNode tree to an output format.
The main processes that send requisites data back and forth are:
- The CIM Inbound feed: this is data moving from CIM Courses into the CTSS database structures.
- The "get requisites" API endpoint: retrieves data from CTSS in JSON format through an HTTP API.
- The "update requisites" API endpoint: receives requisite data in JSON format through an HTTP API and stores the data in CTSS.
- The "flatten requisites" API endpoint: returns a simple standard English text representation of the requisites stored in CTSS (e.g. "Physics II (GIR) and (5.60 or 8.044)").
- The backfill process: synchronizes the CTSS data with the legacy MITSIS subject data so that older applications using legacy subject tables can see current data.
CIM Inbound Feed
The mit-subjects API periodically runs a scheduled process that retrieves a file of subject data from Courseleaf (using SFTP) and stores the data in the CTSS tables. This process is known as the "CIM Inbound Feed". The file retrieved from Courseleaf typically contains data for a collection of subjects, and is in XML format. The structure of the data is defined by the CIM Courses product. Our inbound feed translates this XML format into a format suitable for storing in CTSS database tables.
For each subject's requisites, the data is contained in an XML section named "mit_prereqs". This section contains a collection of "row" elements, each "row" taken from a line in the CIM Courses UI. The data is logically in "infix" format (see https://en.wikipedia.org/wiki/Infix_notation).
The text representation of the data is also included, in the "prereq_example" section; this element is informational only and is not processed by the inbound feed. This is an XML snippet showing an example of one subject's requisites as they might appear in the XML data:
An early part of the inbound feed process converts the XML data to Java (Map) format. The rest of this section focuses only on the part of the inbound feed that processes the subject requisites sections of the converted Courseleaf data. The inbound feed parses the requisites data and converts from the Courseleaf (infix) format to "prefix" format. "Prefix" notation separates a subject requisites descriptor into Operators (ANDs and ORs) and Operands (Subject numbers, free-text pieces like “Permission of Instructor”). The notation’s convention places an operator before its operands. See https://en.wikipedia.org/wiki/Polish_notation for more details.
Here are some examples from the above table. Prefix notation does not usually include parentheses, but they are shown here as they make the structure easier to visualize:
Standard English Format | Prefix Format |
---|---|
8.03 and 18.03 | AND 8.03 18.03 |
(8.04 and 8.044) or permission of instructor | OR (AND 8.04 8.044) permission of instructor |
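The conversion in the table can be sketched with a small recursive-descent parser. This is an illustration only, not the RequisitesParser implementation: the class name `InfixToPrefix` is hypothetical, the input is assumed to be already split into tokens, and the sketch assumes that any mix of AND and OR at one level is explicitly parenthesized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: converts tokenized infix requisites to prefix text.
class InfixToPrefix {
    private final List<String> tokens;
    private int pos = 0;
    private boolean lastGroupWasComposite = false;

    private InfixToPrefix(List<String> tokens) {
        this.tokens = tokens;
    }

    static String convert(List<String> tokens) {
        InfixToPrefix parser = new InfixToPrefix(tokens);
        String prefix = parser.parseGroup();
        if (parser.pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(parser.pos));
        }
        return prefix;
    }

    private static boolean isOperator(String token) {
        return token.equals("AND") || token.equals("OR");
    }

    // group := item (OPERATOR item)*, where all operators in a group match
    private String parseGroup() {
        List<String> operands = new ArrayList<>();
        String operator = null;
        operands.add(parseItem());
        while (pos < tokens.size() && isOperator(tokens.get(pos))) {
            String next = tokens.get(pos++);
            if (operator != null && !operator.equals(next)) {
                throw new IllegalArgumentException("Mixed AND/OR must be parenthesized");
            }
            operator = next;
            operands.add(parseItem());
        }
        lastGroupWasComposite = operator != null;
        return operator == null ? operands.get(0) : operator + " " + String.join(" ", operands);
    }

    // item := operand | "(" group ")"
    private String parseItem() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            String inner = parseGroup();
            boolean composite = lastGroupWasComposite;
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            // Keep parentheses around nested operator groups for readability.
            return composite ? "(" + inner + ")" : inner;
        }
        if (token.equals(")") || isOperator(token)) {
            throw new IllegalArgumentException("Unexpected token: " + token);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(convert(Arrays.asList("8.03", "AND", "18.03")));
        System.out.println(convert(Arrays.asList(
                "(", "8.04", "AND", "8.044", ")", "OR", "permission of instructor")));
    }
}
```

Rejecting unparenthesized mixes of AND and OR avoids having to guess an operator precedence, which this document does not define.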
The data is then stored in the "tree" format in the form of RequisiteNode objects - this is done by the RequisitesParser class. The requisite data is then transformed to JSON before being passed to the main requisites processing flow.
The requisites are further processed by calling the flow "updateRequisitesApiEndpointFlow" in the XML file "requisites.xml" (see the "Update Requisites API Endpoint" section below for details).
Get Requisites API Endpoint
This endpoint retrieves the requisites in JSON format for a given container/template combination. The interface is described in the RAML and the generated API documentation for mit-subjects.
The endpoint URI is: GET /subjects/containers/{containerId}/templates/{templateId}/requisites
The flow that handles this endpoint is "getTemplateRequisitesApiEndpointFlow" in "requisites.xml".
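A client call to this endpoint might look like the following sketch, which uses the standard Java 11 `java.net.http` client. The base URL and IDs are placeholders supplied on the command line, authentication is omitted, and the IDs are assumed to be URL-safe; consult the RAML for the actual request and response details.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch for the "get requisites" endpoint.
class GetRequisitesSketch {
    // Builds a GET request for the documented URI under the given base URL.
    static HttpRequest buildRequest(String baseUrl, String containerId, String templateId) {
        URI uri = URI.create(baseUrl + "/subjects/containers/" + containerId
                + "/templates/" + templateId + "/requisites");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: GetRequisitesSketch <baseUrl> <containerId> <templateId>");
            return;
        }
        HttpRequest request = buildRequest(args[0], args[1], args[2]);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```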
Update Requisites API Endpoint
This endpoint accepts an HTTP POST of requisites in JSON format, and updates or creates requisites for a given container. The interface is described in the RAML and the generated API documentation for mit-subjects.
The endpoint URI is: POST /subjects/containers/{containerId}/requisites
The flow that handles this endpoint is "updateRequisitesApiEndpointFlow" in "requisites.xml". Note that this is the same flow used by the CIM Inbound feed (described above) to create/update requisites.
The logic flow is roughly:
- Validate the requisites JSON data against the JSON schema
- Convert the JSON data to "tree" format in the form of RequisiteNode objects
- Validate the requisites tree
- Prepare the requisites as a list of RequisiteNode objects ready for inserting into the SUBJECT_TMPL_REQUISITE table in the database.
- Store the requisites either on a new template or an existing template, depending on the supplied "effective from" term code. Note that when storing requisites on an existing template, we delete the current requisites for the template and add all the incoming requisites as new records for the template.
- Store the data necessary for the "backfill" process, which will synchronize the changes in requisite data with the legacy MITSIS subject tables.
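The specific checks performed in the "validate the requisites tree" step are not listed here. As a sketch of what such validation could involve, the following encodes only the column constraints stated in the Database Structure section (timing must be null on operator rows and P or C elsewhere; exactly one row has no parent). `ReqRowCheck` and its `Row` holder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of row-level consistency checks.
class ReqRowCheck {
    static final class Row {
        final String id;         // SUBJECT_TMPL_REQUISITE_ID
        final String parentId;   // PARENT_REQ_ID
        final String timing;     // REQUISITE_TIMING: "P", "C", or null
        final String operation;  // COMPOSITE_REQ_OPERATION: "AND", "OR", or null

        Row(String id, String parentId, String timing, String operation) {
            this.id = id;
            this.parentId = parentId;
            this.timing = timing;
            this.operation = operation;
        }
    }

    // Returns a list of problems; an empty list means the rows look consistent.
    static List<String> validate(List<Row> rows) {
        List<String> errors = new ArrayList<>();
        Set<String> ids = new HashSet<>();
        for (Row row : rows) {
            ids.add(row.id);
        }
        int roots = 0;
        for (Row row : rows) {
            if (row.operation != null) {
                if (!row.operation.equals("AND") && !row.operation.equals("OR")) {
                    errors.add(row.id + ": operation must be AND or OR");
                }
                if (row.timing != null) {
                    errors.add(row.id + ": operator row must have null timing");
                }
            } else if (!"P".equals(row.timing) && !"C".equals(row.timing)) {
                errors.add(row.id + ": timing must be P or C");
            }
            if (row.parentId == null) {
                roots++;
            } else if (!ids.contains(row.parentId)) {
                errors.add(row.id + ": parent row not found");
            }
        }
        if (!rows.isEmpty() && roots != 1) {
            errors.add("expected exactly one root row, found " + roots);
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("r1", null, null, "AND"),
                new Row("r2", "r1", "P", null),
                new Row("r3", "r1", "C", null));
        System.out.println(validate(rows));
    }
}
```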
Flatten Requisites Flow
This flow retrieves the requisites for a given template and converts them to a readable text format, e.g. "(8.04 and 8.044) or permission of instructor". (See Requisite Display below for more details.) The flow is used in various places in the mit-subjects application (inbound feed and backfill) and is also accessible via an HTTP URL for testing (URL is GET /test/flattenedRequisites/{templateId}). This URL is not part of the official consumer API.
The flattened text version of the requisites is not stored in the database, so any part of the application that needs requisites in this format calls this flow. These include subject searches and the creation of backfill queue records.
Requisite Display
The structured tree format is converted (aka flattened) to a display format to be used in the Bulletin, Online Subject Listing, etc. A strict set of rules governs the order of the elements, the punctuation used, and the selection of subject number to display for subject-type requisites. The rules were chosen to maximize the clarity and readability of the display format.
General Handling
If there are no requisites, the display value is "None".
Multiple coreqs in sequence at the same level of nesting will be surrounded by [].
Normally, the first letter of the display value will be capitalized. If the 2nd letter is already capitalized, we assume that the first word doesn't conform to standard capitalization rules and leave its capitalization as entered.
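The capitalization rule can be expressed in a few lines. This is a sketch of the rule as described above, not the actual implementation; `DisplayCapitalizer` is a hypothetical name and the "pH" input is an invented free-text example.

```java
// Hypothetical sketch of the display capitalization rule.
class DisplayCapitalizer {
    static String capitalizeFirst(String display) {
        if (display == null || display.isEmpty()) {
            return display;
        }
        // A capitalized second character suggests non-standard capitalization
        // (e.g. "pH"), so the text is left exactly as entered.
        if (display.length() > 1 && Character.isUpperCase(display.charAt(1))) {
            return display;
        }
        return Character.toUpperCase(display.charAt(0)) + display.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirst("permission of instructor"));
        System.out.println(capitalizeFirst("pH measurement experience"));
        System.out.println(capitalizeFirst("8.04 or 8.044"));
    }
}
```

Values that start with a digit or bracket pass through unchanged, because upper-casing those characters has no effect.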
Simple Requisite Display
Freetext Requisites
Freetext requisites display as entered.
GIR Requisites
GIRs display as "GIR:<value>", e.g. "GIR:CHEM2".
Permission Requisites
Permission displays as "permission of instructor".
Subject Requisites
The main subject number of one of the required container's templates is displayed. The template used is the one whose effective_from_term is closest to the requiring template's effective_thru_term without going over. This will be the template that is in effect during the effective_thru_term of the requiring template, or the most recent one before then if the required container was inactive during the requiring template's effective_thru_term. In the event that there is no such template, the requirement will be displayed as "/<containerId>/". (This would only happen if a container were listed as a requisite before its earliest template.)
Example 1
Subject A
Template ID* | Effective From Term | Effective Thru Term | Main Subject Number |
---|---|---|---|
A1 | 2015FA | 2017SU | 1.111 |
A2 | 2018FA | 2019SU | 2.222 |
A3 | 2020FA | 999999 | 3.333 |
Subject B
Template ID* | Effective From Term | Effective Thru Term | Main Subject Number |
---|---|---|---|
B1 | 2019FA | 999999 | 4.444 |
Subject A is a requisite of template B1. The main number from Template A3 will be used because 2020FA is closest to 999999. Therefore, 3.333 will be displayed.
Example 2
Subject C
Template ID* | Effective From Term | Effective Thru Term | Main Subject Number |
---|---|---|---|
C1 | 2015FA | 2017SU | 1.111 |
C2 | 2020FA | 999999 | 3.333 |
Subject D
Template ID* | Effective From Term | Effective Thru Term | Main Subject Number |
---|---|---|---|
D1 | 2018FA | 2019SU | 4.444 |
Subject C is a requisite of template D1. The main number from Template C1 will be used because 2015FA is closest to 2019SU without going over. Therefore, 1.111 will be displayed.
* Template IDs are simplified for illustrative purposes. Actual template IDs are GUIDs.
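The selection rule in the two examples can be sketched as follows. The names `MainNumberPicker` and `Template` are hypothetical, and the sketch assumes that term codes such as "2015FA", "2019SU", and "999999" sort chronologically when compared as plain strings; the real code may compare terms differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of choosing which main subject number to display.
class MainNumberPicker {
    static final class Template {
        final String effectiveFromTerm;
        final String mainSubjectNumber;

        Template(String effectiveFromTerm, String mainSubjectNumber) {
            this.effectiveFromTerm = effectiveFromTerm;
            this.mainSubjectNumber = mainSubjectNumber;
        }
    }

    // Picks the required container's template with the latest effective-from
    // term that does not exceed the requiring template's effective-thru term.
    static String displayNumber(String containerId, List<Template> requiredTemplates,
                                String requiringThruTerm) {
        Template best = null;
        for (Template t : requiredTemplates) {
            boolean notOver = t.effectiveFromTerm.compareTo(requiringThruTerm) <= 0;
            if (notOver && (best == null
                    || t.effectiveFromTerm.compareTo(best.effectiveFromTerm) > 0)) {
                best = t;
            }
        }
        // Fallback when the container is required before its earliest template.
        return best != null ? best.mainSubjectNumber : "/" + containerId + "/";
    }

    public static void main(String[] args) {
        List<Template> subjectA = Arrays.asList(
                new Template("2015FA", "1.111"),
                new Template("2018FA", "2.222"),
                new Template("2020FA", "3.333"));
        List<Template> subjectC = Arrays.asList(
                new Template("2015FA", "1.111"),
                new Template("2020FA", "3.333"));
        System.out.println(displayNumber("A", subjectA, "999999")); // Example 1
        System.out.println(displayNumber("C", subjectC, "2019SU")); // Example 2
    }
}
```

With the example data this selects 3.333 for Example 1 and 1.111 for Example 2, and falls back to the "/<containerId>/" form when no template qualifies.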
Composite Requisite Display
Top-Level with distinct corequisite
A composite requisite at the root of the tree (top level of nesting) that contains either a simple corequisite or a composite requisite consisting only of corequisites is separated into three parts:
- non-permission prerequisites and mixed prereq/coreq compound requisites
- the simple corequisites or the corequisite-only composites
- permission prerequisites
The three parts are flattened independently according to the rules for "other levels" below, except that no outer parentheses are used. If the top-level composite is an "AND", the parts are combined with ";". If it is an "OR", they are combined with "; or".
Examples:
- 12.810; or [12.843]
- [7.492 or 7.493]; permission of instructor
- 1.050; or [GIR:CHEM]; or permission of instructor
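The final combining step for the three parts can be sketched as below. `TopLevelJoiner` is a hypothetical name, and the sketch assumes each part has already been flattened (including any square brackets around corequisites) and is passed as null or empty when absent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of combining the three top-level display parts.
class TopLevelJoiner {
    static String combine(String operation, String prereqPart,
                          String coreqPart, String permissionPart) {
        // "AND" joins with "; " and "OR" joins with "; or ".
        String separator = operation.equals("OR") ? "; or " : "; ";
        List<String> parts = new ArrayList<>();
        for (String part : new String[] {prereqPart, coreqPart, permissionPart}) {
            if (part != null && !part.isEmpty()) {
                parts.add(part);
            }
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        System.out.println(combine("OR", "12.810", "[12.843]", null));
        System.out.println(combine("AND", null, "[7.492 or 7.493]", "permission of instructor"));
        System.out.println(combine("OR", "1.050", "[GIR:CHEM]", "permission of instructor"));
    }
}
```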
Top-Level without distinct corequisite
A composite requisite at the root of the tree (top level of nesting) that does not contain a simple corequisite or corequisite-only composite is handled like any other level, except that there are no outer parentheses.
Other levels
Composite requisites are surrounded by ().
When the composite contains two elements, the elements are separated by the operation. When there are three or more, they are separated by commas, with the operation preceding the last element.
Examples:
- 6.033 and 6.042
- 1.010, 1.011, and 1.036
- 18.745 or 21M.100
- 8.282, 12.409, or 18.181
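The separator rule shown in these examples can be sketched as a small helper. `ElementJoiner` is a hypothetical name; the sketch takes elements that are already flattened and ordered, and leaves the surrounding parentheses to the caller.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of joining the elements of one composite requisite.
class ElementJoiner {
    static String join(List<String> elements, String operation) {
        String op = operation.toLowerCase(Locale.ROOT);
        int n = elements.size();
        if (n == 0) {
            return "";
        }
        if (n == 1) {
            return elements.get(0);
        }
        if (n == 2) {
            // Two elements: separated by the operation only.
            return elements.get(0) + " " + op + " " + elements.get(1);
        }
        // Three or more: commas, with the operation before the last element.
        return String.join(", ", elements.subList(0, n - 1))
                + ", " + op + " " + elements.get(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("6.033", "6.042"), "AND"));
        System.out.println(join(Arrays.asList("1.010", "1.011", "1.036"), "AND"));
        System.out.println(join(Arrays.asList("8.282", "12.409", "18.181"), "OR"));
    }
}
```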
The sort order for elements at the same level of nesting is:
- GIR sorted alphabetically
- SUBJECTS sorted alphabetically*
- FREETEXT sorted alphabetically
- COMPOSITE
  - sorted first by number of elements
  - then by number of leaves
  - then by first atomic elements
- PERMISSION
Some diagrams of trees sorted in the above order can be seen in the attached file: RequisiteTreeSortExamples.png. [Source Gliffy image at https://wikis.mit.edu/confluence/display/EduSys/Requisite+Sorted+Trees]
Notes and Known Issues
1. The flattened display requisites aren't stored. They are calculated on demand. This is a non-trivial performance hit. Note that if the flattened display requisites were stored, they would need to be updated, not only when the requisites changed, but also when a required container's data changed, since the relevant main number may have changed.
2. As described above, the number displayed for each subject requisite will be the same for a given requiring template, regardless of the term for which the requisites are displayed. A more sophisticated approach would be to calculate the display based on a reference term that could be specified when the display data is requested.
3. It's unclear why the top-level handling is different when there is a distinct coreq than when there isn't. (The earliest drafts of proposed display formats show prereq/mixed and permission with the semicolon.) However, the display has been generated as described above since the project went live, and there have been no problems or concerns expressed by users.
4. Top-level permission entered as a coreq will not be separated by a semicolon. Example: "[1.456 or permission of instructor]" is generated rather than "[1.456]; or permission of instructor".