Q: What information do Athena machines collect and report?
Answer
Athena workstations report two kinds of data back to IS&T servers. The first is called counterlog, and is essentially an Athena machine reporting its existence. The second is called metrics, and applies only to public IS&T-owned machines in clusters and quickstations.
counterlog
Why is this collected?: All Athena machines, whether public or private, report their existence back to the central IS&T syslog server once per day. It is important for IS&T to have a count of the total number of Athena machines around campus for multiple reasons. IS&T needs to be able to allocate resources (both money and staff) correctly, and knowing how many Athena machines we have to support is essential. Many of the software license agreements we sign are based on a specific number of public machines throughout campus. If that number changes substantially, we may need to notify the vendor or renegotiate the agreement.
What is collected?: The information sent includes the version of Athena (for 9.x releases) or metapackage installed (for Debathena), the hostname of the machine, the operating system, and the processor type. It also includes a unique identifier generated from the MAC (Ethernet) address of the workstation. For example:
counterlog: w20-575-1.mit.edu linux i686 debathena-workstation 4048dd06645beadca0adba359b1fc731 cron
The script that generates this information may be examined on any Athena workstation. On Athena 9.4 workstations, the script is located at /etc/athena/counterlog. On Debathena workstations, it's located at /usr/lib/debathena-counterlog/athena-counterlog.
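To make the example line above easier to read, here is a minimal Python sketch that splits it into named fields. The field names, and the mapping of positions to meanings, are inferred from the description and the single example above; they are assumptions, not taken from the counterlog script itself. In particular, the meaning of the trailing field (`cron` in the example) is not documented here, so the sketch simply labels it `source`.

```python
from typing import NamedTuple


class CounterlogEntry(NamedTuple):
    # Field names are hypothetical; order is inferred from the example line.
    hostname: str
    operating_system: str
    processor: str
    release: str      # Athena version (9.x) or Debathena metapackage
    machine_id: str   # unique identifier generated from the MAC address
    source: str       # trailing field; its meaning is an assumption


def parse_counterlog(line: str) -> CounterlogEntry:
    """Split one 'counterlog: ...' line into its whitespace-separated fields."""
    tag, _, rest = line.partition(":")
    if tag.strip() != "counterlog":
        raise ValueError("not a counterlog line: %r" % line)
    fields = rest.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    return CounterlogEntry(*fields)


entry = parse_counterlog(
    "counterlog: w20-575-1.mit.edu linux i686 debathena-workstation "
    "4048dd06645beadca0adba359b1fc731 cron"
)
print(entry.hostname, entry.release)
```

Note that none of the fields identifies a user; the line describes only the machine.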
Is there any personally identifying information?: No information about the user logged in at the time is collected. Note that the moira record for a machine may contain information about its administrative contact or billing contact, and that such information is available to any member of the MIT community.
metrics
This section only applies to public, IS&T-owned Debathena workstations. The metrics program is not installed on private workstations or Athena 9.4 workstations.
Why is this collected?: In recent years, IS&T management has requested additional information about what Athena is being used for. For example, is it used primarily for web browsing and e-mail? Is it used primarily for MATLAB and LaTeX? Or is it used for a balance of the two? This information is important for several reasons. First, it allows IS&T to justify its space allocation to the rest of the Institute (space in the main buildings has always been in high demand). Additionally, it lets us know what type of hardware is in demand in the Athena clusters. If Athena is being used primarily for web browsing, then perhaps some high-end workstations should be replaced with dedicated web and e-mail kiosks. On the other hand, if Athena workstations are primarily being used for processor-intensive tasks, it allows us to justify the Athena workstation replacement budget. Lastly, it allows us to customize the Athena environment to better support the ways Athena is actually used. Maintaining a general-purpose computing environment throughout campus is fairly difficult; if we know that certain aspects of it are more important than others, we can better tailor the user experience.
What is collected?: At the end of each login session, the following data, along with a random universally unique identifier (UUID), is sent to the syslog server:
- the duration of the login session
- the names of any packages installed during the session
- the paths of any binaries run during the session
We realize that the last item is by far the most controversial, and have put in place the following safeguards to ensure that user privacy is protected to the maximum extent possible:
- No user information is collected.
- The raw data is only available to IS&T Server Operations staff (approximately 19 people).
- Data analysis focuses on the type of application (e.g. Math Software, Web Browser, Office Suite) rather than the actual binary being run.
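The last safeguard, analyzing by application type rather than by individual binary, can be illustrated with a short Python sketch. This is not IS&T's analysis code: the mapping table, the function names, and the "Other" fallback category are all hypothetical, chosen only to show the idea of collapsing binary paths into broad categories.

```python
import posixpath
from collections import Counter

# Hypothetical mapping from binary name to application type.
# The real categories and their membership are not described in this article.
CATEGORY_BY_BINARY = {
    "matlab": "Math Software",
    "firefox": "Web Browser",
    "soffice": "Office Suite",
}


def categorize(path: str) -> str:
    """Map a binary's path to a broad application type (assumed scheme)."""
    return CATEGORY_BY_BINARY.get(posixpath.basename(path), "Other")


def summarize(paths) -> Counter:
    """Count binaries per application type, discarding the paths themselves."""
    return Counter(categorize(p) for p in paths)


summary = summarize(["/usr/bin/firefox", "/usr/bin/matlab", "/usr/bin/firefox", "/bin/ls"])
print(summary)
```

Once the paths are reduced to category counts like this, the summary no longer reveals which specific binaries were run.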
If you wish, you may examine the program that compiles and sends this data. It is available at /usr/share/python-support/debathena-metrics/debathena/metrics/gatherer.py, or /afs/dev.mit.edu/source/src-svn/debathena/debathena/metrics/debathena/metrics/gatherer.py from a non-cluster machine.
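As a rough picture of what one end-of-session report contains, the Python sketch below models the three listed items plus the random UUID. It is only an illustration under stated assumptions: the class name, field names, and units are hypothetical and are not taken from gatherer.py, which remains the authoritative description of what is sent.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SessionReport:
    # All names here are hypothetical; see gatherer.py for the real format.
    duration_seconds: float
    packages_installed: List[str] = field(default_factory=list)
    binaries_run: List[str] = field(default_factory=list)
    # A random (version 4) UUID; it is not derived from the user or machine.
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))


report = SessionReport(
    duration_seconds=1800.0,
    binaries_run=["/usr/bin/firefox", "/usr/bin/matlab"],
)
print(sorted(asdict(report)))
```

The point of the sketch is what is absent: there is no username field, and the identifier is freshly generated at random for each report.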
Is there any personally identifying information?: No. No information about the user logged in at the time is collected.