A long email header (or full email header) is the log of an email's journey from one email server to another. When an email travels from a sender to a recipient, it isn't simply placed in the recipient's mailbox. Instead, it usually passes through a series of servers, each handing it off to the next, before it reaches the recipient. Because a long email header records this journey, it is often helpful in tracing the sender of spam.
Before going into the specifics, let's first find an email header...
Directions for viewing the long email header are available here.
Now let's take a look at some email headers...
When tracing the path an email traveled, the most relevant parts of the header are the Received: lines.
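Because the Received: lines are the part to focus on, it can help to pull them out of a message's raw source and put them in travel order. The sketch below uses Python's standard `email` module and relies on the fact that each server adds its Received: line on top of the existing header. The message in `RAW` is a made-up example with placeholder hostnames, IDs and timestamps, not the test email discussed in this article, and `received_hops` is a hypothetical helper name.

```python
import email

# Hypothetical raw message: hostnames, IDs and timestamps are placeholders.
RAW = """\
Received: by mx2.example.net with SMTP id b2; Mon, 06 Jul 2009 07:38:22 -0700
Received: from web1.example.com (web1.example.com [192.0.2.10])
 by mx1.example.net with SMTP id a1; Mon, 06 Jul 2009 07:38:20 -0700
Received: (qmail 1234 invoked by uid 60001); Mon, 06 Jul 2009 14:38:19 -0000
From: sender@example.com
To: recipient@example.net
Subject: test

Hello.
"""


def received_hops(raw: str) -> list[str]:
    """Return the Received: header values oldest-first."""
    msg = email.message_from_string(raw)
    hops = msg.get_all("Received") or []
    # Each server adds its Received: line on top of the existing header,
    # so the list is newest-first; reversing it gives the travel order.
    # " ".join(h.split()) collapses folded continuation lines onto one line.
    return [" ".join(h.split()) for h in reversed(hops)]


for step, hop in enumerate(received_hops(RAW), start=1):
    print(step, hop)
```

Reading the output from step 1 upward matches the advice in this article to start from the bottom of the header.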
In this test email sent from my Yahoo account to my Gmail account, you can see the path an email travels by looking at the Received: lines.
This area of the email header tells us that the email has gone through 4 steps from sender to recipient:
- Starting from the sender, the email is sent out via qmail.
Starting from the bottom, the first Received: line indicates:
Yahoo's mail system uses qmail, a mail transfer agent, to deliver its mail. More information on qmail can be found here. The date and time that follow this line indicate when this step was carried out.
- Google verifies the authenticity of this message.
Next, Gmail verifies the authenticity of the sender with SPF (Sender Policy Framework) and records the result in a Received-SPF: line:
In this case, the email passed because the Yahoo IP address it came from is a permitted sender for the sender's domain. The header also records that sending IP address, which could be used to trace the true sender had the message not actually come from the claimed address, email@example.com.
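Received-SPF: values generally follow the layout described in RFC 7208: a result word (pass, fail, softfail, neutral, and so on), an optional comment in parentheses, then key=value pairs such as client-ip. A minimal parser, assuming that layout and no nested parentheses inside the comment, could look like this. The sample value and addresses are hypothetical, and `parse_received_spf` is an illustrative name.

```python
import re

# Result word, optional (comment), then the rest (key=value pairs).
SPF_RE = re.compile(
    r"^\s*(?P<result>[A-Za-z]+)\s*(?:\((?P<comment>[^)]*)\))?\s*(?P<rest>.*)$",
    re.S,
)


def parse_received_spf(value: str) -> dict[str, str]:
    """Split a Received-SPF: value into its result, comment and key=value pairs."""
    m = SPF_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised Received-SPF value: {value!r}")
    fields = {
        "result": m.group("result").lower(),
        "comment": m.group("comment") or "",
    }
    # Pairs are separated by semicolons, e.g. client-ip=192.0.2.10;
    for pair in m.group("rest").split(";"):
        key, sep, val = pair.partition("=")
        if sep:
            fields[key.strip().lower()] = val.strip().strip('"')
    return fields


# Hypothetical header value in the style Gmail uses.
SAMPLE = ("pass (google.com: domain of sender@example.com designates "
          "192.0.2.10 as permitted sender) client-ip=192.0.2.10;")

spf = parse_received_spf(SAMPLE)
print(spf["result"], spf["client-ip"])
```

The `client-ip` value is the address the article suggests using to trace a sender who is not who they claim to be.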
- Google receives this message from Yahoo.
After having verified the authenticity, Google now accepts this message:
Here, the server that sends the email calls itself web57701.mail.re3.yahoo.com, and the server that receives it is mx.google.com. The receiving server records the sending server's IP address (126.96.36.199) and performs a reverse DNS lookup on it, which shows that the server's real name is web57701.mail.re3.yahoo.com, matching the name it identified itself with. The message is received via SMTP (Simple Mail Transfer Protocol) and is assigned the temporary ID 12si3311531yxe.126.2009.07.06.07.38.20 for this transfer. The date and time at the end of the line record when the transfer occurred.
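The from/by/with/id parts described above can be pulled out with a regular expression. This is a sketch that assumes the common "from NAME (RDNS [IP]) by HOST with PROTOCOL id ID" layout; real Received: lines vary a lot between mail servers, and many will not match it. The sample is reassembled from the values quoted above, with the trailing date left out. The helper names `parse_received` and `claimed_name_matches` are hypothetical; the second one compares the name the sender gave with the name the receiver found by reverse DNS.

```python
import re

# Common layout: from NAME (RDNS [IP]) by HOST with PROTOCOL id ID
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>[^\s(]+)"                                # name the sender gave
    r"(?:\s+\((?P<rdns>[^\s()\[\]]*)\s*\[(?P<ip>[^\]]+)\]\))?"  # receiver's rDNS name + IP
    r"\s+by\s+(?P<by>[^\s;]+)"                                 # receiving server
    r"(?:\s+with\s+(?P<with>[^\s;]+))?"                        # transfer protocol
    r"(?:\s+id\s+(?P<id>[^\s;]+))?",                           # receiver's transfer ID
    re.IGNORECASE,
)


def parse_received(value: str) -> dict[str, str]:
    """Extract the from/by/with/id parts of one Received: value ({} if no match)."""
    m = RECEIVED_RE.search(" ".join(value.split()))
    if m is None:
        return {}
    return {k: v for k, v in m.groupdict().items() if v}


def claimed_name_matches(fields: dict[str, str]) -> bool:
    """True if the name the sender claimed equals the reverse-DNS name."""
    helo, rdns = fields.get("helo", ""), fields.get("rdns", "")
    return bool(rdns) and helo.lower() == rdns.lower()


# Reassembled from the values quoted in the text; the trailing date is omitted.
SAMPLE = ("from web57701.mail.re3.yahoo.com (web57701.mail.re3.yahoo.com "
          "[126.96.36.199]) by mx.google.com with SMTP id "
          "12si3311531yxe.126.2009.07.06.07.38.20")

fields = parse_received(SAMPLE)
print(fields["by"], fields["with"], claimed_name_matches(fields))
```

For the Yahoo hop the two names agree; the spam example later in this article is a case where they do not.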
- Google sends the message to the recipient's mailbox server.
Having accepted this mail, Google transfers this email through a few more servers before reaching the destination mailbox.
Now let's take a look at a spam email that has been sent straight to my Spam Folder:
The sender of this email claims to be from energybulletin.net, which is a legitimate site. However, this email is clearly not from that sender (you can't buy anything from EnergyBulletin.net, and I certainly didn't buy anything there). How can we trace who it's from?
If the right information is available, this isn't hard to find out. We simply look at the first Received: header, which records detailed information about the server that handed the message to MIT:
Here, the sending server identifies itself as "Christine." However, the reverse DNS lookup performed by the receiving server (mit.edu) shows that the server belongs to gaoland.net and has the IP address 188.8.131.52. Looking up this IP address ourselves, we find that it originates from France. My DNS lookup also flags this IP address as a possibly forged server.
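You can repeat the receiving server's check yourself: look up the hostname for the IP address (a reverse, or PTR, lookup), then look that hostname up again to confirm it points back to the same address. The sketch below uses Python's `socket` module (IPv4 only); the function name `check_sender` and its returned fields are illustrative, and the resolvers can be swapped out so the logic can be tried without network access. DNS only returns hostnames, so working out a country such as France needs a separate WHOIS or geolocation lookup, which is not shown.

```python
import socket
from typing import Callable, Iterable


def reverse_lookup(ip: str) -> str:
    """PTR lookup: hostname for an IP (raises socket.herror if there is none)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name: str) -> list[str]:
    """Address lookup: IPv4 addresses for a hostname (raises socket.gaierror on failure)."""
    return socket.gethostbyname_ex(name)[2]


def check_sender(
    claimed: str,
    ip: str,
    rev: Callable[[str], str] = reverse_lookup,
    fwd: Callable[[str], Iterable[str]] = forward_lookup,
) -> dict:
    """Compare a server's claimed name with forward-confirmed reverse DNS."""
    try:
        ptr = rev(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return {"ptr": None, "confirmed": False, "matches_claim": False}
    try:
        # Confirmed only if the PTR name resolves back to the same address.
        confirmed = ip in list(fwd(ptr))
    except OSError:
        confirmed = False
    return {
        "ptr": ptr,
        "confirmed": confirmed,
        "matches_claim": ptr.lower() == claimed.lower(),
    }


# Offline demo with a fake resolver (placeholder names and a TEST-NET address).
FAKE_PTR = {"192.0.2.10": "host1.example.net"}
FAKE_A = {"host1.example.net": ["192.0.2.10"]}

result = check_sender("Christine", "192.0.2.10", FAKE_PTR.__getitem__, FAKE_A.__getitem__)
print(result)
```

Called with the default resolvers, `check_sender` queries live DNS, so its answer reflects current records, which may have changed since the email was received.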
While we can't pinpoint the sender, we did find the approximate origin of this email. Since I don't know anyone in France, I can only assume this is random spam. Although tracing the path can tell us roughly where a message came from, we can skip some of this work by looking at MIT's spam filtering system.
[Figure: MIT's Email Architecture]
For a more visual explanation of the servers involved in MIT's mail filtering, see Jacob's PowerPoint presentation.
Now let's take a look at another part of the email header: the spam level and score. When MIT's mail system recognizes an email as spam, a few lines in the long email header generally indicate that it is spam and how "spammy" it is. Specifically, these lines:
When email passes through MIT's central mail hub, it goes through a spam filter called SpamAssassin. SpamAssassin scores each email according to how closely it resembles spam and records the result in a few extra headers, including X-Spam-Score, X-Spam-Level and X-Spam-Flag. If the score passes a certain threshold, the email is marked with a spam flag by the time it reaches your mailbox, so it can be filtered out. The higher the spam score, the more closely the email resembles spam. For users who are not on Spam Quarantine, these headers are the only indication of how "spammy" an email is.
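The three X-Spam headers can be read with Python's `email` module. This sketch assumes X-Spam-Score holds a bare number, X-Spam-Level is a row of asterisks (in SpamAssassin's usual setup, roughly one per point of score), and X-Spam-Flag reads YES on flagged mail. The threshold of 5.0 is SpamAssassin's stock default, used here only as an assumption; MIT's actual threshold is not stated in this article. The sample headers are hypothetical.

```python
import email

# Assumption: 5.0 is SpamAssassin's stock default required_score;
# MIT's real threshold may differ.
DEFAULT_THRESHOLD = 5.0


def spam_summary(raw: str, threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Summarise the X-Spam-* headers of a raw message."""
    msg = email.message_from_string(raw)
    score_text = msg.get("X-Spam-Score")
    # Assumes X-Spam-Score is a bare number such as "12.4".
    score = float(score_text) if score_text is not None else None
    level = (msg.get("X-Spam-Level") or "").strip()
    flagged = (msg.get("X-Spam-Flag") or "").strip().upper() == "YES"
    return {
        "score": score,
        "stars": level.count("*"),
        "flagged": flagged,
        "over_threshold": score is not None and score >= threshold,
    }


# Hypothetical headers, not taken from a real MIT message.
RAW = """\
X-Spam-Score: 12.4
X-Spam-Level: ************
X-Spam-Flag: YES
Subject: You have won

Body text.
"""

print(spam_summary(RAW))
```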
For users on the Spam Quarantine filter, the email carries a different header line:
With Spam Quarantine activated, SpamAssassin still assigns a spam score to the email, but the Spam Quarantine server, Brightmail, intercepts the email before it reaches your mailbox and makes its own assessment. If Brightmail considers the email spam, it stays on the server; otherwise, it is delivered to the user's inbox. As mail leaves the Brightmail server, Brightmail automatically removes the SpamAssassin headers, possibly to avoid confusion between the two spam filters.
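As a toy model of the Spam Quarantine flow described above, the sketch below holds a message if the quarantine check says it is spam and otherwise strips SpamAssassin's X-Spam headers before delivery. The quarantine decision is passed in as a plain boolean because Brightmail's own assessment is not documented here; the function names and return strings are hypothetical.

```python
import email
from email.message import Message

# Headers SpamAssassin adds, per the description in this article.
SPAM_HEADERS = ("X-Spam-Score", "X-Spam-Level", "X-Spam-Flag")


def strip_spam_headers(msg: Message) -> Message:
    """Remove the SpamAssassin headers (del removes all copies, no error if absent)."""
    for name in SPAM_HEADERS:
        del msg[name]
    return msg


def route(msg: Message, quarantine_says_spam: bool) -> str:
    """Toy model of Spam Quarantine: hold spam, clean and deliver everything else."""
    if quarantine_says_spam:
        return "held on quarantine server"
    strip_spam_headers(msg)
    return "delivered to inbox"


msg = email.message_from_string(
    "X-Spam-Score: 3.1\nX-Spam-Level: ***\nSubject: hi\n\nbody\n"
)
print(route(msg, quarantine_says_spam=False), msg["X-Spam-Score"])
```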
The settings shown here reflect Gmail's spam filter. Since we don't support Gmail, less is known about how its filter works internally, but it appears that even neutral email is blocked.