Archived—Canadian Spam Database Concept Document

Archived Information

Archived information is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.

Task Force on Spam
Working Group on Technology and Network Management

May 2005


PDF Version (174 KB - 9 pages)


Introduction
Objectives of the Database
Technical Concept
Cost Estimates
Ownership/Management
Appendix I: Detailed Cost Estimates


Introduction

The strategy of Canada's Task Force on Spam is based on a tool-kit, or multifaceted, approach to fighting spam. The strategy involves: improved enforcement of existing laws; action by the Internet and information technology industries to resolve network technology issues surrounding email abuse; improved best practices by both service providers and industries that use email for legitimate commercial purposes; consumer education and awareness efforts; and international cooperation.

In particular, the Task Force is examining existing laws and legal processes to control spam, enforcement requirements, means of controlling spam (e.g. a complaint process, a platform to help enforcement agencies build legal cases) and the implications of these means, and it is encouraging enforcement agencies to take action against spam. At the same time, the Task Force is examining the technical tools available and the approaches that industry and government can take to reduce the flow of spam in Canada.

Through the Task Force's deliberations, a number of challenges emerged that limit its ability to address the problem of spam effectively. First, there is no clear process in Canada through which consumers and businesses can complain about spam they have received. Consumers are most likely to complain to their Internet service providers (ISPs), so there is no way of characterizing or quantifying the ongoing magnitude of the problem beyond the level of individual ISPs.

Second, spam investigations are difficult to carry out, requiring a high degree of technical expertise and the ability to locate examples of spam complaints/offences.

The Task Force is therefore exploring the possibility of establishing a database to which email users could send copies of spam they have received in their email inboxes. Spam messages sent to this database would be inventoried, kept by a central Canadian organization for a set period of time and made available for law enforcement and research purposes.

This Canadian spam database (or "freezer") would be similar to the United States Federal Trade Commission's (FTC's) spam database, the U.S. Anti-Phishing Working Group database, and Canada's PhoneBusters database of telemarketing complaints. The Canadian Spam Freezer would be able to share database information with these organizations and with sister organizations elsewhere in the world.

Objectives of the Database

The Canadian Spam Freezer system would receive and store examples of spam submitted voluntarily by Canadians. It would provide Canadian victims of spam with a trustworthy and effective reporting mechanism to help in efforts to enforce Canadian laws applicable to spam and related types of email abuse.

The database would be a resource for agencies responsible for enforcing Canadian law, and could also be made available to private organizations involved in fighting spam. Permitted users may include the law enforcement community, Internet service providers, research organizations, and other relevant agencies, such as Industry Canada. Information in the database could be used as evidence in legal efforts, for statistical data analysis, and as the basis for determining Canadian government policy.

The database would need to provide value to Canadian citizens, law enforcement, and other government agencies, while safeguarding Canadian privacy rights as defined by the Personal Information Protection and Electronic Documents Act and other Canadian legislation.

Technical Concept

This section defines the technical concept of the Spam Freezer to give the reader a sense of how it would work. It is written in two parts:

  1. What Canadian Internet users would experience
  2. How the Freezer would be built

1) What Canadian Internet Users Would Experience

The purpose of the Freezer would be to provide Canadians with a central point of contact to which they could submit complaints about spam, along with samples of spam they have received in their personal and business email accounts.

The main method for complainants to submit spam to the Freezer would be a generic email address, which would become known as the central Canadian repository for spam. For example, the FTC in the U.S. uses the spam@uce.gov address. A website with an electronic submission form could be implemented as a secondary system to educate users and ensure that all important information is collected for a clean submission to the Freezer. To make Canadians aware of this complaint process and database, some communications and awareness activities would be necessary.

A few primary email addresses would help differentiate complaints by type. For example, spam@freezer.ca, phishing@freezer.ca and honeypots.spam@freezer.ca could all be standard Freezer addresses. In the future, it could be valuable to assign each submitter a unique spam-submission email address so that the Freezer system could track submissions by individual submitter. This could allow the system to offer stronger advice to email users who are heavily exposed to spam (e.g. to change their email address).
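The address-based routing described above could work roughly as follows. This is a minimal Python sketch, not a specification: the category table simply reuses the example addresses, and the "+token" form for per-submitter addresses (e.g. spam+a1b2c3@freezer.ca) is an assumed scheme chosen for illustration.

```python
from email.utils import parseaddr
from typing import Optional, Tuple

FREEZER_DOMAIN = "freezer.ca"
# Example Freezer addresses (local part) mapped to complaint types.
CATEGORIES = {
    "spam": "spam",
    "phishing": "phishing",
    "honeypots.spam": "honeypot",
}


def classify_recipient(header_value: str) -> Tuple[str, Optional[str]]:
    """Return (category, submitter_token) for a Freezer recipient address.

    Assumes per-submitter addresses use a hypothetical '+token' suffix,
    e.g. spam+a1b2c3@freezer.ca; plain addresses have no token.
    """
    _, addr = parseaddr(header_value)
    local, _, domain = addr.lower().partition("@")
    if domain != FREEZER_DOMAIN:
        raise ValueError(f"not a Freezer address: {addr!r}")
    base, _, token = local.partition("+")
    category = CATEGORIES.get(base, "unclassified")
    return category, token or None
```

For example, a message addressed to "Spam Freezer <phishing@freezer.ca>" would be filed as phishing with no submitter token, while spam+a1b2c3@freezer.ca would be filed as spam and attributed to submitter "a1b2c3".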

2) How the Freezer Would Be Built

The Freezer would require several layers of infrastructure to provide message integrity, a search index, storage, encryption and evidence preservation.

Message Integrity

The integrity of each email message must be maintained by creating a "tag" for it. These tags would provide a way of proving the data's integrity, and would also include time-stamps and other metadata (data that describes the message). The message itself must remain unmodified and unfiltered, and must be accepted by the Freezer even if the connecting server does not adhere to the Simple Mail Transfer Protocol (SMTP). When a message is retrieved from the Freezer, a check should be performed to verify the message's integrity. Attachments containing viruses should be stripped by the Freezer.
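One way to build such a tag is to hash the raw message bytes on arrival and recompute the hash on retrieval. The sketch below assumes SHA-256 as the hash function and a hypothetical IntegrityTag record. On its own, a digest only shows that a message has not changed since tagging if the tags themselves are stored or signed so that they cannot be altered along with the messages.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IntegrityTag:
    sha256: str       # hex digest of the raw message bytes as received
    received_at: str  # ISO 8601 UTC time-stamp assigned by the Freezer
    size: int         # length of the raw message in bytes


def make_tag(raw_message: bytes) -> IntegrityTag:
    """Create a tag for a message stored byte-for-byte as received."""
    return IntegrityTag(
        sha256=hashlib.sha256(raw_message).hexdigest(),
        received_at=datetime.now(timezone.utc).isoformat(),
        size=len(raw_message),
    )


def verify(raw_message: bytes, tag: IntegrityTag) -> bool:
    """Recompute the digest on retrieval and compare it with the stored tag."""
    digest = hashlib.sha256(raw_message).hexdigest()
    return len(raw_message) == tag.size and hmac.compare_digest(digest, tag.sha256)
```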

The system should consult blacklists and whitelists, and should record in each message's metadata a rating of the connecting Internet Protocol (IP) address. This would also allow messages submitted to abuse the Freezer to be purged later.
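A simple version of the IP rating might check the connecting address against the lists and store the result with the message. In this sketch the list contents are hypothetical (they use reserved documentation address ranges), and the rating labels and the "ip_rating" metadata field are illustrative only.

```python
import ipaddress

# Hypothetical example lists using reserved documentation ranges (RFC 5737).
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]
WHITELIST = [ipaddress.ip_network("198.51.100.0/24")]


def rate_connecting_ip(ip_text: str) -> str:
    """Return a coarse rating to record in the message metadata."""
    ip = ipaddress.ip_address(ip_text)
    # Membership tests across IPv4/IPv6 simply return False.
    if any(ip in net for net in WHITELIST):
        return "trusted"
    if any(ip in net for net in BLACKLIST):
        return "listed"
    return "unknown"


def purge_candidates(metadata: dict) -> list:
    """Return keys of stored messages whose connecting IP was rated 'listed'."""
    return sorted(k for k, m in metadata.items() if m.get("ip_rating") == "listed")
```

Recording the rating at intake, rather than filtering on it, keeps every submission in the Freezer while still allowing suspect submissions to be identified and purged later.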

Search Index

Searchable indexes would be the cornerstone of a successful Freezer implementation. Messages should be preprocessed to reduce the work required by search queries and to improve search performance, and indexing and searching should be given careful consideration during the planning phase. The database administrator should collaborate with the U.S. FTC to learn about its most popular searches and its index configuration. The index's primary keys must be unique, and should be derived from unique values within each message's header information; if required, a combination of fields could be used to create the necessary unique key. Messages should be ranked or rated as they are received, to determine their level of "spaminess" and their appropriate email category.
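Deriving a unique key from a combination of header fields could look like the sketch below. The choice of Message-ID, Date, From and Subject is an assumption, since spam headers are frequently missing or forged. Identical copies of one spam message submitted by different users would also share a key, so a real design would need to decide whether that is wanted or whether a submission identifier should be added.

```python
import hashlib
from email import message_from_bytes, policy

# Header fields combined into the key; this particular choice is an assumption.
KEY_FIELDS = ("Message-ID", "Date", "From", "Subject")


def index_key(raw_message: bytes) -> str:
    """Derive a primary key for the search index from header fields only."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    parts = []
    for name in KEY_FIELDS:
        value = msg.get(name, "")  # missing headers contribute an empty value
        parts.append(f"{name.lower()}={str(value).strip()}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```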

Storage

Messages should be stored on a redundant array of inexpensive disks (RAID) so that the failure of a single disk would not cause a service outage or the loss or corruption of data. A journaling file system should be used to improve data integrity.

Security

The Freezer should support Transport Layer Security (TLS) for gateway-to-gateway (server-to-server) encryption. Support for Pretty Good Privacy (PGP) or Secure/Multipurpose Internet Mail Extensions (S/MIME) client software need not be considered, since spam messages are not sensitive.

Access

The system's metadata should keep track of access permissions for security and tracking purposes. A login system should also be required, so that the system can track searches and control access to the database.
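Permission tags and a search log could fit together as in this sketch, where each stored message carries a set of user groups allowed to see it and every search is recorded with the user, query and number of results. The Message and AuditLog types, the group names and the substring matching are hypothetical simplifications; a production system would authenticate users and query the real search index.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Message:
    key: str
    body: str
    allowed_groups: frozenset  # hypothetical access tags kept in the metadata


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user: str, query: str, returned: int) -> None:
        """Append (time-stamp, user, query, result count) for later review."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append((stamp, user, query, returned))


def search(user: str, groups: set, query: str, store: list, log: AuditLog) -> list:
    """Return only messages the logged-in user may see, and log the search."""
    hits = [
        m for m in store
        if query.lower() in m.body.lower() and m.allowed_groups & groups
    ]
    log.record(user, query, len(hits))
    return hits
```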

Access to the database should be constrained by user priority and by the effect of each search on system performance. The design should consider using an automated resource manager to ensure that high-impact searches do not consume all resources.
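A resource manager of this kind could start as a simple admission check: estimate each search's cost and admit it only if it fits within the share of capacity assigned to its priority level. The priority names, shares and cost units below are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical share of total search capacity available to each priority level.
PRIORITY_SHARE = {"enforcement": 1.0, "research": 0.5, "statistics": 0.25}


@dataclass
class ResourceManager:
    capacity: float       # total cost units the system can run at once
    in_use: float = 0.0   # cost units held by currently running searches

    def admit(self, priority: str, estimated_cost: float) -> bool:
        """Admit a search only if it fits within its priority's share of capacity."""
        limit = self.capacity * PRIORITY_SHARE.get(priority, 0.0)
        if self.in_use + estimated_cost > limit:
            return False  # caller should queue or reject the search
        self.in_use += estimated_cost
        return True

    def release(self, estimated_cost: float) -> None:
        """Return a finished search's cost units to the pool."""
        self.in_use = max(0.0, self.in_use - estimated_cost)
```

Under these settings, enforcement searches may use the full capacity, while lower-priority work is held to a fraction of it; searches that are refused would be queued or run later.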

The Task Force recommends that several post-processing routines be run on the data, including:

  1. Reporting on volumes, URLs, spam types, fraud, sources and target physical destinations.
  2. Creating custom reports and notifications based on the thresholds defined by enforcement.
  3. Capturing site information when the account is a "honeypot" source or a decoy account.
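Routines 1 and 2 could be built on simple aggregate counts over message metadata. This sketch assumes each stored message has a metadata record with fields such as "type" or "source" (hypothetical names) and that enforcement thresholds are given as message counts per value.

```python
from collections import Counter


def volume_report(records: list, key: str) -> Counter:
    """Count messages per value of one metadata field (e.g. spam type or source)."""
    return Counter(r[key] for r in records if key in r)


def threshold_alerts(counts: Counter, thresholds: dict) -> list:
    """Return values whose volume meets or exceeds an enforcement-defined threshold."""
    return sorted(
        item for item, n in counts.items()
        if n >= thresholds.get(item, float("inf"))  # no threshold means no alert
    )
```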

Many URLs sent in spam email contain a special string that identifies the recipient of the spam message. To avoid identifying real consumers to spammers, it is critical that the Freezer not automatically visit URL links (e.g. through web crawls or spidering) to capture these websites for use as evidence.

For bulk submissions from ISPs with legitimate domains, where the target victim is a honeypot or decoy account, the Freezer implementation team could consider using back-end processes to capture the site.
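The two rules above could be enforced in code by recording links as plain text and gating any site capture on the kind of submission. In this sketch the submission fields ("source", "verified_isp", "target") and their values are hypothetical, and the URL pattern is deliberately simple; nothing in it contacts a remote site.

```python
import re

# Deliberately simple pattern for http(s) links; real spam needs more care.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def extract_urls(body: str) -> list:
    """Record links as text only; no request is made to the linked site."""
    return URL_PATTERN.findall(body)


def may_capture_sites(submission: dict) -> bool:
    """Allow back-end site capture only for verified ISP bulk submissions to decoys.

    Links in spam sent to real consumers may carry a string identifying the
    recipient, so visiting them could confirm that person to the spammer.
    """
    return (
        submission.get("source") == "isp_bulk"
        and submission.get("verified_isp") is True
        and submission.get("target") in {"honeypot", "decoy"}
    )
```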

Cost Estimates

Costing for a system such as this would be heavily influenced by policy decisions in areas such as submission eligibility, partnership opportunities and access control. As such, the costing in this document is based on the following set of core assumptions:

  • A public–private sponsorship program would donate the majority of the labour necessary for software development (the Task Force recognizes that this assumption may not be realistic).
  • Program oversight/planning/policy would be supplied by an unpaid volunteer review committee, with membership quotas based on the various types of organizations (e.g. government, commercial and non-government organizations).
  • Bulk submissions (e.g. a spam trap) would be used in some cases.
  • Access would extend to appropriate organizations, subject to review committee approval, not just to law enforcement.
  • Direct operational staffing would be shared with an existing data centre.

Budget (Rough)

  • Hardware purchase: $426 000 for the first year, and $158 000/year thereafter.
  • Hardware lease: $470 000 for the first year, and $230 000/year thereafter.

Each of the above costs could be reduced by $10 000 if additional network access capacity were not required.

Detailed cost estimates are included in Appendix I of this document.

Ownership/Management

To carry forward the multifaceted approach developed and used by the Task Force on Spam so far, and in order to provide a continuing coordinating mechanism for implementing its recommendations, the Task Force is exploring the possibility of establishing a central coordinating body, or Canadian Anti-Spam Action Centre, under which the Canadian Spam Freezer Database could be operated.

This central body could be organized within an existing government organization (e.g. Industry Canada, the Competition Bureau, the RCMP, other police forces, or the Canadian Radio-television and Telecommunications Commission), a new organization/agency, or a public–private partnership. The Canadian Spam Freezer Database could, in this way, be part of an entity to manage the overall Canadian spam complaint process.

Appendix I: Detailed Cost Estimates

Costing for a system such as this would be heavily influenced by policy decisions in areas such as submission eligibility, partnership opportunities and access control. As such, the costing in this document is based on two sets of assumptions.

Core assumptions

  • A public–private sponsorship program would donate the majority of the labour necessary for software development (the Task Force recognizes that this assumption may not be realistic).
  • Program oversight/planning/policy would be supplied by an unpaid volunteer review committee, with membership quotas based on the various types of organizations (e.g. government, commercial and non-government organizations).
  • Bulk submissions (e.g. a spam trap) would be used in some cases.
  • Access would extend to appropriate organizations, subject to review committee approval, not just to law enforcement.
  • Direct operational staffing would be shared with an existing data centre.

Detailed assumptions

  • The system would have a data-fill rate of 500 000 messages per day (including modest batch submissions) at an average of 10 KB each, or 5 GB per day, plus about 1 GB per day of overhead (e.g. logs, indexing, reports), for a total of about 6 GB per day or 2 TB per year.
  • The system would use basic backups (i.e. DVD-R or tape backups of intake data).
  • The system would retain data for two years, after which data would be removed from online storage.
  • Oversight would be supplied by an unpaid review committee handling planning and access-request review.
  • The system would share data-centre, support and Internet-access facilities (i.e. it would not be a stand-alone entity).
  • Access would be by secured web facilities, standard browsers and standardized authentication infrastructure, with no custom clients or private wide-area-network (WAN) infrastructure.
  • A public–private volunteer partnership may help offset software development costs.
  • Hardware costs would be highly dependent on whether support/maintenance is bundled into a leasing arrangement, or whether there is an outright purchase plus time and materials for installation/support. Both estimates are given at the end of this Appendix. Leasing costs are based on similar contracts within the industry.
  • As much FTC-originated software as possible would be used, to reduce development costs and to take advantage of database/indexing software with known costs and proven viability.
  • Legal consulting (e.g. legal guidance on privacy, submissions and access policies) would be provided by sponsors (e.g. the Office of the Privacy Commissioner of Canada, Industry Canada).
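The data-fill figures can be checked with a short calculation. This sketch uses decimal units (1 GB = 1 000 000 KB) and assumes an overhead of about 1 GB per day for logs, indexing and reports, which is consistent with the 2 TB per year figure. With two-year retention, the result fits within the 5 TB RAID storage listed below.

```python
# Inputs from the detailed assumptions; OVERHEAD_GB_PER_DAY is an assumed value.
MESSAGES_PER_DAY = 500_000
AVG_MESSAGE_KB = 10
OVERHEAD_GB_PER_DAY = 1.0  # assumed: logs, indexing, reports
RETENTION_YEARS = 2

intake_gb_per_day = MESSAGES_PER_DAY * AVG_MESSAGE_KB / 1_000_000  # KB -> GB
total_gb_per_day = intake_gb_per_day + OVERHEAD_GB_PER_DAY
tb_per_year = total_gb_per_day * 365 / 1_000                        # GB -> TB
online_tb = tb_per_year * RETENTION_YEARS                           # kept online

print(f"{intake_gb_per_day:.1f} GB/day intake, {total_gb_per_day:.1f} GB/day total")
print(f"{tb_per_year:.2f} TB/year, {online_tb:.2f} TB online for {RETENTION_YEARS} years")
```

Run as written, this gives 5 GB per day of intake, 6 GB per day in total, about 2.19 TB per year and about 4.38 TB held online over two years.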

Implementation/Maintenance Costs

Hardware/Media

  • Two mid-range server-class PCs running Linux:
    • Leased (bundled maintenance): $24 000/year; or
    • Purchased: $30 000 capital, $6000/year depreciation, replacement and support.
  • 5 TB RAID storage (two-year retention):
    • Leased: $60 000/year; or
    • Purchased: $10 000 capital, $5000/year depreciation, replacement and support.
  • Archival media: $5000/year.

Infrastructure/Networking

  • Most infrastructure (e.g. Domain Name System [DNS], Network Time Protocol [NTP], firewall, security) is assumed to be available through shared facilities: $1000/year incidentals.
  • Network connection (5 GB/day; may require additional connectivity over and above shared facilities): $10 000/year.

Software Licensing

  • The same storage, retrieval and indexing software as the FTC: $230 000 first year, and $20 000/year thereafter.
  • Other software (e.g. antivirus software, operating systems): $5000/year.

Software Development Costs

  • Professional consulting on the web-page user interface: $10 000.
  • Other development donated by sponsors.

Operational Costs

  • Aggregate staffing: one (1) person (combined management/operational staffing): $100 000/year loaded labour rate.
  • Other staffing donated by sponsoring agencies as required.
  • Policy, planning and operational oversight to be provided by an unpaid committee.
  • Incidentals (e.g. certificates for acquiring access): $5000/year.
  • Marketing/public awareness efforts to be donated by sponsoring organizations.

Budget (Rough)

  • Hardware purchase: $426 000 for the first year, and $158 000/year thereafter.
  • Hardware lease: $470 000 for the first year, and $230 000/year thereafter.

Each of these hardware costs could be reduced by $10 000 if additional network access capacity is not required.
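The rough budget can be cross-checked by summing the line items in this Appendix. The sketch below assumes that the first year includes all one-time and capital costs, the first-year software licence and every recurring item, and that later years include only the recurring items and the ongoing licence fee. Under those assumptions the items reproduce the $230 000/year lease figure for later years, while the straight sums for the other three figures ($417 000 and $157 000 for purchase, $450 000 for the first lease year) come out somewhat below the rough totals above, which may therefore include costs not itemized here.

```python
# Line items from this Appendix, in dollars per year unless noted otherwise.
RECURRING_SHARED = {
    "archival media": 5_000,
    "infrastructure incidentals": 1_000,
    "network connection": 10_000,
    "other software": 5_000,
    "staffing": 100_000,
    "operational incidentals": 5_000,
}
# Servers plus RAID storage under each hardware option.
OPTIONS = {
    "purchase": {"capital": 30_000 + 10_000, "recurring": 6_000 + 5_000},
    "lease": {"capital": 0, "recurring": 24_000 + 60_000},
}
LICENCE_FIRST_YEAR = 230_000  # storage/retrieval/indexing software, year one
LICENCE_LATER = 20_000        # same software, each later year
WEB_CONSULTING = 10_000       # one-time development cost


def yearly_totals(option: str) -> tuple:
    """Return (first-year, later-year) totals for one hardware option."""
    hw = OPTIONS[option]
    recurring = sum(RECURRING_SHARED.values()) + hw["recurring"]
    first = recurring + hw["capital"] + LICENCE_FIRST_YEAR + WEB_CONSULTING
    later = recurring + LICENCE_LATER
    return first, later


for name in OPTIONS:
    first, later = yearly_totals(name)
    print(f"{name}: ${first:,} first year, ${later:,}/year thereafter")
```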