Regulating content on the Internet: A new technological perspective
Executive Summary
In 1999, Industry Canada commissioned a report investigating the technological feasibility of attempts to regulate content on the Internet. That report, entitled Regulation of the Internet: A Technological Perspective, discussed a number of technologies available at that time which could potentially have been applied to regulating Canadians' access to Internet content. Despite the existence of these technologies, the report concluded that "none of these technological approaches would effectively prevent the Canadian Internet user from accessing content that violates pre-defined rules of acceptability, nor would they ensure that the user would be exposed to any measure of desirable content."1
Since 1999, increases in computing power and broadband connectivity, as well as improvements in display technologies, have led to significant growth in the use of the Internet as both a communications medium and a distribution vehicle for audio and video content. In addition, the rise of peer-to-peer file sharing over the Internet has created increasingly significant challenges to traditional broadcast media distribution, and has given rise to a significant number of copyright violations through the sharing of digital media files. In light of these trends, questions have again arisen as to whether Canadians' access to content could or should be regulated (restricted or promoted) on the Internet.
Regardless of whether or not it is desirable to regulate content on the Internet from a policy perspective, the question becomes moot unless it is technologically possible to do so in a meaningful way. In order to answer the question of technical feasibility, Industry Canada in early 2008 commissioned an updated report examining the degree to which current technologies make it more or less difficult to regulate content on the Internet. Now almost a decade after the first report, this new report looks at how the Internet is driving major changes in the ways in which individuals experience audio and video content, and whether new and emerging information and communications technologies (ICTs) today have greater potential to determine the kinds of content Canadians are able to access. Like the first report, this updated study does not address public policy issues, but again focuses solely on issues of technological feasibility.
We begin by summarizing the findings of the 1999 study, which essentially examined two main technological approaches to blocking and filtering Internet content (IP or DNS blocking, and keyword filtering based on packet header or file inspection) and concluded that an attempt to implement these on a national scale would be prohibitively expensive, would result in a serious negative impact on the Canadian telecommunications infrastructure (and consequently on the Canadian economy), and would be of limited effectiveness.
We then go on to discuss the research methodology used in preparing the report, and the wide range of published works and individuals we consulted, including business and technical experts from Canada's major ISPs, representatives of regulatory agencies, top academics, law enforcement agents engaged in detecting cybercrime, leading technology providers, business and technical representatives of search engine and social networking companies, and leading market research firms.
In order to investigate the capabilities and limitations of today's technology, the report then examines, in some detail, the changing technological landscape, to provide an understanding of both the context and the scale of the challenges inherent in implementing any technical approach to controlling content on the Internet. We show how, since 1999, computing devices connected to the Internet have become faster, more numerous and more powerful, and how software companies have continued to develop and refine algorithms to create more sophisticated applications. We also demonstrate that far more people are using the Internet today for a much wider range of purposes: not just for Web browsing and email, but to call friends, family and business associates, download music, listen to the radio, watch videos and TV programs, use virtual classrooms or participate in virtual meetings, play online games, execute e-commerce transactions, share photos and other user-generated content, and engage in a growing variety of social, media-rich activities. In short, more people are using the Internet, and each of them (on average) is spending more time and doing more online than ever before. Moreover, along with an exponential growth in the number of Web hosts and Web sites, the volume of data traffic on the Internet has swollen to more than 3.7 petabytes2 of data per month, with Canadians accounting for approximately 74 terabytes of those monthly data transfers.3
After a brief investigation of the 24 countries in the world most actively engaged in state-mandated technical filtering and blocking of Internet content, the report then turns to an extensive analysis of the technologies that have the potential to promote or constrain access to content on the Internet today. Recognizing that the first requirement for regulating access to any specific content would be the ability to identify that content, we begin our discussion by examining the technological methods available for identifying data transiting the Internet.
The report investigates five major approaches that are currently used to identify specific kinds of Internet content. These include:
Identification by Location (also called Geo-location): methods for identifying the location of either the user or the content source based on IP address or domain name;
Identifying Information in the Packet Header: methods for identifying the protocol or alphanumeric strings contained in the URL by examination of the packet header;
Low Level Inspection: methods such as page blocking, keyword filtering, and digital signature and content fingerprint detection, for identifying the alphanumeric strings inside the payload of a packet;
After-the-fact Analysis: methods applied to determine the content of a file (image, video, document, email, conversation, etc.) where it is possible to capture and store the entire file so it can then be analysed asynchronously;
Search Engine Indexing: methods used by search engines to index the alphanumeric content of Web sites.
The report describes how each of these techniques is currently being used to restrict users' access to particular kinds of Internet content, and then discusses the inherent challenges and problems in their implementation, and the variety of countermeasures that have been developed specifically to undermine their effectiveness.
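As a rough illustration of the first and third approaches listed above, the following minimal Python sketch checks a request against a location-based blocklist (IP range or domain name) and scans a payload for keywords. It is not drawn from the report: the rule sets, addresses, domain names and function names are all hypothetical, and a real carrier-grade filter would be far more involved.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical rule sets, invented purely for illustration.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_DOMAINS = {"blocked.example"}
BLOCKED_KEYWORDS = {"forbidden-term"}


def blocked_by_location(ip: str, url: str) -> bool:
    """Identification by location: match the IP address or the domain name."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Match the listed domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def blocked_by_keyword(payload: bytes) -> bool:
    """Low level inspection: look for listed strings inside the payload."""
    text = payload.decode("utf-8", errors="ignore").lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)
```

Even in this toy form, the weaknesses discussed in the report are visible: the blocklists must be known in advance, and a payload that has simply been re-encoded (for example with Base64) or encrypted no longer contains the listed string, so the keyword check does not fire.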
The report then looks at technological approaches to promoting user access to, or awareness of, particular types of content, including the possibility of extending the Canadian Audio-Visual Certification Office recognition to include IP addresses, domain names or URLs for specific online content, and the appropriate government department then entering into agreements with search engine providers to display certified Canadian online content through sponsored links or ad words. The report describes other promotional technological approaches that Canadian government agencies may wish to investigate, including hosting certified content on local servers as well as harnessing the potential of social media marketing and search engine optimization techniques to promote Canadians' awareness of specific kinds of content.
After examining current approaches to content identification and how they can be used to regulate access to Internet content, the report takes a realistic look at the limits of today's technology, and attempts to dispel some of the "urban myths" about what technology currently can and cannot do. We point out that efforts to block Internet traffic that is widely agreed to be undesirable (child pornography, email spam, and malicious software programs), where great efforts and millions of dollars have been expended in attempts at elimination, have had disappointingly limited success. We also examine a number of the misconceptions around the capabilities of "deep packet inspection" and show that, while DPI has its uses as a network management tool, as a result of costs, network performance issues, and technical limitations of carrier-grade devices, DPI technology is still dauntingly complex and immature, and is not a practical widespread solution to controlling undesirable Internet content.
Even in countries that are willing to engage in more severely restrictive content blocking measures than would be acceptable in Canada, we found empirical evidence to suggest that users are able to access the content they want. Moreover, even for the purposes of detecting criminal activities or for the purposes of detecting terrorist threats, with the vast resources that national security agencies have at their disposal, we have found that, at the end of the day, accurate content analysis demands after-the-fact inspection by human analysts. While technology filtering can help to identify material that may be of interest, human inspection is ultimately required to determine whether content reveals an actual security threat or criminal activity.
In summary, the report concludes (a) that it is as impossible today as it was in 1999 to create a meaningful list of pre-identified content or content sources to be blocked or promoted; (b) that the cost today of implementing a program to filter the volumes of data transiting the Internet based on a rule-set of any complexity would be as prohibitive as it was in 1999; and (c) that any regime that interfered with the fast and secure exchange of information over the Internet would have a negative effect on the competitiveness and profitability of Canadian businesses and on the Canadian economy.
Based upon the research that went into the writing of this report, we conclude that there are still no technologies today that can effectively or practically prevent Canadians from accessing specific kinds of Internet content, nor are there technologies that will ensure that Canadians are exposed to specific kinds of content while they are using the Internet. There is, however, a group of emerging promotional tools, such as search engine optimization and social media networking, that have proven effective in increasing visibility for, and raising awareness of, content on the Internet, and significant opportunities exist for both Canadian content creators and government agencies to leverage these tools.
1 Miller, Gerry et al., Regulation of the Internet: A Technological Perspective, 1999.
2 A byte is equal to 8 bits, and is the amount of storage required for a single text character. 3.7 petabytes is equal to 3.7 quadrillion bytes, or 3.7 million × 1 billion bytes (3 700 000 000 000 000 bytes). Appendix D contains more information on "bits" and "bytes".
3 Cisco Systems Inc., Global IP Traffic Forecast and Methodology, 2006–2011.