CFR Launches the Cyber Brief Series

The Council on Foreign Relations recently launched its new Cyber Briefs series through the Digital and Cyberspace Policy Program. The Cyber Briefs are short memos, published bimonthly on the CFR website, that offer concrete recommendations on topics such as cybersecurity, Internet governance, and online privacy.

The first brief in the series, titled 'Promoting Norms for Cyberspace', is written by Henry Farrell, associate professor of political science and international affairs at George Washington University. Mr. Farrell has also provided some additional thoughts on the topic at the Washington Post.

Mr. Farrell makes the case that norms matter for (US) cybersecurity for four reasons:

  1. The US is vulnerable to cyberattacks and this weakness is difficult to address using conventional tools of military statecraft.
  2. It is difficult to ensure that complex information systems are fully defended, since they may have subtle technical weaknesses.
  3. Classical deterrence is not easy in a world where it is often challenging to identify sophisticated attackers, or even to know when an attack has taken place.
  4. Treaties are hard to enforce because it is so difficult to verify compliance – particularly in cyberspace, where weapons are software, not missiles.

He further argues that it will be difficult for the U.S. to shape norms without making major changes to other aspects of its policy. His main recommendations for this policy alignment are to:

  1. Reform U.S. intelligence activities to make them more consistent with the publicly expressed norms of Internet openness that the United States is trying to establish.
  2. Disclose more convincing evidence when trying to shame actors that do not abide by cybersecurity norms.
  3. Encourage other states and civil society actors to take a leading role in norm promotion—even when this cuts against U.S. interests. To develop legitimate norms, the U.S. should let some of its partners take the lead. New norms will not be seen as legitimate if they are perceived to be solely a projection of U.S. interests.

Overall, Mr. Farrell makes some important points in his brief, but I think his comments in the Washington Post most succinctly summarise the challenges ahead:

When actors have many shared values, norm building is easier. When actors have few shared values, then norm building is hard. Furthermore, if you want to persuade others to accept norms, you will have a hard time unless you are obviously and sincerely committed to those norms yourself.

It is clear that the Snowden revelations have tarnished the U.S. reputation as a proponent of a free, open, and democratic Internet, but perhaps more importantly they have also damaged its standing with key allies such as Germany and other European countries. Mr. Farrell correctly highlights that the US needs both to align its intelligence activities with its Internet policy and to include non-government actors such as the EFF in its norm advocacy. However, as with many policy suggestions in the cybersecurity arena, this is easier said than done.

As long as common global cybersecurity norms remain difficult to develop, and they will be for a long time, the US will face a choice between the value of NSA intelligence operations and alignment with open Internet norms. My guess is that wide-scale intelligence gathering will continue to hold the upper hand for some time to come.

Further, rebuilding the US reputation with key cybersecurity allies has been, and will continue to be, a policy priority for the White House, but the main challenge will be to build common norms with countries outside the American sphere of influence. In the Washington Post, Mr. Farrell notes that while the US has promoted an open and robust Internet, other important (authoritarian and semi-authoritarian) countries may view this as a threat to the stability of their governments.

There is a significant divide between the position for a free and open Internet, typically led by the US and the EU, and a more restricted, nation-state-controlled Internet, typically led by Russia and China and supported by a majority of developing countries. I believe this to be a fundamental challenge for the creation of common Internet norms that could significantly improve the global level of cybersecurity. The main cybersecurity threats that the US faces do not originate from its allies but from countries like Russia and China, and to reduce these risks it will be imperative that the US reach a common understanding with them about what constitutes acceptable behaviour in cyberspace.

And while there has been significant progress in the last ten years, with ICANN reform and dialogues like NetMundial, there is still a huge divide to overcome, one that will be incredibly complex and difficult to bridge. But I agree with Mr. Farrell that if the US is seriously committed to building norms in cyberspace, it is going to have to start thinking about how to do this.

Basic Web Scraping With Curl

This is a short write-up on how you can use curl to download multiple web pages and use command line tools to extract relevant text from HTML files. If you are looking to scrape a large amount of data or have specific data elements to scrape, there are better tools for the job, such as dedicated web scrapers (for example BeautifulSoup or ScraperWiki).
Part 1

Our first step is to find the links to all the webpages that contain the information we need. Here you will most likely encounter either sequential or random link structures.

Example of sequential links:
www.site.com/archive/2015/01.html
www.site.com/archive/2015/02.html
www.site.com/archive/2015/03.html

Example of random links:
www.site.com/archive/q101.html
www.site.com/archive/abc.html
www.site.com/archive/root1.html

In our example we will be working with random link structure but we will also go through how to handle sequential links.

Introduction to curl

Curl is a command line tool for transferring data using various protocols such as HTTP or FTP. If curl is not installed on your system you can install it using apt-get install curl.

The basic syntax to tell curl to download a single page is:

curl <url> -o file.extension

For example: curl www.site.com/page.html -o page.html

The -o option specifies the output file where the downloaded page will be saved. Without -o, curl simply prints the contents of the file to the screen.
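If you want to experiment with -o without hitting a real site, curl's file:// support offers a harmless offline sandbox (source.html here is just a throwaway file):

```shell
# "Download" a local file via the file:// protocol to see -o in action
echo "hello" > source.html
curl -s -o copy.html "file://$PWD/source.html"
cat copy.html
# prints: hello
```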

Using curl to download multiple sequential links

For sequential links you can use curl to download a range of documents you want straight from the command line by using different brackets.

You can download multiple URLs or parts of URLs by using braces:

www.{subdomain,subdomain,subdomain}.site.com

You can also download alphanumeric sequences by using square brackets:

www.site.com/archive/2015/[01-10].html

www.site.com/archive/2015/[a-h].html

You can also use multiple brackets in one command:

www.site.com/archive/20[10-15]/[01-10].html

When using brackets to curl multiple files we can name the output files dynamically by using '#' in the -o option: curl replaces '#1' with the string currently matched by the first bracket, '#2' with the second, and so on.

curl www.site.com/archive/20[10-15]/[01-10].html -o '#1_#2.html'

This will save the files as 10_01.html 10_02.html etc.
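The range and placeholder behaviour can also be tested offline, since curl's URL globbing works for the file:// protocol too. The archive directory below is just a local stand-in for the remote site:

```shell
# Build a tiny local tree mimicking the sequential archive layout
mkdir -p archive/2014 archive/2015
echo "speech" > archive/2014/01.html
echo "speech" > archive/2014/02.html
echo "speech" > archive/2015/01.html
echo "speech" > archive/2015/02.html

# #1 picks up the match from 20[14-15], #2 the match from [01-02],
# producing 14_01.html, 14_02.html, 15_01.html, and 15_02.html
curl -s -o '#1_#2.html' "file://$PWD/archive/20[14-15]/[01-02].html"
ls
```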

Using curl to download multiple random links

With random link structure it is difficult to construct a curl command with brackets to download multiple files. Instead we can tell curl to download files from a list of links contained within a text file.

In our case study we are looking to download all the US Presidential speeches that talk about cybersecurity between 2014 and 2015. Our first step is to gather the links we want curl to download.

We can use the search function at the American Presidency Project to search for cybersecurity between 2014 and 2015, which should return a result that looks like this in our browser:

To extract the links for the search results we can use the text based browser lynx. If you do not have lynx installed you can install it using apt-get install lynx.

First we have to save a local copy of the search results (just right-click in your browser and select Save As…) to work with (in our case we have saved the file as index.html). Using lynx we can then use the -dump option to strip all HTML tags from the source file and save the output to a text file called links.txt using the '>' operator.

lynx -dump index.html > links.txt

We have now removed all HTML tags from our index.html file; only the text present in the document and a list of links remain. In our case, the end of the text file should look like this:

We can see that the links relevant to us are numbers 39 to 53 (the actual search results), and since we are not interested in the other links or the preceding text, we will use grep to keep only those links.

Grep is a command line tool for searching plain-text data sets for lines matching a regular expression. For example, grep 'http' links.txt will return a list of all the lines containing links in our links.txt file.

Grep output

We can see that all our relevant links share the same structure up until '?pid=', so we can grep for that common prefix to return only the relevant links.

grep 'http://www.presidency.ucsb.edu/ws/index.php?' links.txt

However, for curl to be able to process these lines as links we need to remove the extra characters, which in our case means removing the numbers and the spaces preceding each URL.

To accomplish this we can combine grep with the cut command and its -c option. With -cN, cut outputs only the character at the position specified (in our case seven, as we have a number, a full stop, and four spaces preceding the URL). However, -c7 alone would return only the seventh character ('h'), and as we want to keep the whole URL we add a trailing '-' after the number to keep everything from the seventh character onward.

grep 'http://www.presidency.ucsb.edu/ws/index.php?' links.txt | cut -c7-

The output should now look like this:

Grep with cut
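The effect of the trailing '-' is easy to check on a single made-up line (the number, full stop, and four spaces mimic the lynx output format):

```shell
# Six characters precede the URL, so -c7- keeps everything from the 'h' onward
printf '9.    http://example.com\n' | cut -c7-
# prints: http://example.com
```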

We also want to save this to a file so we can use that file to curl all the relevant links. As we did with lynx we use the ‘>’ operator.

grep 'http://www.presidency.ucsb.edu/ws/index.php?' links.txt | cut -c7- > links-clean.txt

Now that we have our file with clean links, we need to instruct curl to read each line of the file and download each URL, using a simple bash script.

We will start by opening the text editor nano to create a new script file called curl.sh.

nano curl.sh

In nano, we will type the following script:

#!/bin/bash
file="links-clean.txt"
while read -r line
do
  # Use the last URL path component (e.g. index.php?pid=12345) as the file name
  outfile=$(echo "$line" | awk -F/ '{print $NF}')
  curl -o "$outfile.html" "$line"
  sleep 1m
done < "$file"

To exit nano press ctrl+x and then press y to save your changes. This script reads each line in links-clean.txt, downloads the URL on that line, saves it to a file named after the last part of the URL, and then moves on to the next line until all lines have been processed.
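The awk step that derives the output file name can be tried in isolation; the pid value here is made up:

```shell
# Split the URL on "/" and print the last field ($NF), which becomes the file name
url='http://www.presidency.ucsb.edu/ws/index.php?pid=12345'
echo "$url" | awk -F/ '{print $NF}'
# prints: index.php?pid=12345
```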

Remember, computers can execute commands much faster than people, so to avoid overloading the website host it is a good idea to add a sleep between each curl call. In our example we have added a sleep time of one minute between downloads.

To be able to execute the script we need to first give it appropriate file permissions using chmod.

chmod u+x curl.sh

We can then run the script.

./curl.sh

If everything has gone well we should now have a number of html files in our directory.

Curled files

Part 2

We now have all presidential speeches that mention cybersecurity from the period 2014 to 2015 in HTML format. However, we want to do some simple text analysis on the speeches and would therefore like to have only the speech texts (not the HTML code or anything else) in text format. We therefore need to convert the HTML to text and remove all content besides the speeches themselves.

To convert the HTML files to text we can use lynx's -dump option again, and rather than doing it manually for each file we can reuse the script we made for curling multiple files. First of all, we can see that our files actually have a .php ending rather than the .html ending required for the -dump command to work.

For the sake of simplicity we will first use the rename command to remove everything but the numbers in our file names (note that these commands affect all files in the directory, so make sure you have put the HTML files in a separate folder).

rename 's/[^0-9]*//g' index.php*

We can then add the .html extension to all files.

rename 's/(.*)/$1.html/' *
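Note that the rename commands above assume the Perl flavour of rename; on systems that ship the util-linux rename instead, an equivalent pure-bash loop (a sketch, assuming the downloaded files are named like index.php?pid=108101) would be:

```shell
# Keep only the digits of each file name and add a .html extension
for f in index.php*; do
  digits="${f//[!0-9]/}"   # parameter expansion: delete every non-digit
  mv -- "$f" "$digits.html"
done
```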

The next step is to create the list of files for lynx to process. The easiest way is to just list the contents of the directory (since it contains nothing but our HTML files) and save the output to a file.

ls > files.txt

Our files.txt now looks like this:

Files.txt

Then we repeat the process to create a bash script.

nano dump.sh

#!/bin/bash
file="files.txt"
while read -r line
do
  # Each line of files.txt is a file name; dump its rendered text to name.txt
  lynx -dump "$line" > "$line.txt"
done < "$file"

As with our curl example, this script reads each line in files.txt, but instead of running curl it runs lynx -dump on each file and redirects the output to a text file.

We then make the script executable and run it.

chmod u+x dump.sh

./dump.sh

We now have the speeches in text format with the HTML tags removed. However, we still have two problems. First, the beginning of each file contains a number of rows that are not relevant to our text (the other text content on the web page that isn’t our speech).

Beginning

What we would like to do is to remove all the lines of text before the speech text begins. Luckily, all the files share the same structure and have 79 lines of irrelevant text before the speech. We can use sed to remove lines 1 to 79 in all text files in our directory. Sed (stream editor) is a powerful Linux tool that parses and transforms text. The following command tells sed to delete lines 1 to 79 (1,79d) in all text files (*.txt) and write the changes back to the files (-i).

sed -i '1,79d' *.txt
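The address syntax can be rehearsed on a throwaway file before touching the real speeches (assuming GNU sed; the header lines below are placeholders):

```shell
# Create a three-line file, then delete the first two lines in place
printf 'header line 1\nheader line 2\nThe speech begins here.\n' > demo.txt
sed -i '1,2d' demo.txt
cat demo.txt
# prints: The speech begins here.
```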

The second problem is that lynx -dump puts all the links found in the HTML document as a list at the end of each text file.

End

We can see that there are two types of links: one that begins with http:// and one that begins with file://. Since we would like to remove all instances of both, we can again use sed, now with this syntax:

sed -i '/file:\|http/d' *.txt
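As before, the pattern can be verified on sample text first; note that the \| alternation is GNU sed syntax:

```shell
# Delete every line containing either "file:" or "http"
printf 'Speech text.\n1. http://example.com/a\n2. file://localhost/b\n' > demo.txt
sed -i '/file:\|http/d' demo.txt
cat demo.txt
# prints: Speech text.
```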

And that’s it! We now have all the presidential speeches that mention cybersecurity between 2014 and 2015 as text files, with most unnecessary text removed, and can proceed with some text or word analysis of the corpus.

Thailand’s Cybersecurity Bill and Internet Censorship

Introduction

One of the early promises of the National Council for Peace and Order (NCPO), which seized power through a coup d’état in May 2014, was the development of a digital economy in Thailand. The Thai Government is currently considering a package of eight digital economy bills, ranging from electronic transactions to cybersecurity, to make Thailand more competitive and better equipped to compete in the global digital economy. You can find the full list of bills and translations of some of them here.

Update April 9, 2015:

Three of the digital economy bills, the Information and Communication Technology Ministry Reform Bill, the Digital Economy Bill, and the NBTC Bill, will be forwarded to the National Legislative Assembly next month and are expected to come into force by the end of 2015.

The other five digital economy bills are currently under consideration at the Council of State and are expected to be passed to the National Legislative Assembly within the coming three months.[1]

Background

Due to its large rural population, Internet penetration in Thailand remains low, at around 29% in 2013.[2] However, Thailand has recently seen a large increase in mobile penetration, mobile Internet, and social media adoption, especially among the young urban population. With 28 million Facebook users, one in three Thais has a Facebook account, making Thailand the ninth-biggest Facebook country worldwide. Twitter is not as large but is also growing, with around 4.5 million users.[3]

Besides Western social networks, the Japanese instant communications app Line has also reached immense popularity with over 24 million users in 2014, allowing instant communication through text and VOIP.[4]

Mobile Internet and social media applications offer the Thai population a greater diversity of content and debate than was previously available through traditional media. Unfortunately, this has also prompted the Thai Government to increase its efforts to control and censor information available on the Internet.

The lèse-majesté law, Article 112 of the criminal code, provides for up to 15 years in prison for anyone who “defames, insults, or threatens the King, Queen, the Heir-apparent, or the Regent” and has also been applied to information and opinion published online. Similarly, the 2007 Computer Crimes Act (CCA) provides for up to five years’ imprisonment for publishing content that jeopardises individuals, the public, or national security, or for acting as a proxy to access restricted material.[5] The CCA also holds computer users liable for any content they import into a computer system, and holds internet service providers at all levels liable for content published through them.[6]

The Cyber Security Operations Center (CSOC), established in 2011 by the Ministry of Information and Communication Technology, has the authority to shut down and block websites without a court order.[7] The CSOC extensively monitors the Thai Internet space and has forced Thai ISPs to block access to hundreds of websites, including independent news sites, Government critics, and even social media sites such as Facebook.[8]

The harsh censorship environment and stark punishments have stifled online discussion, increased self-censorship, and severely harmed Thai Internet freedom.

Reception

The Digital Economy Bills have been met with widespread skepticism and criticism from both domestic and international actors. The bills have been accused of being overly broad, vague, and too generous in granting the government additional power in the Internet sphere. A wide range of critics voiced concerns that rather than being designed to invigorate the Thai digital economy, the bills seemed to be an attempt by the military government to take control of both private and corporate information by opening loopholes for increased surveillance, censorship, and blocking of websites.[9, 10, 11, 12]

The National Cybersecurity Bill, designed to complement the Computer Crimes Act, received the bulk of the criticism. The bill would establish a government-run cybersecurity committee charged with detecting and countering online threats to national security and give far-reaching powers to the officials tasked with cybersecurity work.[13]

NCPO chief Prayut Chan-O-Cha has insisted that the National Cybersecurity Bill is a necessary tool to protect the nation and eloquently explained to reporters that:

“We need to have national security otherwise everybody does what they want.”

However, he also reassured critics that the bill would only be used on occasions when the authorities suspect Thailand’s national security is at risk – a comfortably vague statement on multiple counts.[14]

The main issue with the bill is that it would give government officials de facto authority to read and seize virtually any communication transmitted over any digital means at the discretion of the National Cybersecurity Committee – without the need for a court order.

The strong opposition to the bill has forced the Thai Government to revise the current draft, and it has promised that the new revision will include stronger checks and balances on the National Cybersecurity Committee’s powers. However, it remains to be seen whether the government actually delivers on its promises, as the latest draft of the National Cybersecurity Bill has yet to be made public. In the meantime, a closer reading of the current draft reveals why the bill is so problematic and potentially harmful to Thai Internet freedom.

The National Cybersecurity Bill

The National Cybersecurity Bill (draft approved by the Cabinet on 6 January 2015) opens with a vague, and quite unusual, definition of cybersecurity.

 “Cybersecurity” means measures and operations that are conceived in order to maintain national Cybersecurity, enabling it to protect, prevent or tackle circumstances of cyber threats which may affect or pose risks to the service or application of computer network, internet, telecommunications network, or the regular service of satellites in ways that affect national security, which includes military security, domestic peace and order, and economic stability.

Unlike more traditional definitions of cybersecurity that typically encompass the confidentiality, availability and integrity of information, the bill’s definition puts cybersecurity in direct relation to national security. It also gives astonishing leeway in the interpretation of permissible ‘measures and operations’ as well as the range of services, applications, or networks it can apply to – in essence giving the interpreter (in this case the National Cybersecurity Committee) the power to justify anything as being required for national cybersecurity.

The bill also contains several sections that are cause for concern, in particular sections 30, 33, 34, and 35.

Section 30 essentially gives the prime minister unchecked power over cybersecurity in Thailand as well as direct control over the National Cybersecurity Committee and the operational officials under it. Recalling the bill’s broad and vague definition of cybersecurity, this is a worrying display of executive power, in particular in relation to the Prime Minister’s de facto authority to deem any Internet-related communication a cybersecurity matter and thus within the purview of this bill. The Prime Minister also has the power and responsibility to appoint the operational officials under the National Cybersecurity Bill.

Section 30
The Prime Minister shall be in command with powers to control and direct the maintenance of Cybersecurity across the country in accordance with the operation plans on the maintenance of Cybersecurity and this Act. For this purpose, the Prime Minister shall have the power to command and order the persons responsible for the operation under Section 28 across the country.

The amount of executive power becomes increasingly troublesome when put in relation to sections 33 and 34.

Section 33
Upon the occurrence of an emergency or danger as a result of cyber threat that may affect national security, the NCSC shall have the power to order all State agencies to perform any act to prevent, solve the issues or mitigate the damage that has arisen or that may arise as it sees fit and may order a State agency or any person, including a person who has suffered from the danger or may suffer from such danger or damage, to act or co-operate in an act that will result in timely control, suspension, or mitigation of such danger and damage that have arisen.

Section 33 gives the National Cybersecurity Committee authority to act in the event of an emergency or danger resulting from a cyber threat affecting national security. As we already know, the definition of cybersecurity is incredibly wide, and there is no further explanation of what constitutes an ‘emergency’, ‘danger’, or ‘cyber threat’ – it will be up to the Committee (or the Prime Minister) to decide.

The second part of section 33 is even worse. In the event of an emergency of the aforementioned kind, the National Cybersecurity Committee would have the power to order all state agencies or any person to perform any act necessary to solve or mitigate the issue of national security. Read that sentence again. It is in essence a legal carte blanche for the Committee to do anything in the name of national cybersecurity.

Section 34 then extends this authority to include private corporations:

In case where it is necessary, for the purpose of maintaining cybersecurity, which may affect financial and commercial stability or national security, the NCSC may order a private sector to act or not to act in any way and to report the outcome of the order to the NCSC as required by the notification of the NCSC.

Finally, the section that has received most attention – section 35.

For the purpose of performing their duties under this Act, the Officials who have been entrusted in writing by the Secretary shall have the following powers:

(1) to issue letters asking questions or requesting a State agency or any person to give testimony, submit an explanation in writing, or submit any account, document, or evidence for the purpose of inspection or obtaining information for the benefit of the execution of this Act;

(2) to issue letters requesting State agencies or private agencies to act for the benefit of the NCSC’s performance of duty;

(3) to gain access to information on communications, either by post, telegram, telephone, fax, computer, any tool or instrument for electronic media communication or telecommunications, for the benefit of the operation for the maintenance of cybersecurity.

This is the section that gives the legal right to officials under the National Cybersecurity Committee to read and seize virtually any communication transmitted over any digital means without the need for a court order.

In Thailand, where insulting the Thai Royal Family is illegal and considered a threat to national security, there is little doubt this law will be used to identify and prosecute perceived critics of the monarchy. Not to mention stifling political opposition and dissent targeted towards the military government.

The recent growth of Internet availability and social media has opened up new information channels and spaces for discussing politics and controversial topics in Thailand. The Cyber Security Act and the other Digital Economy bills now risk instead producing an Internet space marked by censorship and tight political control.