Plagiarism 2.0: battleground Internet

It is a frequently repeated truism that the Internet has fundamentally altered the way individuals interact. It is another often-cited truism that the more things change, the more they stay the same. The case of plagiarism supports both.

College students “borrowing” from others instead of doing original work is timeless. According to TurnItIn, 30% of all papers turned in contain significant amounts of plagiarism. Instead of standing on the shoulders of giants, these individuals paint their own likeness on the giants’ faces, taking credit for the heights reached in the process. This is an ongoing arms race between students who take the dishonest route for a variety of reasons– academic pressure and expectations of peers/parents, writer’s block or more often the burning desire to attend that fraternity/sorority party– and the professors trying to level the playing field and uphold the last remaining scraps of academic integrity in higher education.
As with countless other conflicts, the Internet in this case acts as a neutral arms dealer, arming both sides to the teeth with the latest gadgetry promising to lend an edge over the adversary. Students now have a choice of websites where they can download papers (to use for “inspiration” of course) or even order new work to specification. That’s quite an improvement over the historical status quo when the source of non-original work would have been limited to the immediate social circle of the student. Depending on personal contacts both limited the number of options and increased the risk of getting caught– work previously submitted by classmates may be recognized instantly as fraud by colleagues in the same department. On the other hand, the defenses have improved too: professors can now take a suspicious submission and Google unusual phrases to check for an obscure uncredited source in cyberspace.

Has the power balance tipped? Two articles published in consecutive Sunday editions of the New York Times shed some light on this question. The first, published on Sep 10th, described an experiment with made-to-order paper websites. The authors ordered a paper on the same subject from a sampling of these online businesses and were disappointed overall with the results. (This article, titled “At $9.95 a Page, You Expected Poetry?”, is available from the Times website with registration.) Quality of the writing was described as mediocre or sophomoric at best. Not exactly the right way to get on the academic fast-track but arguably good news for the purposes of stealth: the incoherent engineer stringing adjectives together like Tom Wolfe would raise a few eyebrows and draw unwanted attention. At least mediocre writing is, like, dude, that’s cool. But there are interesting questions around maintaining a consistent voice/tone: the professor might be worried if the student suddenly converts from new-age mystic to textbook libertarian. (Or is that explained away naturally by the soul-searching process of liberal education?) For repeat customers one would hope these fabricate-a-paper services are using the same author.

So much for the attackers trying to game the system. The second article takes up the defenders’ point of view, focusing on one particular service called TurnItIn which screens submissions for plagiarism. (The company offers other services including online grading and peer review, but the article focused exclusively on the plagiarism detection.) This web-based service compares a new submission against three sources: a proprietary database of articles, a growing collection of work submitted by existing clients and a cache of pages from the wild, wild web. It is sophisticated enough to identify passages which are copied with slight alterations– a necessary capability because cosmetic changes are to be expected. Whether they are unintentional artifacts of copying a passage by hand instead of copy/paste or intentional variations introduced to create a veneer of originality, these “deviations” from the original source are the main challenge for detecting plagiarized text. In keeping with good security engineering we assume the attackers (e.g. cheating students) know their paper is going to be compared against existing sources and anticipate they will resort to intentional misspellings, changing the order of words, substituting synonyms from a thesaurus, switching active/passive voice, even throwing in ungrammatical decoy sentences to avoid detection.
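To make the "slight alterations" problem concrete, here is a minimal sketch of one well-known approach to near-duplicate detection: break each text into overlapping word triples ("shingles") and measure how many of them two texts share. This is not a description of how TurnItIn works, which the article does not reveal; the function names and the choice of three-word shingles are assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    # Lowercase and keep only word-like tokens so punctuation changes are ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(submission, source, k=3):
    """Fraction of shingles shared between a submission and a candidate source."""
    return jaccard(shingles(submission, k), shingles(source, k))


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank"
    edited = "the quick brown fox leaps over the lazy dog near the river bank"
    unrelated = "students should cite every source they consult when writing"
    # One swapped synonym only disturbs the shingles that contain it,
    # so the edited copy still scores far above unrelated text.
    print(similarity(edited, source))
    print(similarity(unrelated, source))
```

A single substituted word leaves most shingles intact, which is why cosmetic edits alone do not hide a copied passage from this kind of comparison. Reordering words or rewriting every sentence degrades the score much faster, which is exactly the evasion the paragraph above anticipates.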

If that sounds like basic spammer tactics, that’s because there is a parallel with spam here. Spammers have to contend with filters and craft their message to bypass existing defenses. The attackers’ advantage is that the design of filters allows gray-box testing: for client applications, the spammer can purchase the application and reverse engineer it, and for web services they can register for an account and spam themselves with versions of a message until one gets through.

Clearly this has not yet dawned on the ghost-writer-for-hire websites. The Times found that at least one of the papers ordered was promptly exposed as fraud by TurnItIn. That is the next escalation in this conflict: ironically the very openness of the system allows it to be subverted. A more astute competitor would subscribe to anti-plagiarism services and verify that their work does not raise any red flags or tweak it until it flies under the radar, charging students a few dollars extra for that guarantee and upping the ante for the defenders.

And so the arms race continues.

cemp

What is in your laptop?

Toxic material, it turns out.

Hewlett-Packard hardly needs any more bad news in the aftermath of a board scandal involving amateur gum-shoe work which put “pretexting” into the mainstream lexicon. Now an article in Treehugger reports that Greenpeace has downgraded HP’s environmental credentials after finding the flame retardant decaBDE and lead in their laptops. And HP is by no means alone: none of the 5 laptops surveyed by the environmental advocacy group fared well. The full study compared the Dell Latitude, Acer Aspire, Sony VAIO, HP Pavilion and MacBook Pro, searching for the presence of harmful chemicals such as lead, chromium, cadmium, mercury and bromine.

The impact of these chemicals could be more significant during production and after the lifetime of the actual product. Others have pointed at the subtle connection between the environment and the IT industry, which is not traditionally considered a major polluter. Andrew Shapiro of Harvard Law School for example has drawn attention to the growing incidence of discarded hardware from Western countries ending up in landfills of developing countries– the unlikeliest candidates for properly managing the disposal of toxic chemicals. Taking this argument one step further, Shapiro made the same case for software in a talk at MSFT research: efficient applications which can run on existing hardware and not require upgrades (which would involve discarding existing hardware and adding to those growing piles) are more environmentally friendly than ones that are hungry for CPU, memory or other computing resources.

Measuring the impact of the software industry is a ways off. But Greenpeace has already stepped up to the plate to compare different vendors of hardware in a report card. According to that chart, which rates companies on a scale of 0 to 10, Dell, Nokia and HP are leading the pack while Lenovo is the worst offender. (No doubt pundits are commenting on whether exploding/burning batteries were taken into account.) Apple, for all its image-conscious advertising and self-billing as the enlightened company, performs dismally, ranking in the bottom quartile. In all fairness, as Treehugger points out, the Greenpeace study focuses exclusively on use of toxic chemicals to the exclusion of other ecological factors.

cemp

This privacy violation is not yet rated

“This film is not yet rated” is an interesting self-referential movie that reads on very different levels. Directed by Kirby Dick and produced by Eddie Schmidt who maintains the corresponding blog titled This blog is not yet rated, it is a critical examination of the rating system used by MPAA, considered one of the great legacies from Jack Valenti’s 30+ year tenure.

The documentary, which itself earned the scarlet-letter of an NC-17 rating, attempts to unearth the secretive rating board responsible for assigning the famous G/PG/PG-13/R/NC-17 classifications. Many people are interviewed, each one supporting the main contention: the ratings system is a cabal that restricts speech and/or artistic creativity by threatening to assign the dreaded NC-17 label that dooms an aspiring production to the obscurity of art-house theaters, away from the mainstream screens. (Those interviewed include Stanford’s Larry Lessig.)

Independent of whether one agrees with this line of argument, even more striking are the lengths to which Kirby Dick has gone to shine a light on the system. The MPAA does not disclose the identity of raters, in order to protect them from influence according to the party line. No problem– the director hires a private investigator to unmask them. A good chunk of the documentary covers her painstaking effort. She parks her minivan outside the MPAA headquarters in LA, watching the cars driving out, binoculars in hand, running license plates to get names and addresses.
Is it stalking? Arguably everything done is legal but very intrusive. License plates are blurred on screen as a nod to privacy, but in that very scene those same plates are being recited by the PI to her assistant, loud and clear. All the usual gumshoe/PI tricks are there: one scene filmed in greenish infrared camera hues shows the PI dumpster-diving outside a rater’s house. Together with the director, she drives to an empty spot and goes over the trash, uncovering actual rating forms. There is even the mandatory high-speed pursuit, seen from the vantage point of the van, when our fearless team chases a group of suspected raters to a restaurant.

As far as investigative reporting goes, this is a very comprehensive job. Not only are they taking names, but they are publishing them: along with pictures and low-resolution video. The viewers get treated to an “MPAA class of 2006” display, with names and pictures of the raters. For bonus points, add demographic information: age, marital status, children– the last one relevant to the subject because MPAA contends that the raters are representative parents with children aged between 5 and 17. (That claim is proven wrong for at least a handful of the raters.)

It is difficult to sympathize with the MPAA or the ratings system, although it is also difficult to agree with one of the experts’ contention that the system is unconstitutional on First Amendment grounds. In its zeal to build a compelling case against the injustices of rating, the documentary decides to go after the raters themselves. Such anger is misdirected because the raters are not the architects of the system; they are low-level employees (“errand boy” is how Col Kurtz from the R-rated Apocalypse Now might put it) chosen for their role not because of a profound understanding of movie history, but precisely for the lack of any such unique talent, except for being the “average” parent. By publishing their names, filming them in everyday activities such as eating lunch and setting them up for easy ridicule in front of a movie audience, the documentary has made collateral damage out of their individual privacy.

cemp

How to DoS your Exchange server (part II)

A denial-of-service attack depends on asymmetry between attacker and victim. This is the leverage: when bad guys can expend a small amount of effort to cause the good guys to spend large amounts of resources, the result can be a DoS vulnerability. Any feature that creates such leverage increases risk for the system.

Distribution lists are a prime example of leverage. One email from the sender morphs into hundreds or even thousands of messages destined for unsuspecting recipients’ inboxes. That is partly the reason Exchange allows controlling the users authorized to send email to a given DL. It is a good idea to restrict this to a small number of users in the case of large distribution lists. What happens when you don’t will be remembered as the infamous “Bedlam 3” debacle in Microsoft lore dating back to the late 1990s. (No wonder the old-timers who were around for Bedlam– and it was remarkable enough to inspire its own T-shirt with the slogan “I survived Bedlam 3”– were having a deja vu moment with the Blue Hat announcement.) As recounted in this entry from the Exchange team blog, Bedlam resulted from the interaction of an unrestricted DL with the predictable human reaction of responding to spam with more spam commentary, some of it even urging other users to stop spamming the alias.

A distribution list with N users gives the attacker a leverage factor of N. For 1 message sent by the attacker, the system works N times as hard, using up roughly N times the storage space. (Interestingly enough, the bandwidth requirements within one enterprise do not scale linearly because the system is intelligent enough to optimize delivery across different servers using a single copy.) That is not a bad starting point for a DoS attack: imagine sending a sizable message– perhaps containing an attachment such as an image or video– to a very large DL with thousands of users, assuming you can find that misconfigured DL. But it does not work; in most cases only a single instance of that large attachment is stored for all the users sharing the same Exchange server.

But the experiment on Thursday proved one can do substantially better. Sending a message from the alias itself and requesting recall notifications– which are going to be delivered to the same alias, of course– broke new ground. Every one of those N users on the distribution list is going to get a status message for every one of the N recipients. That’s N-squared or quadratic leverage, achieved at a remarkable economy with only two messages. And 2K * 2K == 4 million messages is exactly what would have been exchanged if it were not for the fact that the IT department stepped in after the backlogs grew out of control, legitimate email traffic slowed to a crawl and one of the Exchange servers pegged its memory.
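The counting argument above can be written out as a small back-of-the-envelope calculation. This is only arithmetic, not a model of Exchange internals; the function names are made up for illustration and the list size of 2000 is the round figure used in the post.

```python
def fanout_messages(n_members, messages_sent=1):
    """Copies delivered when each message goes to every member of the list."""
    return n_members * messages_sent


def recall_storm_messages(n_members):
    """Status messages delivered when every recall receipt returns to the list.

    The recall produces one status message per member, and each of those
    status messages is itself fanned out to all members of the list.
    """
    return n_members * n_members


if __name__ == "__main__":
    n = 2000
    print(fanout_messages(n))        # 2000
    print(recall_storm_messages(n))  # 4000000
```

Doubling the size of the list doubles the cost of an ordinary broadcast but quadruples the cost of the recall storm, which is what separates linear from quadratic leverage.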

Wreaking havoc on the enterprise scale with 2 messages? Now that is an accomplishment worthy of its own Black Hat session.

cemp

How to DoS the company Exchange server (part I)

It all started with 2 seemingly benign messages appearing in the Outlook inbox. The first one was an announcement for the upcoming BlueHat event. (BlueHat is a series of security-focused presentations for Microsoft employees styled after the more famous Black Hat briefings held in Las Vegas every year.) Arriving within minutes after that was a second follow-up message, recalling the first one.

The recall feature is an interesting piece of functionality built into Outlook and Exchange. It was the subject of a Sunday New York Times article a couple of weeks ago, discussing how to handle the situation when you send a message that in retrospect comes to be viewed as perhaps written in a moment of anger or indiscretion. But contrary to what users might hope for, the recall does not automatically yank the message from recipients’ inboxes. Instead it depends on sending a follow-up message announcing the intention to recall the original one. “Intention” is the key word, and that request has to be honored by the recipient and/or the recipient’s email application aka the MUA, mail user agent. Outlook recognizes these messages, and in principle when the recipient opens the second message– either intentionally or by browsing with the preview pane for example– the first message is removed from the Inbox. The catch is that the recipient could choose to open the first message first, even if the recall message has already arrived. In reality the recall virtually ensures the original one will be opened and scrutinized very carefully, by drawing attention to the unintended error. (In one case HR emailed a document containing sensitive compensation information to an entire building full of employees, followed up with a recall message and an even more helpful second email explaining why the first message is “highly confidential” and urging recipients to delete it without reading.)

In this case there was nothing particularly confidential or inappropriate about the original message, perhaps a misspelling here and there or an incomplete sentence. But the original sender– identity unknown because the message was sent out on behalf of the distribution list– dutifully recalled it. And that is when all hell broke loose. The first indication of something amiss emerged when this author, along with 2000+ employees on that alias, started receiving recall success/failure messages in his inbox. All of this is by design: when you recall a message, you can ask for confirmation from Exchange as to whether the recall succeeded or whether the recipient read the message. That generates one status message per potential recipient, returned to the original sender.

Except that in this case the message was sent on behalf of the distribution list “Bluehat Alerts”, which contained over two thousand employees. So the status messages naturally were also sent to the same DL.

That would be 2000+ status messages each delivered to 2000+ members of the alias.

Unwittingly the sender had just pulled off a remarkable denial-of-service attack against the corporate email system and succeeded in bringing the pilot deployment of the new Exchange Titanium to its breaking point.

And it only took 2 messages. One of which was intended to announce an event focused on computer security, on building/breaking systems that can survive hostile attacks. The irony was inescapable.

(continued)

cemp

Quasi-secrets: Public vs private information at Facebook

The recent debacle over Facebook’s “News Feed” feature once again demonstrates the large gray area of information that is neither public nor private. In this case the Facebook designers added a feature that updates users about their friends’ recent activities. On the face of it, this is a simple change: users can only see activity for their friends– information they already have access to. The only difference is the delivery model; instead of having to “pull” the information by visiting each friend’s page, it is now delivered to you in the fashionable “push” model that was a great success when applied to blogs/RSS feeds.

Reaction was swift, as described in this CNet article and a slew of other coverage including ABC. The fact that no new information is made available to users and only the access mode is different was apparently lost on the critics. In a response posted on the website, Facebook’s founder emphasized this point:

“The privacy rules haven’t changed. None of your information is visible to anyone who couldn’t see it before the changes.”

This is not the first time that a disruptive change in the way information can be searched/accessed has changed its classification from being “private” to “public,” even when it was public to start with. Google experienced a similar phenomenon when it supported reverse telephone record look-ups, mapping the number NNN-NNN-NNNN back to an individual and even a home address. If this is creepy, then so is the plain old phone book. Granted the phone book indexes by name and only allows one-way look-ups; given a name you can look up the number, but not the other way around. But in principle one could search every single page to locate the number. And for a 10-digit number in the US that expensive process would have to be repeated for every phone book from every region. That prohibitive cost creates the illusion that the information was private in the first place. Replace monkeys flipping through phone-book pages with a database, and the same query can be executed in a fraction of a second. Same data-set, different implementation and you have a privacy nightmare. Another example: the majority of court records are public. Anybody willing to spend some time digging through dusty archives in the basement of the county court house could dig up juicy information on residents’ lives. (No doubt this useful feature has created a livelihood for private investigators.) That availability has not worried many people, until some cities proposed to place the same information online, free for indexing by search engines.
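The phone book argument comes down to how the same records are indexed. The sketch below uses a tiny made-up directory (the names and 555 numbers are placeholders, not real listings) to contrast the page-flipping search with a reverse index built once from identical data.

```python
# Hypothetical directory, indexed by name the way a printed phone book is.
phone_book = {
    "Alice Example": "555-0100",
    "Bob Sample": "555-0101",
    "Carol Placeholder": "555-0102",
}


def reverse_lookup_by_scan(book, number):
    """Flip through every entry until the number turns up: cost grows with book size."""
    for name, listed_number in book.items():
        if listed_number == number:
            return name
    return None


def build_reverse_index(book):
    """Invert the book once so that numbers map back to names."""
    return {number: name for name, number in book.items()}


if __name__ == "__main__":
    print(reverse_lookup_by_scan(phone_book, "555-0101"))  # Bob Sample
    by_number = build_reverse_index(phone_book)
    print(by_number["555-0101"])                            # Bob Sample
```

Both functions answer the same question from the same data. The only thing the reverse index changes is the cost of asking, which is the whole difference between the "private" phone book and the "creepy" online look-up.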

Facebook discovered its own variant of “quasi-secrets:” information that is public if you know where to look and have the resources to retrieve it. The scarcity of individuals with the know-how and determination to access it created the sense of privacy. A novice developer could write a few lines of code to monitor the activities of his/her friends on Facebook. Yet when the website itself provides that functionality and makes it readily available to anybody, users cry foul.
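As a rough illustration of how little code such monitoring takes, the sketch below compares two snapshots of friends' profile data and reports what changed. It deliberately leaves out how the snapshots would be collected; the dictionary layout, field names and `diff_profiles` function are all hypothetical and do not correspond to any actual Facebook interface.

```python
def diff_profiles(old, new):
    """Return a list of (friend, field, old_value, new_value) for every changed field."""
    events = []
    for friend, fields in new.items():
        before = old.get(friend, {})
        for field, value in fields.items():
            if before.get(field) != value:
                events.append((friend, field, before.get(field), value))
    return events


if __name__ == "__main__":
    # Two hypothetical snapshots of pages the viewer is already allowed to see.
    yesterday = {
        "alice": {"status": "single", "school": "MIT"},
        "bob": {"status": "studying"},
    }
    today = {
        "alice": {"status": "in a relationship", "school": "MIT"},
        "bob": {"status": "studying"},
    }
    for friend, field, was, now in diff_profiles(yesterday, today):
        print(f"{friend} changed {field}: {was} -> {now}")
```

Nothing in the output is information the viewer could not already reach by visiting each page. The loop simply turns a tedious "pull" into an effortless "push".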

cemp

Inaugural post with WordPress

After trying and failing to find a way to remove advertisements from the Spaces blog, I am trying a new home here with WordPress. One would imagine that MSFT employees working on Windows Live should be able to use Live Spaces for blogging related to work (instead of the MSDN blogs) and that advertisements can be removed for this purpose. After all if you are printing your blog URL on a business card and expect professional contacts to visit that space, the last thing the space requires is information on treating nail infections. But that assumption is incorrect, as the only way to remove banner ads from an MSN blog is signing up for MSN Premium, another MSFT service. The Spaces team did not have a process for tracking blogs for employees and perhaps it would have been difficult to manage that group when considering 1000+ potential users.

Assuming the initial experiences with WordPress prove positive, all the content from Random Oracle will be moved here going forward.
cemp