The Big Data challenge

We are hearing the term Big Data more and more, and I suppose that, like many vogue terms, it can mean whatever we want it to mean. However, the basic idea it represents seems to be that computer technology now allows us to collect more information/data than has ever been possible in human history, and that is precisely what we are doing: hence Big Data. Simple examples of Big Data in operation are the vast data warehouses operated globally by Google, and of course the governmental collection of vast amounts of data in digital format. But it is not merely giant corporations or governments that are able to establish vast repositories of digital data. Computer technology allows us as individuals to create our own large-scale databases. The exponential growth in storage capacity means that we can store huge quantities of material on our personal digital devices, or alternatively store such material in ‘The Cloud’, that mysterious phenomenon provided by commercial operators that appears to offer us limitless storage capacity out there in the cyber ether, although in reality it is held in places such as the Google data warehouses. Much of our own personal Big Data is, of course, likely to take the form of films, music and photographs.

For academics, the revolution in data collection and storage is already fundamentally changing how some forms of research are conducted. A recent article in the New York Times by John Markoff (20th May 2013), ‘New Research Tools Kick Up Dust in Archives’, showed how researchers no longer need to spend days or weeks in libraries and archives painstakingly going through books and documents to find material relevant to their research. Instead, using a digital camera and laptop, they can simply photograph everything and transfer it to their computer, creating their own vast personal research database. Ah yes, you might say, but they will still need to read everything to find the relevant material. Perhaps... but as the Markoff article highlights, data mining tools are now being developed which seek to provide a contextual basis for the vast quantities of data collected. Such tools, and variants on them, are being used increasingly widely. For example, for professional lawyers, e-discovery technologies are of growing importance in sifting through material to find what might be relevant to their particular cases.

If professionals and academics can use such technology effectively, I wonder whether undergraduate students might over time seek to follow suit. What purpose might they have in copying pages of material into digital devices and then using data mining tools, other than to replicate on a far smaller scale what the academic researcher is undertaking? Well, if we take this a stage further, could this be combined with essay writing software to actually write students' essays for them? Without doubt, essay writing software is growing in sophistication.

With forms of plagiarism a matter of huge concern throughout academia, how might we deal with this? Currently we use proprietary systems such as Turnitin, but these rely heavily on a database of material against which to compare student essays, and so would be ineffective where material was in essence being ‘freshly created’.

Before any students who may stumble across this blog start feeling aggrieved, let me balance things up a little. With technology making copying so much easier, even supposedly respected professionals can fall into the trap of trying to make their lives easier, and of seeking unethical short cuts to do so. The recent case of Crinion & Crinion v IG Markets (2013) will be remembered not for the specific issues of the case, but rather for the unfortunate way the judge, His Honour Simon Brown QC, chose to write up his judgment. After having been provided with closing submissions in electronic format by both parties' legal counsel, the judge proceeded to use the electronic submission of the claimant's counsel almost verbatim (about 94%) in producing his written judgment. Whilst not overturning the judgment, the Court of Appeal understandably made clear its displeasure at the judge's working practices in the case. Sir Stephen Sedley considered that, ‘Information technology has made it seductively easy to do what the judge did in this case ... the possibility of something approaching electronic plagiarism is new, and it needs to be said and understood that it is unacceptable.’ (para. 39)

We are seeking to develop the tools to make effective use of the Big Data Everests we are creating. But with time pressure a constant in our lives, it is very likely that, in seeking to conquer our Big Data Everest, we will at times be tempted simply to use our shiny new Big Data helicopter.

