Information explosion

From Wikipedia, the free encyclopedia

The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance.[1] As the amount of available data grows, managing it becomes more difficult, which can lead to information overload. The Online Oxford English Dictionary indicates use of the phrase in a March 1964 New Statesman article.[2] The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed" (p. 11).[3] The earliest use of the phrase seems to have been in an IBM advertising supplement to the New York Times published on April 30, 1961,[4] and by Frank Fremont-Smith, Director of the American Institute of Biological Sciences Interdisciplinary Conference Program, in an April 1961 article in the AIBS Bulletin (p. 18).[5]

Many sectors are seeing this rapid increase in available information, such as healthcare, supermarkets, and even governments, which hold birth certificate information and immunization records.[6] Another sector affected by this phenomenon is journalism. The profession, which in the past was responsible for disseminating information, may be undermined by the sheer number of information sources available today.[7]

Techniques for extracting knowledge from an overabundance of electronic information (e.g., data fusion, which may help in data mining) have existed since the 1970s. Another common technique for dealing with such volumes of information is qualitative research.[8] This approach aims to organize, synthesize, categorize, and systematize information so that it is more usable and easier to search.

Growth patterns

  • The world's technological capacity to store information grew from 2.6 (optimally compressed) exabytes in 1986 to 15.8 in 1993, over 54.5 in 2000, and 295 (optimally compressed) exabytes in 2007. This is equivalent to less than one 730-MB CD-ROM per person in 1986 (539 MB per person), roughly 4 CD-ROMs per person in 1993, 12 CD-ROMs per person in 2000, and almost 61 CD-ROMs per person in 2007. Stacking the imagined 404 billion CD-ROMs from 2007 would create a pile reaching from the Earth to the Moon and a quarter of that distance beyond (at 1.2 mm thickness per CD).[9]
  • The world’s technological capacity to receive information through one-way broadcast networks was 432 exabytes of (optimally compressed) information in 1986, 715 (optimally compressed) exabytes in 1993, 1,200 (optimally compressed) exabytes in 2000, and 1,900 in 2007.[9]
  • The world's effective capacity to exchange information through two-way telecommunication networks was 0.281 exabytes of (optimally compressed) information in 1986, 0.471 in 1993, 2.2 in 2000, and 65 (optimally compressed) exabytes in 2007.[9]
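The CD-ROM comparison in the first bullet is simple unit arithmetic. The minimal Python sketch below assumes 1 MB = 10⁶ bytes, 1 EB = 10¹⁸ bytes, and a mean Earth-Moon distance of 384,400 km (a value not given in the text); it reproduces the 404 billion CD figure and the stack height, and also computes the compound annual growth rate between the storage figures listed above. The function names are illustrative only.

```python
from typing import Tuple

# Storage capacity in (optimally compressed) exabytes, as quoted above.
STORAGE_EB = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295.0}

CD_BYTES = 730e6         # one 730-MB CD-ROM, taking 1 MB = 10**6 bytes
CD_THICKNESS_MM = 1.2    # thickness per CD, as stated above
EARTH_MOON_KM = 384_400  # assumption: mean Earth-Moon distance in km


def annual_growth_rate(v0: float, v1: float, years: float) -> float:
    """Compound annual growth rate implied by growing from v0 to v1."""
    return (v1 / v0) ** (1.0 / years) - 1.0


def cd_stack(exabytes: float) -> Tuple[float, float]:
    """Return (number of CDs, stack height in km) for a capacity in exabytes."""
    n_cds = exabytes * 1e18 / CD_BYTES
    height_km = n_cds * CD_THICKNESS_MM / 1e6  # 1 km = 10**6 mm
    return n_cds, height_km


# Growth rate for each interval between the quoted years.
years = sorted(STORAGE_EB)
for y0, y1 in zip(years, years[1:]):
    rate = annual_growth_rate(STORAGE_EB[y0], STORAGE_EB[y1], y1 - y0)
    print(f"{y0}-{y1}: {rate:.1%} per year")

# CD count and stack height for the 2007 figure.
n, height = cd_stack(STORAGE_EB[2007])
print(f"2007: {n / 1e9:.0f} billion CDs, {height:,.0f} km, "
      f"{height / EARTH_MOON_KM:.2f} x Earth-Moon distance")
```

With these inputs the 2007 stack comes to roughly 485,000 km, about 1.26 times the assumed Earth-Moon distance, which agrees with "a quarter of that distance beyond". The growth rates are derived only from the four quoted storage figures, not taken from the cited study.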

A new metric being used to characterize the growth in person-specific information is disk storage per person (DSP), measured in megabytes per person (where a megabyte is 10⁶ bytes, abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population.[6] In 1983, one million fixed drives with an estimated total of 90 terabytes were sold worldwide; 30 MB drives had the largest market segment.[10] In 1996, 105 million drives totaling 160,623 terabytes were sold, with 1 and 2 gigabyte drives leading the industry.[11] By the year 2000, with 20 GB drives leading the industry, rigid drives sold for the year were projected to total 2,829,288 terabytes.[11]
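The GDSP definition above is a single division once units are aligned. The Python sketch below applies it to the drive-sales figures in this section, assuming decimal units (1 TB = 10⁶ MB). The world population values are rounded approximations supplied only for illustration; they do not come from the cited sources.

```python
# Decimal units: 1 TB = 10**12 bytes and 1 MB = 10**6 bytes, so 1 TB = 10**6 MB.
MB_PER_TB = 10**6


def gdsp(total_tb_sold: float, world_population: float) -> float:
    """Global disk storage per person (MB/person): MB of new drives sold
    in a year divided by the world population in that year."""
    return total_tb_sold * MB_PER_TB / world_population


# Terabytes of rigid disk sold, from the figures above (2000 is a projection).
TB_SOLD = {1983: 90, 1996: 160_623, 2000: 2_829_288}

# Assumption: approximate world population, rounded, for illustration only.
WORLD_POPULATION = {1983: 4.7e9, 1996: 5.8e9, 2000: 6.1e9}

for year in sorted(TB_SOLD):
    print(f"{year}: {gdsp(TB_SOLD[year], WORLD_POPULATION[year]):.3g} MB/person")
```

Under these assumptions GDSP rises from about 0.02 MB per person in 1983 to about 28 MB in 1996 and roughly 460 MB in 2000, which shows how quickly the storage that could hold person-specific data was growing.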

According to Latanya Sweeney, there are three trends in data gathering today:

Type 1. Expansion of the number of fields being collected, known as the “collect more” trend.

Type 2. Replace an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend.

Type 3. Gather information by starting a new person-specific data collection, known as the “collect it if you can” trend.[6]
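Sweeney's three trends can be read as a classification of how one data collection turns into another. The Python sketch below is a hypothetical model, not taken from Sweeney's text: a collection is reduced to its set of fields plus a flag saying whether records describe individuals, and the `Collection` type and `classify_trends` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Collection:
    """Hypothetical summary of a data collection."""
    fields: FrozenSet[str]
    person_specific: bool  # True: one record per person; False: aggregate data


def classify_trends(before: Optional[Collection], after: Collection) -> List[str]:
    """Label the change from `before` (None if nothing was collected)
    to `after` with the trend types listed above."""
    if before is None:
        # A brand-new collection matches Type 3 only if it is person-specific.
        return ["Type 3: collect it if you can"] if after.person_specific else []
    trends = []
    if len(after.fields) > len(before.fields):
        trends.append("Type 1: collect more")
    if after.person_specific and not before.person_specific:
        trends.append("Type 2: collect specifically")
    return trends


# Hypothetical example: aggregate clinic counts replaced by per-patient
# records that also carry an extra field.
old = Collection(frozenset({"zip", "visits"}), person_specific=False)
new = Collection(frozenset({"zip", "visits", "birth_date"}), person_specific=True)
print(classify_trends(old, new))
```

A single change can match more than one trend, as in this example, where adding a field and switching to per-person records are both flagged.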

Related terms

Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also dubbed data deluge). The term information flood is sometimes used as well. All of these refer to the ever-increasing amount of electronic data exchanged per unit of time. Awareness of unmanageable amounts of data grew along with the advent of ever more powerful data processing since the mid-1960s.[12]

Challenges

Even though the abundance of information can be beneficial on several levels, some problems may be of concern, such as privacy, legal and ethical guidelines, filtering, and data accuracy.[13] Filtering refers to finding useful information amid so much data, which relates to the job of data scientists. Healthcare is a typical example of the need for data filtering (data mining), since patients' electronic health records (EHRs) are due to become available in the coming years. With so much information available, doctors will need to be able to identify patterns and select the data that matter for diagnosing the patient.[13] At the same time, according to some experts, having so much public data available makes it difficult to provide data that is actually anonymous.[6] Another point to take into account is legal and ethical guidelines, which concern who will own the data, how frequently the owner is obliged to release it, and for how long.[13] With so many sources of data, accuracy is another problem. An untrusted source may be challenged by others who order a new set of data, causing repetition in the information.[13]

According to Edward Huth, another concern is the accessibility and cost of such information.[14] Accessibility could be improved either by reducing costs or by increasing the utility of the information. According to the author, costs could be reduced by associations, which should assess which information is relevant and gather it in a more organized fashion.
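The point that abundant public data makes true anonymity hard can be made concrete with k-anonymity, a privacy measure developed by Latanya Sweeney and colleagues that this article does not itself describe. A table is k-anonymous when every combination of quasi-identifier values (attributes such as ZIP code, birth year, and sex that can be linked to outside data) is shared by at least k records. The Python sketch below is a minimal illustration with hypothetical records and column names, not a complete anonymization tool.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def k_anonymity(records: Iterable[Dict[str, str]],
                quasi_identifiers: Sequence[str]) -> int:
    """Return k: the size of the smallest group of records that share
    identical values on every quasi-identifier column (0 if no records)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


# Hypothetical "de-identified" records: names removed, but ZIP code,
# birth year, and sex are still published alongside the diagnosis.
RECORDS = [
    {"zip": "02138", "birth_year": "1960", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": "1960", "sex": "F", "diagnosis": "flu"},
    {"zip": "02138", "birth_year": "1975", "sex": "F", "diagnosis": "diabetes"},
]

# k = 1: the third person is the only one with her ZIP/birth year/sex
# combination, so anyone who knows those three facts learns her diagnosis.
print(k_anonymity(RECORDS, ["zip", "birth_year", "sex"]))
# Publishing fewer attributes makes the groups larger (k = 3 here).
print(k_anonymity(RECORDS, ["zip", "sex"]))
```

The more outside data exists to link against, the more attributes act as quasi-identifiers, and the harder it becomes to keep k above 1 without discarding useful detail.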

Web servers

As of August 2005, there were over 70 million web servers.[15] As of September 2007 there were over 135 million web servers.[16]
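The two web server counts imply a growth rate. Assuming growth was roughly exponential between the two dates (the text does not say so), the implied doubling time can be estimated with the short Python sketch below; the function name is illustrative.

```python
import math


def doubling_time(v0: float, v1: float, elapsed: float) -> float:
    """Doubling time implied by exponential growth from v0 to v1 over
    `elapsed` time units (the result is in the same units)."""
    return elapsed * math.log(2) / math.log(v1 / v0)


# Web servers in millions: August 2005 to September 2007 is 25 months.
print(f"{doubling_time(70, 135, 25):.1f} months")
```

With these figures the result is about 26 months. Because both counts are lower bounds ("over 70 million", "over 135 million"), this is only a rough estimate.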

Blogs

According to Technorati, the number of blogs was doubling about every 6 months, reaching a total of 35.3 million blogs as of April 2006.[17] Because blogs were a recent innovation, this is an example of the early stage of logistic growth, in which growth is approximately exponential. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.
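The logistic pattern described above can be sketched with the standard logistic function N(t) = K / (1 + ((K - N0)/N0) * e^(-rt)), which grows almost exponentially while N is far below the capacity K and levels off as N approaches K. In the Python sketch below, N0 is the April 2006 count from the text and r is chosen so that early growth doubles every 6 months; the capacity K of 1 billion blogs is a hypothetical value chosen only for illustration.

```python
import math


def logistic(t: float, n0: float, k: float, r: float) -> float:
    """Logistic curve with N(0) = n0, growth rate r, and capacity k."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


N0 = 35.3e6             # blogs in April 2006, from the text
R = math.log(2) / 6     # per-month rate: doubling every 6 months while N << K
K = 1e9                 # hypothetical saturation level, illustration only

for months in (0, 6, 12, 24, 48, 96):
    blogs = logistic(months, N0, K, R)
    print(f"month {months:3d}: {blogs / 1e6:7.1f} million blogs")
```

In the early rows each 6-month step nearly doubles the count; in the later rows growth slows and the curve flattens toward K, mirroring the saturation described above.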

See also

References

  1. ^ Hilbert, M. (2015). "Global Information Explosion". https://www.youtube.com/watch?v=8-AqzPe_gNs&list=PLtjBSCvWCU3rNm46D3R85efM0hrzjuAIg. Digital Technology and Social Change [Open Online Course at the University of California], freely available at: https://canvas.instructure.com/courses/949415
  2. ^ "Information." Oxford English Dictionary Online. http://dictionary.oed.com. Accessed January 4, 2008.
  3. ^ https://www.nytimes.com/1964/06/07/u-s-will-remove-reactor-in-arctic.html?_r=0
  4. ^ http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/translation/
  5. ^ Davis, Keith (1973). "The Case for and against Business Assumption of Social Responsibilities". Academy of Management Journal. 16 (2): 312–322. doi:10.2307/255331. JSTOR 255331.
  6. ^ a b c d Sweeney, Latanya. "Information explosion." Confidentiality, disclosure, and data access: Theory and practical applications for statistical agencies (2001): 43-74.
  7. ^ Fuller, Jack. What is happening to news: The information explosion and the crisis in journalism. University of Chicago Press, 2010.
  8. ^ Major, Claire Howell, and Maggi Savin-Baden. An introduction to qualitative research synthesis: Managing the information explosion in social science research. Routledge, 2010.
  9. ^ a b c "The Womartinhilbert.net/WorldInfoCapacity.html "free access to the study" and "video animation".
  10. ^ "Disk/Trend report 1983." Computer Week. Mountain View, CA. (46) 11/11/83.
  11. ^ "Rigid disk drive sales to top $34 billion in 1997." Disk/Trend News. Mountain View, CA: Disk/Trend, Inc., 1997.
  12. ^ Google Books Ngram viewer for the terms mentioned here
  13. ^ a b c d Berner, Eta S., and Jacqueline Moss. "Informatics challenges for the impending patient information explosion." Journal of the American Medical Informatics Association 12.6 (2005): 614-617.
  14. ^ Huth, Edward J. "The information explosion." Bulletin of the New York Academy of Medicine 65.6 (1989): 647.
  15. ^ Robert H Zakon (15 December 2010). "Hobbes' Internet Timeline 10.1". zakon.org. Retrieved 27 August 2011.
  16. ^ "August 2011 Web Server Survey". netcraft.com. August 2011. Retrieved 27 August 2011.
  17. ^ "State of the Blogosphere, April 2006 Part 1: On Blogosphere Growth". Sifry's Alerts (sifry.com). April 17, 2006. Archived from the original on 19 November 2012. Retrieved 27 August 2011.

External links