World Wide Web

The World Wide Web (also known as WWW or simply the Web^[1]) is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists.^[2] It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).^[3]

Quick facts Abbreviation, Status ...

World Wide Web
Abbreviation	WWW
Status	Active
Year started	1989; 36 years ago (1989)
First published	6 August 1991; 34 years ago (1991-08-06)
Organization	CERN (1989–1994) W3C (1994–current)
Authors	Tim Berners-Lee

The Web was invented by English computer scientist Tim Berners-Lee while at CERN in 1989 and opened to the public in 1993. It was conceived as a "universal linked information system".^[4]^[5]^[6] Documents and other media content are made available to the network through web servers and can be accessed by programs such as web browsers. Servers and resources on the World Wide Web are identified and located through character strings called uniform resource locators (URLs).

The original and still very common document type is a web page formatted in Hypertext Markup Language (HTML). This markup language supports plain text, images, embedded video and audio contents, and scripts (short programs) that implement complex user interaction. The HTML language also supports hyperlinks (embedded URLs) which provide immediate access to other web resources. Web navigation, or web surfing, is the common practice of following such hyperlinks across multiple websites. Web applications are web pages that function as application software. The information in the Web is transferred across the Internet using HTTP. Multiple web resources with a common theme and usually a common domain name make up a website. A single web server may provide multiple websites, while some websites, especially the most popular ones, may be provided by multiple servers. Website content is provided by a myriad of companies, organizations, government agencies, and individual users; and comprises an enormous amount of educational, entertainment, commercial, and government information.

The Web has become the world's dominant information systems platform.^[7]^[8]^[9]^[10] It is the primary tool that billions of people worldwide use to interact with the Internet.^[3]

Remove ads

History

Summarize

Perspective

The Web was invented by English computer scientist Tim Berners-Lee while working at CERN.^[11]^[12] He was motivated by the problem of storing, updating, and finding documents and data files in that large and constantly changing organization, as well as distributing them to collaborators outside CERN. In his design, Berners-Lee dismissed the common tree structure approach, used for instance in the existing CERNDOC documentation system and in the Unix filesystem, as well as approaches that relied on tagging files with keywords, as in the VAX/NOTES system. Instead he adopted concepts he had put into practice with his private ENQUIRE system (1980) built at CERN. When he became aware of Ted Nelson's hypertext model (1965), in which documents can be linked in unconstrained ways through hyperlinks associated with "hot spots" embedded in the text, it helped to confirm the validity of his concept.^[13]^[14]

The model was later popularized by Apple's HyperCard system. Unlike Hypercard, Berners-Lee's new system from the outset was meant to support links between multiple databases on independent computers, and to allow simultaneous access by many users from any computer on the Internet. He also specified that the system should eventually handle other media besides text, such as graphics, speech, and video. Links could refer to mutable data files, or even fire up programs on their server computer. He also conceived "gateways" that would allow access through the new system to documents organized in other ways (such as traditional computer file systems or the Usenet). Finally, he insisted that the system should be decentralized, without any central control or coordination over the creation of links.^[5]^[15]^[11]^[12]

Berners-Lee submitted a proposal to CERN in May 1989, without giving the system a name.^[5] He got a working system implemented by the end of 1990, including a browser called WorldWideWeb (which became the name of the project and of the network) and an HTTP server running at CERN. As part of that development he defined the first version of the HTTP protocol, the basic URL syntax, and implicitly made HTML the primary document format.^[16] The technology was released outside CERN to other research institutions starting in January 1991, and then to the whole Internet on 23 August 1991. The Web was a success at CERN, and began to spread to other scientific and academic institutions. Within the next two years, there were 50 websites created.^[17]^[18]

CERN made the Web protocol and code available royalty free in 1993, enabling its widespread use.^[19]^[20] After the NCSA released the Mosaic web browser later that year, the Web's popularity grew rapidly as thousands of websites sprang up in less than a year.^[21]^[22] Mosaic was a graphical browser that could display inline images and submit forms that were processed by the HTTPd server.^[23]^[24] Marc Andreessen and Jim Clark founded Netscape the following year and released the Navigator browser, which introduced Java and JavaScript to the Web. It quickly became the dominant browser. Netscape became a public company in 1995 which triggered a frenzy for the Web and started the dot-com bubble.^[25] Microsoft responded by developing its own browser, Internet Explorer, starting the browser wars. By bundling it with Windows, it became the dominant browser for 14 years.^[26]

Berners-Lee founded the World Wide Web Consortium (W3C) which created XML in 1996 and recommended replacing HTML with stricter XHTML.^[27] In the meantime, developers began exploiting an IE feature called XMLHttpRequest to make Ajax applications and launched the Web 2.0 revolution. Mozilla, Opera, and Apple rejected XHTML and created the WHATWG which developed HTML5.^[28] In 2009, the W3C conceded and abandoned XHTML.^[29] In 2019, it ceded control of the HTML specification to the WHATWG.^[30]

The World Wide Web has been central to the development of the Information Age and is the primary tool billions of people use to interact on the Internet.^[31]^[32]^[33]^[10]

Remove ads

Nomenclature

Tim Berners-Lee states that World Wide Web is officially spelled as three separate words, each capitalised, with no intervening hyphens.^[34] Use of the www prefix has been declining, especially when web applications sought to brand their domain names and make them easily pronounceable. As the mobile web grew in popularity,^[35] services like Gmail.com, Outlook.com, Myspace.com, Facebook.com and Twitter.com are most often mentioned without adding "www." (or, indeed, ".com") to the domain.^[36]

In English, www is usually read as double-u double-u double-u.^[37] Some users pronounce it dub-dub-dub, particularly in New Zealand.^[38] Stephen Fry, in his "Podgrams" series of podcasts, pronounces it wuh wuh wuh.^[39] The English writer Douglas Adams once quipped in The Independent on Sunday (1999): "The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for".^[40]

Remove ads

Function

Summarize

Perspective

The terms Internet and World Wide Web are often used without much distinction. However, the two terms do not mean the same thing. The Internet is a global system of computer networks interconnected through telecommunications and optical networking. In contrast, the World Wide Web is a global collection of documents and other resources, linked by hyperlinks and URIs. Web resources are accessed using HTTP or HTTPS, which are application-level Internet protocols that use the Internet transport protocols.^[3]

Viewing a web page on the World Wide Web normally begins either by typing the URL of the page into a web browser or by following a hyperlink to that page or resource. The web browser then initiates a series of background communication messages to fetch and display the requested page. In the 1990s, using a browser to view web pages—and to move from one web page to another through hyperlinks—came to be known as 'browsing,' 'web surfing' (after channel surfing), or 'navigating the Web'. Early studies of this new behaviour investigated user patterns in using web browsers. One study, for example, found five user patterns: exploratory surfing, window surfing, evolved surfing, bounded navigation and targeted navigation.^[41]

The following example demonstrates the functioning of a web browser when accessing a page at the URL http://example.org/home.html. The browser resolves the server name of the URL (example.org) into an Internet Protocol address using the globally distributed Domain Name System (DNS). This lookup returns an IP address such as 203.0.113.4 or 2001:db8:2e::7334. The browser then requests the resource by sending an HTTP request across the Internet to the computer at that address. It requests service from a specific TCP port number that is well known for the HTTP service so that the receiving host can distinguish an HTTP request from other network protocols it may be servicing. HTTP normally uses port number 80 and for HTTPS it normally uses port number 443. The content of the HTTP request can be as simple as two lines of text:

GET /home.html HTTP/1.1
Host: example.org

The computer receiving the HTTP request delivers it to web server software listening for requests on port 80. If the web server can fulfil the request it sends an HTTP response back to the browser indicating success:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8

followed by the content of the requested page. Hypertext Markup Language (HTML) for a basic web page might look like this:

<html>
  <head>
    <title>Example.org – The World Wide Web</title>
  </head>
  <body>
    <p>The World Wide Web, abbreviated as WWW and commonly known ...</p>
  </body>
</html>

The web browser parses the HTML and interprets the markup (<title>, <p> for paragraph, and such) that surrounds the words to format the text on the screen. Many web pages use HTML to reference the URLs of other resources such as images, other embedded media, scripts that affect page behaviour, and Cascading Style Sheets that affect page layout. The browser makes additional HTTP requests to the web server for these other Internet media types. As it receives their content from the web server, the browser progressively renders the page onto the screen as specified by its HTML and these additional resources.

HTML

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of cornerstone technologies for the World Wide Web.^[42]

Web browsers receive HTML documents from a web server or from local storage and render the documents into multimedia web pages. HTML describes the structure of a web page semantically and originally included cues for the appearance of the document.

HTML elements are the building blocks of HTML pages. With HTML constructs, images and other objects such as interactive forms may be embedded into the rendered page. HTML provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written using angle brackets. Tags such as <img /> and <input /> directly introduce content into the page. Other tags such as <p> surround and provide information about document text and may include other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the content of the page.

HTML can embed programs written in a scripting language such as JavaScript, which affects the behaviour and content of web pages. Inclusion of CSS defines the look and layout of content. The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards, has encouraged the use of CSS over explicit presentational HTML since 1997.^[update]^[43]

Linking

Most web pages contain hyperlinks to other related pages and perhaps to downloadable files, source documents, definitions and other web resources. In the underlying HTML, a hyperlink looks like this: <a href="http://example.org/home.html">Example.org Homepage</a>.

Such a collection of useful, related resources, interconnected via hypertext links is dubbed a web of information. Publication on the Internet created what Tim Berners-Lee first called the WorldWideWeb (in its original CamelCase, which was subsequently discarded) in November 1990.^[44]

The hyperlink structure of the web is described by the webgraph: the nodes of the web graph correspond to the web pages (or URLs) the directed edges between them to the hyperlinks. Over time, many web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This makes hyperlinks obsolete, a phenomenon referred to in some circles as link rot, and the hyperlinks affected by it are often called "dead" links. The ephemeral nature of the Web has prompted many efforts to archive websites. The Internet Archive, active since 1996, is the best known of such efforts.

WWW prefix

Many hostnames used for the World Wide Web begin with www because of the long-standing practice of naming Internet hosts according to the services they provide. The hostname of a web server is often www, in the same way that it may be ftp for an FTP server, and news or nntp for a Usenet news server. These hostnames appear as Domain Name System (DNS) or subdomain names, as in www.example.com. The use of www is not required by any technical or policy standard and many websites do not use it; the first web server was nxoc01.cern.ch.^[45] According to Paolo Palazzi, who worked at CERN along with Tim Berners-Lee, the popular use of www as subdomain was accidental; the World Wide Web project page was intended to be published at www.cern.ch while info.cern.ch was intended to be the CERN home page; however the DNS records were never switched, and the practice of prepending www to an institution's website domain name was subsequently copied.^[46]^{[better source needed]} Many established websites still use the prefix, or they employ other subdomain names such as www2, secure or en for special purposes. Many such web servers are set up so that both the main domain name (e.g., example.com) and the www subdomain (e.g., www.example.com) refer to the same site; others require one form or the other, or they may map to different web sites. The use of a subdomain name is useful for load balancing incoming web traffic by creating a CNAME record that points to a cluster of web servers. Since, currently^{[as of?]}, only a subdomain can be used in a CNAME, the same result cannot be achieved by using the bare domain root.^[47]^{[dubious – discuss]}

When a user submits an incomplete domain name to a web browser in its address bar input field, some web browsers automatically try adding the prefix "www" to the beginning of it and possibly ".com", ".org" and ".net" at the end, depending on what might be missing. For example, entering "microsoft" may be transformed to http://www.microsoft.com/ and "openoffice" to http://www.openoffice.org. This feature started appearing in early versions of Firefox, when it still had the working title 'Firebird' in early 2003, from an earlier practice in browsers such as Lynx.^[48]^{[unreliable source?]} It is reported that Microsoft was granted a US patent for the same idea in 2008, but only for mobile devices.^[49]

Scheme specifiers

The scheme specifiers http:// and https:// at the start of a web URI refer to Hypertext Transfer Protocol or HTTP Secure, respectively. They specify the communication protocol to use for the request and response. The HTTP protocol is fundamental to the operation of the World Wide Web, and the added encryption layer in HTTPS is essential when browsers send or retrieve confidential data, such as passwords or banking information. Web browsers usually automatically prepend http:// to user-entered URIs, if omitted.^{[citation needed]}

Pages

A web page (also written as webpage) is a document that is suitable for the World Wide Web and web browsers. A web browser displays a web page on a monitor or mobile device.

The term web page usually refers to what is visible, but may also refer to the contents of the computer file itself, which is usually a text file containing hypertext written in HTML or a comparable markup language. Typical web pages provide hypertext for browsing to other web pages via hyperlinks, often referred to as links. Web browsers will frequently have to access multiple web resource elements, such as reading style sheets, scripts, and images, while presenting each web page.

On a network, a web browser can retrieve a web page from a remote web server. The web server may restrict access to a private network such as a corporate intranet. The web browser uses the Hypertext Transfer Protocol (HTTP) to make such requests to the web server.

A static web page is delivered exactly as stored, as web content in the web server's file system. In contrast, a dynamic web page is generated by a web application, usually driven by server-side software. Dynamic web pages are used when each user may require completely different information, for example, bank websites, web email etc.

Static page

A static web page (sometimes called a flat page/stationary page) is a web page that is delivered to the user exactly as stored, in contrast to dynamic web pages which are generated by a web application.

Consequently, a static web page displays the same information for all users, from all contexts, subject to modern capabilities of a web server to negotiate content-type or language of the document where such versions are available and the server is configured to do so.

Dynamic pages

A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, including the setting up of more client-side processing.

A client-side dynamic web page processes the web page using JavaScript running in the browser. JavaScript programs can interact with the document via Document Object Model, or DOM, to query page state and alter it. The same client-side techniques can then dynamically update or change the DOM in the same way.

A dynamic web page is then reloaded by the user or by a computer program to change some variable content. The updating information could come from the server, or from changes made to that page's DOM. This may or may not truncate the browsing history or create a saved version to go back to, but a dynamic web page update using Ajax technologies will neither create a page to go back to nor truncate the web browsing history forward of the displayed page. Using Ajax technologies the end user gets one dynamic page managed as a single page in the web browser while the actual web content rendered on that page can vary. The Ajax engine sits only on the browser requesting parts of its DOM, the DOM, for its client, from an application server.

Dynamic HTML, or DHTML, is the umbrella term for technologies and methods used to create web pages that are not static web pages, though it has fallen out of common use since the popularization of AJAX, a term which is now itself rarely used. Client-side-scripting, server-side scripting, or a combination of these make for the dynamic web experience in a browser.^{[citation needed]}

JavaScript is a scripting language that was initially developed in 1995 by Brendan Eich, then of Netscape, for use within web pages.^[50] The standardised version is ECMAScript.^[50] To make web pages more interactive, some web applications also use JavaScript techniques such as Ajax (asynchronous JavaScript and XML). Client-side script is delivered with the page that can make additional HTTP requests to the server, either in response to user actions such as mouse movements or clicks, or based on elapsed time. The server's responses are used to modify the current page rather than creating a new page with each response, so the server needs only to provide limited, incremental information. Multiple Ajax requests can be handled at the same time, and users can interact with the page while data is retrieved. Web pages may also regularly poll the server to check whether new information is available.^[51]

Website

A website^[52] is a collection of related web resources including web pages, multimedia content, typically identified with a common domain name, and published on at least one web server. Notable examples are wikipedia.org, google.com, and amazon.com.

A website may be accessible via a public Internet Protocol (IP) network, such as the Internet, or a private local area network (LAN), by referencing a uniform resource locator (URL) that identifies the site.

Websites can have many functions and can be used in various fashions; a website can be a personal website, a corporate website for a company, a government website, an organization website, etc. Websites are typically dedicated to a particular topic or purpose, ranging from entertainment and social networking to providing news and education. All publicly accessible websites collectively constitute the World Wide Web, while private websites, such as a company's website for its employees, are typically a part of an intranet.

Web pages, which are the building blocks of websites, are documents, typically composed in plain text interspersed with formatting instructions of Hypertext Markup Language (HTML, XHTML). They may incorporate elements from other websites with suitable markup anchors. Web pages are accessed and transported with the Hypertext Transfer Protocol (HTTP), which may optionally employ encryption (HTTP Secure, HTTPS) to provide security and privacy for the user. The user's application, often a web browser, renders the page content according to its HTML markup instructions onto a display terminal.

Hyperlinking between web pages conveys to the reader the site structure and guides the navigation of the site, which often starts with a home page containing a directory of the site web content. Some websites require user registration or subscription to access content. Examples of subscription websites include many business sites, news websites, academic journal websites, gaming websites, file-sharing websites, message boards, web-based email, social networking websites, websites providing real-time price quotations for different types of markets, as well as sites providing various other services. End users can access websites on a range of devices, including desktop and laptop computers, tablet computers, smartphones and smart TVs.

Browser

A web browser (commonly referred to as a browser) is a software user agent for accessing information on the World Wide Web. To connect to a website's server and display its pages, a user needs to have a web browser program. This is the program that the user runs to download, format, and display a web page on the user's computer.

In addition to allowing users to find, display, and move between web pages, a web browser will usually have features like keeping bookmarks, recording history, managing cookies (see below), and home pages and may have facilities for recording passwords for logging into websites.

The most popular browsers are Chrome, Safari, Edge, Samsung Internet and Firefox.^[53]

Server

A Web server is server software, or hardware dedicated to running said software, that can satisfy World Wide Web client requests. A web server can, in general, contain one or more websites. A web server processes incoming network requests over HTTP and several other related protocols.

The primary function of a web server is to store, process and deliver web pages to clients.^[54] The communication between client and server takes place using the Hypertext Transfer Protocol (HTTP). Pages delivered are most frequently HTML documents, which may include images, style sheets and scripts in addition to the text content.

A user agent, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP and the server responds with the content of that resource or an error message if unable to do so. The resource is typically a real file on the server's secondary storage, but this is not necessarily the case and depends on how the webserver is implemented.

While the primary function is to serve content, full implementation of HTTP also includes ways of receiving content from clients. This feature is used for submitting web forms, including uploading of files.

Many generic web servers also support scripting using Active Server Pages (ASP), PHP (Hypertext Preprocessor), or other scripting languages. This means that the behaviour of the webserver can be scripted in separate files, while the actual server software remains unchanged. Usually, this function is used to generate HTML documents dynamically ("on-the-fly") as opposed to returning static documents. The former is primarily used for retrieving or modifying information from databases. The latter is typically much faster and more easily cached but cannot deliver dynamic content.

Web servers can also frequently be found embedded in devices such as printers, routers, webcams and serving only a local network. The web server may then be used as a part of a system for monitoring or administering the device in question. This usually means that no additional software has to be installed on the client computer since only a web browser is required (which now is included with most operating systems).

Optical Networking

Optical networking is a sophisticated infrastructure that utilizes optical fiber to transmit data over long distances, connecting countries, cities, and even private residences. The technology uses optical microsystems like tunable lasers, filters, attenuators, switches, and wavelength-selective switches to manage and operate these networks.^[55]^[56]

The large quantity of optical fiber installed throughout the world at the end of the twentieth century set the foundation of the Internet as it is used today. The information highway relies heavily on optical networking, a method of sending messages encoded in light to relay information in various telecommunication networks.^[57]

The Advanced Research Projects Agency Network (ARPANET) was one of the first iterations of the Internet, created in collaboration with universities and researchers 1969.^[58]^[59]^[60]^[61] However, access to the ARPANET was limited to researchers, and in 1985, the National Science Foundation founded the National Science Foundation Network (NSFNET), a program that provided supercomputer access to researchers.^[61]

Limited public access to the Internet led to pressure from consumers and corporations to privatize the network. In 1993, the US passed the National Information Infrastructure Act, which dictated that the National Science Foundation must hand over control of the optical capabilities to commercial operators.^[62]^[63]

The privatization of the Internet and the release of the World Wide Web to the public in 1993 led to an increased demand for Internet capabilities. This spurred developers to seek solutions to reduce the time and cost of laying new fiber and increase the amount of information that can be sent on a single fiber, in order to meet the growing needs of the public.^[64]^[65]^[66]^[67]

In 1994, Pirelli S.p.A.'s optical components division introduced a wavelength-division multiplexing (WDM) system to meet growing demand for increased data transmission. This four-channel WDM technology allowed more information to be sent simultaneously over a single optical fiber, effectively boosting network capacity.^[68]^[69]

Pirelli wasn't the only company that developed a WDM system; another company, the Ciena Corporation (Ciena), created its own technology to transmit data more efficiently. David Huber, an optical networking engineer and entrepreneur Kevin Kimberlin founded Ciena in 1992.^[70]^[71]^[72] Drawing on laser technology from Gordon Gould and William Culver of Optelecom, Inc., the company focused on utilizing optical amplifiers to transmit data via light.^[73]^[74]^[75] Under chief executive officer Pat Nettles, Ciena developed a dual-stage optical amplifier for dense wavelength-division multiplexing (DWDM), patented in 1997 and deployed on the Sprint network in 1996.^[76]^[77]^[78]^[79]^[80]

An HTTP cookie (also called web cookie, Internet cookie, browser cookie, or simply cookie) is a small piece of data sent from a website and stored on the user's computer by the user's web browser while the user is browsing. Cookies were designed to be a reliable mechanism for websites to remember stateful information (such as items added in the shopping cart in an online store) or to record the user's browsing activity (including clicking particular buttons, logging in, or recording which pages were visited in the past). They can also be used to remember arbitrary pieces of information that the user previously entered into form fields such as names, addresses, passwords, and credit card numbers.

Cookies perform essential functions in the modern web. Perhaps most importantly, authentication cookies are the most common method used by web servers to know whether the user is logged in or not, and which account they are logged in with. Without such a mechanism, the site would not know whether to send a page containing sensitive information or require the user to authenticate themselves by logging in. The security of an authentication cookie generally depends on the security of the issuing website and the user's web browser, and on whether the cookie data is encrypted. Security vulnerabilities may allow a cookie's data to be read by a hacker, used to gain access to user data, or used to gain access (with the user's credentials) to the website to which the cookie belongs (see cross-site scripting and cross-site request forgery for examples).^[81]

Tracking cookies, and especially third-party tracking cookies, are commonly used as ways to compile long-term records of individuals' browsing histories – a potential privacy concern that prompted European^[82] and U.S. lawmakers to take action in 2011.^[83]^[84] European law requires that all websites targeting European Union member states gain "informed consent" from users before storing non-essential cookies on their device.

Google Project Zero researcher Jann Horn describes ways cookies can be read by intermediaries, like Wi-Fi hotspot providers. When in such circumstances, he recommends using the browser in private browsing mode (widely known as Incognito mode in Google Chrome).^[85]

Search engine

A web search engine or Internet search engine is a software system that is designed to carry out web search (Internet search), which means to search the World Wide Web in a systematic way for particular information specified in a web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that is not capable of being searched by a web search engine is generally described as the deep web.

In 1990, Archie, the world's first search engine, was released. The technology was originally an index of File Transfer Protocol (FTP) sites, which was a method for moving files between a client and a server network.^[86]^[87] This early search tool was superseded by more advanced engines like Yahoo! in 1995 and Google in 1998.^[88]^[89]

Deep web

Deep web diagram

Deep web vs surface web

Surface Web & Deep Web

The deep web,^[90] invisible web,^[91] or hidden web^[92] are parts of the World Wide Web whose contents are not indexed by standard web search engines. The opposite term to the deep web is the surface web, which is accessible to anyone using the Internet.^[93] Computer scientist Michael K. Bergman is credited with coining the term deep web in 2001 as a search indexing term.^[94]

The content of the deep web is hidden behind HTTP forms,^[95]^[96] and includes many very common uses such as web mail, online banking, and services that users must pay for, and which is protected by a paywall, such as video on demand, some online magazines and newspapers, among others.

The content of the deep web can be located and accessed by a direct URL or IP address and may require a password or other security access past the public website page.

Caching

A web cache is a server computer located either on the public Internet or within an enterprise that stores recently accessed web pages to improve response time for users when the same content is requested within a certain time after the original request. Most web browsers also implement a browser cache by writing recently obtained data to a local data storage device. HTTP requests by a browser may ask only for data that has changed since the last access. Web pages and resources may contain expiration information to control caching to secure sensitive data, such as in online banking, or to facilitate frequently updated sites, such as news media. Even sites with highly dynamic content may permit basic resources to be refreshed only occasionally. Web site designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. Enterprise firewalls often cache Web resources requested by one user for the benefit of many users. Some search engines store cached content of frequently accessed websites.

Remove ads

Security

Summarize

Perspective

For criminals, the Web has become a venue to spread malware and engage in a range of cybercrime, including (but not limited to) identity theft, fraud, espionage, and intelligence gathering.^[97] Web-based vulnerabilities now outnumber traditional computer security concerns,^[98]^[99] and as measured by Google, about one in ten web pages may contain malicious code.^[100] Most web-based attacks take place on legitimate websites, and most, as measured by Sophos, are hosted in the United States, China and Russia.^[101] The most common of all malware threats is SQL injection attacks against websites.^[102] Through HTML and URIs, the Web was vulnerable to attacks like cross-site scripting (XSS) that came with the introduction of JavaScript^[103] and were exacerbated to some degree by Web 2.0 and Ajax web design that favours the use of scripts.^[104] In one 2007 estimate, 70% of all websites are open to XSS attacks on their users.^[105] Phishing is another common threat to the Web. In February 2013, RSA (the security division of EMC) estimated the global losses from phishing at $1.5 billion in 2012.^[106] Two of the well-known phishing methods are Covert Redirect and Open Redirect.

Proposed solutions vary. Large security companies like McAfee already design governance and compliance suites to meet post-9/11 regulations,^[107] and some, like Finjan Holdings have recommended active real-time inspection of programming code and all content regardless of its source.^[97] Some have argued that for enterprises to see Web security as a business opportunity rather than a cost centre,^[108] while others call for "ubiquitous, always-on digital rights management" enforced in the infrastructure to replace the hundreds of companies that secure data and networks.^[109] Jonathan Zittrain has said users sharing responsibility for computing safety is far preferable to locking down the Internet.^[110]

Remove ads

Privacy

Summarize

Perspective

Every time a client requests a web page, the server can identify the request's IP address. Web servers usually log IP addresses in a log file. Also, unless set not to do so, most web browsers record requested web pages in a viewable history feature, and usually cache much of the content locally. Unless the server-browser communication uses HTTPS encryption, web requests and responses travel in plain text across the Internet and can be viewed, recorded, and cached by intermediate systems. Another way to hide personally identifiable information is by using a virtual private network. A VPN encrypts traffic between the client and VPN server, and masks the original IP address, lowering the chance of user identification.

When a web page asks for, and the user supplies, personally identifiable information—such as their real name, address, e-mail address, etc. web-based entities can associate current web traffic with that individual. If the website uses HTTP cookies, username, and password authentication, or other tracking techniques, it can relate other web visits, before and after, to the identifiable information provided. In this way, a web-based organization can develop and build a profile of the individual people who use its site or sites. It may be able to build a record for an individual that includes information about their leisure activities, their shopping interests, their profession, and other aspects of their demographic profile. These profiles are of potential interest to marketers, advertisers, and others. Depending on the website's terms and conditions and the local laws that apply information from these profiles may be sold, shared, or passed to other organizations without the user being informed. For many ordinary people, this means little more than some unexpected emails in their inbox or some uncannily relevant advertising on a future web page. For others, it can mean that time spent indulging an unusual interest can result in a deluge of further targeted marketing that may be unwelcome. Law enforcement, counterterrorism, and espionage agencies can also identify, target, and track individuals based on their interests or proclivities on the Web.

Social networking sites usually try to get users to use their real names, interests, and locations, rather than pseudonyms, as their executives believe that this makes the social networking experience more engaging for users. On the other hand, uploaded photographs or unguarded statements can be identified to an individual, who may regret this exposure. Employers, schools, parents, and other relatives may be influenced by aspects of social networking profiles, such as text posts or digital photos, that the posting individual did not intend for these audiences. Online bullies may make use of personal information to harass or stalk users. Modern social networking websites allow fine-grained control of the privacy settings for each posting, but these can be complex and not easy to find or use, especially for beginners.^[111] Photographs and videos posted onto websites have caused particular problems, as they can add a person's face to an online profile. With modern and potential facial recognition technology, it may then be possible to relate that face with other, previously anonymous, images, events, and scenarios that have been imaged elsewhere. Due to image caching, mirroring, and copying, it is difficult to remove an image from the World Wide Web.

Remove ads

Standards

Summarize

Perspective

Web standards include many interdependent standards and specifications, some of which govern aspects of the Internet, not just the World Wide Web. Even when not web-focused, such standards directly or indirectly affect the development and administration of websites and web services. Considerations include the interoperability, accessibility and usability of web pages and web sites.

Web standards, in the broader sense, consist of the following:

Recommendations published by the World Wide Web Consortium (W3C)^[112]
"Living Standard" made by the Web Hypertext Application Technology Working Group (WHATWG)
Request for Comments (RFC) documents published by the Internet Engineering Task Force (IETF)^[113]
Standards published by the International Organization for Standardization (ISO)^[114]
Standards published by Ecma International (formerly ECMA)^[115]
The Unicode Standard and various Unicode Technical Reports (UTRs) published by the Unicode Consortium^[116]
Name and number registries maintained by the Internet Assigned Numbers Authority (IANA)^[117]

Web standards are not fixed sets of rules but are constantly evolving sets of finalized technical specifications of web technologies.^[118] Web standards are developed by standards organizations—groups of interested and often competing parties chartered with the task of standardization—not technologies developed and declared to be a standard by a single individual or company. It is crucial to distinguish those specifications that are under development from the ones that already reached the final development status (in the case of W3C specifications, the highest maturity level).

Remove ads

Accessibility

There are methods for accessing the Web in alternative mediums and formats to facilitate use by individuals with disabilities. These disabilities may be visual, auditory, physical, speech-related, cognitive, neurological, or some combination. Accessibility features also help people with temporary disabilities, like a broken arm, or ageing users as their abilities change.^[119] The Web is receiving information as well as providing information and interacting with society. The World Wide Web Consortium claims that it is essential that the Web be accessible, so it can provide equal access and equal opportunity to people with disabilities.^[120] Tim Berners-Lee once noted, "The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."^[119] Many countries regulate web accessibility as a requirement for websites.^[121] International co-operation in the W3C Web Accessibility Initiative led to simple guidelines that web content authors as well as software developers can use to make the Web accessible to persons who may or may not be using assistive technology.^[119]^[122]

Remove ads

Internationalisation

The W3C Internationalisation Activity assures that web technology works in all languages, scripts, and cultures.^[123] Beginning in 2004 or 2005, Unicode gained ground and eventually in December 2007 surpassed both ASCII and Western European as the Web's most frequently used character map.^[124] Originally RFC 3986 allowed resources to be identified by URI in a subset of US-ASCII. RFC 3987 allows more characters—any character in the Universal Character Set—and now a resource can be identified by IRI in any language.^[125]

Remove ads

References

Loading content...

External links

Loading content...

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.

Remove ads

History

Nomenclature

Function

HTML

Linking

WWW prefix

Scheme specifiers

Pages

Static page

Dynamic pages

Website

Browser

Server

Optical Networking

Search engine

Deep web

Caching

Security

Privacy

Standards

Accessibility

Internationalisation

See also

References

Further reading

External links

Wikiwand - on