User:Dr pda/prosesize

This script adds a Page size link to the toolbox, i.e. the box in the left-hand column (by default) that also contains What links here, among other things. Clicking this link displays some statistics about the page and prose size (see below) and highlights the 'readable prose'. Clicking the link again turns these off. Sizes are displayed in kilobytes (kB), or in bytes if the value is less than 10 kB.
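
The display rule above can be sketched as a small formatting function. This is only an illustration, not the script's own code: the name formatSize, the 1024-byte kilobyte and the rounding to whole kilobytes are all assumptions.

```javascript
// Assumption: 1 kB = 1024 bytes, and kilobyte values are rounded to the
// nearest whole number. The actual script may divide or round differently.
const BYTES_PER_KB = 1024;
const KB_THRESHOLD = 10 * BYTES_PER_KB;

// Hypothetical helper: format a byte count following the rule
// "kilobytes, or bytes if the value is less than 10 kB".
function formatSize(bytes) {
  if (bytes < KB_THRESHOLD) {
    return bytes + ' B';
  }
  return Math.round(bytes / BYTES_PER_KB) + ' kB';
}
```

For example, formatSize(4096) gives '4096 B', while formatSize(91136) gives '89 kB'.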

For the alternative version which always displays sizes in bytes, not kilobytes, see User:Dr pda/prosesizebytes.js.

How to get it working

Installing the script

  1. Open your common.js file and click Edit
  2. Add the following to the beginning or the end of the file:
    // [[User:Dr pda/prosesize]]
    mw.loader.load('//en.wikipedia.org/w/index.php?title=User%3ADr+pda%2Fprosesize.js&action=raw&ctype=text/javascript');
    
  3. Save your changes

Since 2014, you should no longer need to bypass your browser's cache, although pages that are already loaded need to be reloaded.

To try without installing

Via console

Internet Explorer, Firefox and Google Chrome come with built-in developer tools, opened with the F12 key. All three can run JavaScript commands on a page. Navigate to the page you are interested in, open the developer tools, and switch to the console. Then execute the following commands, one line at a time.

importScript('User:Dr pda/prosesize.js');
getDocumentSize();

Via address bar

An alternative way to run the script without installing it is to go to the page you are interested in, then paste the following into the address bar of your browser instead of the URL.

javascript:importScript('User:Dr pda/prosesize.js'); getDocumentSize();

This does not work in modern web browsers.

Via bookmarklets

It is possible to make a bookmark and supply the above code instead of the URL. However, tests in IE, Firefox and Chrome show that you might need to click the bookmark twice, waiting a few seconds between attempts, depending on your connection speed.

Sample output

Document statistics:

  • File size: 89 kB
  • Prose size (including all HTML code): 28 kB
  • References (including all HTML code): 10 kB
  • Wiki text: 31.8 kB
  • Prose size (text only): 18 kB (3310 words) "readable prose size"
  • References (text only): 4 kB
  • Images: 443 kB

Quick summary

  • File size: size of HTML document
  • Prose size (including all HTML code): size of HTML within <p></p> tags
  • References (including all HTML code): size of HTML for cite.php references
  • Wiki text: size of text+markup within the edit box
  • Prose size (text only): size of text within <p></p> tags. This is the so-called "readable prose size"
  • References (text only): size of text for cite.php references
  • Images: size of image thumbnails (Internet Explorer only)

File size

This is the total size of the HTML document. If you went to View->Page Source (or the equivalent) in your browser, and saved the resulting output to your computer, the file size would be the size of this file. This number does not include any images. The file size (plus the image size) is what you need to look at when considering how long a page will take to load.

For Internet Explorer this number is obtained from the document.fileSize property. For other browsers it is obtained by loading the page again with an XMLHttpRequest, so this number may take a few seconds to appear.
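
As a rough sketch of the second approach, the page can be requested again and the returned HTML measured. The code below is not the script's implementation: it uses fetch in place of XMLHttpRequest, assumes the size is counted in UTF-8 bytes, and both function names are hypothetical.

```javascript
// Size in bytes of a string when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Hypothetical helper: request a page again and measure the returned HTML.
// Uses fetch rather than XMLHttpRequest, so the result arrives
// asynchronously (hence the short delay before the number appears).
async function fetchDocumentSize(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return utf8ByteLength(await response.text());
}
```

In a browser console this could be called as fetchDocumentSize(location.href).then(console.log). Note that the result may differ slightly from what the browser originally received, for example if the page content depends on the logged-in user.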

Prose size

Wikipedia:Article size says

there [are] stylistic reasons why the main body of an article should not be unreasonably long, including readability issues ... For stylistic purposes, only the main body of prose (excluding links, see also, reference and footnote sections, and lists/tables) should be counted toward an article's total size, since the point is to limit the size of the main body of prose.

One of the main motivations for this script was to provide a convenient way of calculating the prose size. The technique used is simply to count the text within <p></p> tags in the HTML source of the document, which corresponds almost exactly to the definition of 'readable prose'. (Feb 2011: The script has been updated to count text in <blockquote> tags as well.) This method is not perfect, however: it may include text which isn't prose (e.g. in navboxes), or exclude text which is (e.g. in {{cquote}}, or prose written in bullet-point form, e.g. Anarchism#Recent developments within Anarchism). The text counted as prose is highlighted in yellow, so it is easy to see whether the prose size is overestimated or underestimated.

Two numbers are given for the prose size: HTML and text only. The HTML size is the size of the HTML code contained within <p></p> tags. This number can be compared to the file size to see how much of the document consists of readable prose. The text-only size is the size of just the words, without any formatting. (This is what you would get if you copied and pasted the prose from the article into something like Notepad, which strips out all the formatting.) The word count is calculated from the number of spaces in the text-only prose. Note that Internet Explorer highlights the section headings, but does not count them as prose. (This is because there is an 'invisible' <p></p> before them containing a link so that you jump to the right place when you click the appropriate section in the table of contents.)
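
The two prose numbers and the word count can be illustrated with a minimal sketch. It is not the script's own code: the function name proseStats and the { html, text } record shape are hypothetical, sizes are assumed to be UTF-8 bytes, and words are assumed to be separated by single spaces.

```javascript
// paragraphs: array of { html, text } records, one per <p> element, where
// html is the markup inside the tag and text is the same content with all
// formatting stripped. (Hypothetical shape, chosen for illustration.)
function proseStats(paragraphs) {
  const size = (s) => new TextEncoder().encode(s).length; // UTF-8 bytes
  let htmlBytes = 0;
  let textBytes = 0;
  let words = 0;
  for (const p of paragraphs) {
    htmlBytes += size(p.html);
    textBytes += size(p.text);
    const trimmed = p.text.trim();
    if (trimmed !== '') {
      // Assumption: single spaces separate words, so the word count of a
      // paragraph is its number of spaces plus one.
      words += trimmed.split(' ').length;
    }
  }
  return { htmlBytes, textBytes, words };
}
```

In a browser, the input could be gathered with something like Array.from(document.querySelectorAll('p'), p => ({ html: p.innerHTML, text: p.textContent })). This selects every paragraph on the page, so it shares the over-counting caveats described above.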

References size

Now that cite.php inline citations are becoming very common, it is often useful to know how much of the article size comes from these references. The HTML references size is the size of what is produced by the <references/> tag, plus the size of the HTML to produce the markers (i.e. [1]). The text-only size is again just the text of the references, plus the text of the markers. Note that the contribution of the markers is explicitly subtracted from both prose size numbers. The markers also should not affect the word count, since there should be no spaces between them and the preceding word/punctuation.
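
The subtraction of the markers from the prose size can be sketched as follows. This is a simplified stand-in that works on plain text with a regular expression, whereas the script works on the page's HTML; the function name and the assumption that markers look like "[1]" are illustrative only.

```javascript
// Assumption: citation markers appear in the text as bracketed numbers
// such as "[1]". A regular expression stands in for HTML-based detection.
const MARKER = /\[\d+\]/g;

// Hypothetical helper: split one paragraph's text into prose and markers.
function splitMarkers(text) {
  const size = (s) => new TextEncoder().encode(s).length; // UTF-8 bytes
  const markers = text.match(MARKER) || [];
  const prose = text.replace(MARKER, '');
  const trimmed = prose.trim();
  return {
    proseBytes: size(prose),              // prose size with markers removed
    markerBytes: size(markers.join('')),  // counted towards references
    words: trimmed === '' ? 0 : trimmed.split(' ').length,
  };
}
```

Because the markers follow the preceding word or punctuation without a space, removing them changes the byte counts but leaves the number of spaces, and hence the word count, unchanged.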

Wiki text size

In addition to the above numbers, which are calculated from the HTML source of the page, there is also the size of the text plus wiki markup which appears in the edit box when you edit a page. This number is shown next to each revision on the History tab, and is the same number which appears in warnings about page length (e.g. Note: This page is 37 kilobytes long.). The prose size script queries the API automatically to retrieve this value for the current article. This involves another XMLHttpRequest, so it may take a few seconds for the number to appear; if there is a problem with the request, the script will display an error message.
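
One way such a lookup could be made is sketched below. The page does not say which API query the script sends, so this is an assumption: it uses the standard MediaWiki query API with prop=info, whose per-page length field gives the wikitext size in bytes. Both function names are hypothetical.

```javascript
// Assumption: the standard MediaWiki query API with prop=info. The exact
// request made by the script is not stated on this page.
function buildSizeQueryUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'info',
    titles: title,
    format: 'json',
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// Pull the wikitext length (in bytes) out of a parsed API response.
// Throws if the value is missing, mirroring the error case described above.
function extractWikiTextSize(apiResponse) {
  const pages = (apiResponse.query && apiResponse.query.pages) || {};
  const first = Object.values(pages)[0];
  if (!first || typeof first.length !== 'number') {
    throw new Error('Wiki text size not found in API response');
  }
  return first.length;
}
```

The URL would be requested asynchronously and the parsed JSON passed to extractWikiTextSize; a missing or malformed response is where an error message would be shown.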

Images size

N.B. This only works in Internet Explorer (or browsers supporting the element.fileSize property).

This number is the total size of the image thumbnails, i.e. the size of the images which actually appear on the page, not the full size versions they link to. The total number/size of images affects how long the page takes to load, although the text of the page is loaded first and hence readable while the images are still loading. It is also possible to turn off images to speed up loading of the page. Note that the script only counts images within the article (i.e. not the WP logo, skin background, etc). It also currently counts every occurrence of a repeated image, whereas the browser only needs to download it once (this would have an effect on pages with many flags denoting nationality for example).
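
The effect of counting repeated images can be illustrated with a small sketch. The { src, bytes } records and the function name are hypothetical; in the script itself the sizes come from the browser.

```javascript
// images: hypothetical list of { src, bytes } records, one per image
// occurrence in the article.
function imageTotals(images) {
  let counted = 0;          // every occurrence, as the script reports it
  const unique = new Map(); // one entry per distinct image URL
  for (const img of images) {
    counted += img.bytes;
    unique.set(img.src, img.bytes);
  }
  let downloaded = 0;       // what the browser actually needs to fetch
  for (const bytes of unique.values()) {
    downloaded += bytes;
  }
  return { counted, downloaded };
}
```

For a page that shows the same 500-byte flag twice plus one 2000-byte map, counted is 3000 while downloaded is 2500, so the reported figure overstates the download cost for flag-heavy pages.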