Last updated: 25.05.2020

RDS-Q No. #4

Quran Statistics (Verse Level)

Reproducible Data Science - Quran | quran.telematika.org

Overview

The Holy Quran consists of 114 chapters (Surah) and a total of 6236 verses (Ayah). The Quran was revealed orally over a period of 23 years and was first written down at the time of the Khulafaur Rasyidin. In most printed editions, the Quran spans 604 pages, which are further organized into parts, most notably the Juz. There are 30 Juz in total.

For the results presented in this document, the Quran text is based on the Uthmani version published by the Tanzil project (http://tanzil.net/). Based on this text, the Quran is composed of 77430 words and 325666 letters (note: the un-numbered Basmallah at the beginning of 112 chapters is not counted). For comparison, based on data published by corpus.quran.com (Kais Dukes, University of Leeds), the figures are 77429 words and 623638 for letters and diacritics (harakah) counted together. The only difference in the word count is at QS 37:130, for the Arabic words إِلْ يَاسِينَ (trans: Prophet Elijah / Ilyas a.s.). The Quran corpus counts this as one word, while Tanzil's version of the Uthmani text writes it as two words.

This document mainly presents numbers and figures with minimal narration, since it is meant to be a quick reference supporting further research on aspects of The Noble Quran. The data used in RDS-Q #4 is at verse level (length 6236), i.e. word and letter counts are pre-processed for each verse, while the data for Juz and pages is aggregated from the verse-level data.

All Verses - C,V,C+V

The 6236-length verse data is ordered by sequential chapter number (C) and, within each chapter, by sequential verse number (V). The Figure below clearly shows this arrangement. A change in C is marked by V resetting to its base value of 1, forming a sawtooth-like graph. The length of a chapter is the distance along the x-axis between two consecutive points at this base value.
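The ordering described above can be sketched in a few lines. This is only an illustration, not the project's own code: the function name `build_verse_index` is hypothetical, and only the verse counts of the first three chapters (7, 286 and 200) are listed, so the result covers 493 verses rather than all 6236.

```python
# Verse counts of the first three chapters only (illustrative subset).
# A full run would list all 114 chapter lengths.
chapter_lengths = [7, 286, 200]


def build_verse_index(lengths):
    """Return (global_no, C, V, C+V) tuples in sequential order."""
    rows = []
    global_no = 0
    for c, n_verses in enumerate(lengths, start=1):
        for v in range(1, n_verses + 1):  # V restarts at 1 for every chapter
            global_no += 1
            rows.append((global_no, c, v, c + v))
    return rows


rows = build_verse_index(chapter_lengths)

# Positions where V resets to 1 are the teeth of the sawtooth.
starts = [g for g, c, v, cv in rows if v == 1]

# The distance between consecutive starts is the chapter length.
lengths_from_starts = [b - a for a, b in zip(starts, starts[1:])]

print(starts)
print(lengths_from_starts)
print(rows[19])
```

The row at global position 20 comes out as chapter 2, verse 13, which agrees with the corresponding row in the Sample Data section.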

...

The following Figures give histogram plots for C, V and C+V. For each Figure, the number of bins is varied to see how the parameters behave under different groupings. While all parameters show some degree of possible pattern, the most interesting ones can be seen in the histograms of V and C+V.
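To make the effect of the bin count concrete, here is a minimal equal-width binning sketch. It is an assumption about how such histograms could be computed, not the plotting code behind the Figures, and it uses the verse numbers of the first three chapters only; the helper `histogram` is hypothetical.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, put everything in one bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The maximum falls on the right edge; keep it in the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return counts


# Verse numbers V for the first three chapters only (7, 286 and 200 verses).
verse_numbers = [v for n in (7, 286, 200) for v in range(1, n + 1)]

for bins in (5, 10, 20):
    print(bins, histogram(verse_numbers, bins))
```

Whatever the bin count, the counts always add up to the number of verses; only the grouping changes.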

... ... ...

The last Figure in this section depicts standard density functions for each parameter.

...

The following Table gives some statistical values for C,V and C+V parameters.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Chapter No. (C)         114     51.0     26.0     11.0        1    33.52    26.46     6236   209029
Verse No. (V)           286     75.0     38.0     16.0        1    53.51    50.46     6236   333667
C+V                     288    107.0     83.0     58.0        2    87.03    43.98     6236   542696
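The columns of the statistics tables can be reproduced with the standard library. The sketch below assumes that the quartiles use linear interpolation and that std is the sample standard deviation; the source does not state which conventions were used, and the helper name `summarize` is hypothetical.

```python
import statistics


def summarize(values):
    """Return the table columns for one parameter as a dict."""
    # method="inclusive" interpolates linearly between data points
    # (assumed convention, see lead-in).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "75%": q3,
        "50%": q2,
        "25%": q1,
        "min": min(values),
        "mean": round(statistics.mean(values), 2),
        "std": round(statistics.stdev(values), 2),  # sample std (assumption)
        "count": len(values),
        "sum": sum(values),
    }


# C+V for the seven verses of chapter 1: 1+1, 1+2, ..., 1+7.
cv_chapter_1 = [1 + v for v in range(1, 8)]
print(summarize(cv_chapter_1))
```

Applied to the full 6236-row data set, the same function would be called once per parameter (C, V, C+V, W, L).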

All Verses - W,L

The following five Figures focus on the number of words (W) and letters (L).

... ...

The first Figure depicts a line plot of W over the verse sequence. The second Figure gives histograms of W for different numbers of bins.

... ...

The third Figure depicts a line plot of L over the verse sequence. The fourth Figure gives histograms of L for different numbers of bins. The last Figure below shows the density plots of both W and L.

...

The following Table gives some statistical values for W and L.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words           128     16.0     10.0      6.0        1    12.42     9.42     6236    77430
Num. of Letters         547     68.0     43.0     23.0        2    52.22    39.25     6236   325666

Split [1-3118] & [3119-6236]

For some investigations we might want to look at a segment of the data. The following Figure shows density functions of C, V and C+V for the segments [1-3118] and [3119-6236].

...

The following Table gives some statistical values for C, V and C+V for segment [1-3118].

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Chapter No. (C)          26     19.0     11.0      5.0        1    12.44      7.8     3118    38779
Verse No. (V)           286    103.0     64.0     31.0        1    74.05    55.08     3118   230879
C+V                     288    116.0     78.0     45.0        2    86.48    53.19     3118   269658

The following Table gives some statistical values for C, V and C+V for segment [3119-6236].

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Chapter No. (C)         114     72.0     51.0     37.0       26     54.6    21.23     3118   170250
Verse No. (V)           227     44.0     23.0     10.0        1    32.97    34.88     3118   102788
C+V                     253    103.0     85.0     67.0       28    87.57    32.26     3118   273038

...

The following Table gives some statistical values for W and L for segment [1-3118].

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words           128     19.0     13.0      9.0        1    15.46    10.02     3118    48210
Num. of Letters         547     82.0     55.0     37.0        2     65.0    41.79     3118   202681

The following Table gives some statistical values for W and L for segment [3119-6236].

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words            78     13.0      7.0      4.0        1     9.37     7.64     3118    29220
Num. of Letters         330     53.0     29.0     17.0        2    39.44    31.75     3118   122985

Odd/Even - Chapters & Verses

For symmetry investigations we might want to look at a segment of the data selected by a certain criterion, in this case whether C+V is odd or even, as reported in RDS-Q #1 and RDS-Q #2. The following Figure shows density functions of C, V and C+V for the odd and even segments.


...

As shown above, the odd and even segments share (almost) the same density curves. This is due to the sequential nature of both the chapter number C and the verse number V (within the same chapter). In terms of shape, all curves are similar to those of the unsplit data, except that the amplitude is lower.


The following Table gives some statistical values for C, V and C+V for the odd segment.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Chapter No. (C)         114     51.0     26.0     11.0        1    33.52    26.43     3118   104516
Verse No. (V)           285     75.0     38.0     16.0        1    53.54    50.48     3118   166926
C+V                     287    107.0     83.0     57.0        3    87.06     44.0     3118   271442

The following Table gives some statistical values for C, V and C+V for the even segment.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Chapter No. (C)         114    50.75     26.0     11.0        1    33.52    26.49     3118   104513
Verse No. (V)           286     75.0     38.0     16.0        1    53.48    50.45     3118   166741
C+V                     288    106.0     82.0     58.0        2     87.0    43.97     3118   271254

Odd/Even - Words & Letters

The following Figure shows density functions of W and L for the odd and even C+V segments.

...

It is quite interesting that the density curves of both W and L for the odd and even segments are almost identical. Unlike the sequential parameters C and V mentioned previously, W and L are derived entirely from the verse text. The values in the tables below are also interesting and to some extent support these curves.

The number of words for the odd/even groups is 38716 / 38714. The number of letters for the odd/even groups is 162821 / 162845. Both the sum of W and the sum of L are thus split almost exactly in half. Note that if we use data from corpus.quran.com, the number of words W for the odd group is 38715, while that of the even group is 38714.
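The near-equal split can be checked directly from the sums reported in this document. The snippet below only does arithmetic on those published figures; the variable names are illustrative.

```python
# Totals and odd/even sums as reported in this document (Tanzil-based counts).
totals = {"words": 77430, "letters": 325666}
odd = {"words": 38716, "letters": 162821}
even = {"words": 38714, "letters": 162845}

differences = {}
for key, total in totals.items():
    # The two groups must add up to the overall total.
    assert odd[key] + even[key] == total
    differences[key] = abs(odd[key] - even[key])
    share_odd = odd[key] / total
    print(f"{key}: difference = {differences[key]}, odd share = {share_odd:.4%}")
```

The odd and even groups differ by 2 words and 24 letters.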


The following Table gives some statistical values for W and L for the odd segment.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words            78     16.0     10.0      5.0        1    12.42     9.54     3118    38716
Num. of Letters         344    68.75     42.0     23.0        2    52.22    39.92     3118   162821

The following Table gives some statistical values for W and L for the even segment.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words           128     16.0     10.0      6.0        1    12.42      9.3     3118    38714
Num. of Letters         547     68.0     43.0     24.0        2    52.23    38.58     3118   162845

Pages - Chapters & Verses

As briefly mentioned in the overview, verse-level data can be aggregated by page or by Juz. The Figure below depicts the number of chapters and verses on each page. As expected, most pages contain verses of only a single chapter. The sum of the number of chapters includes repetitions, since a chapter that spans several pages is counted once on each of them.
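A minimal sketch of such an aggregation follows. It uses four rows taken from the Sample Data section, so the per-page values are partial, and the function `aggregate` is a hypothetical helper rather than the project's own code.

```python
from collections import defaultdict

# (ve_no_g, ch_no, ve_no, page, juz, words, letters)
# Four rows from the Sample Data section; pages 401 and 512 each appear twice.
rows = [
    (3381, 29, 41, 401, 20, 19, 88),
    (3384, 29, 44, 401, 20, 10, 44),
    (4593, 48, 10, 512, 26, 25, 104),
    (4596, 48, 13, 512, 26, 9, 42),
]


def aggregate(rows, key_index):
    """Group verse rows by the column at key_index (3 = page, 4 = juz)."""
    groups = defaultdict(
        lambda: {"chapters": set(), "verses": 0, "words": 0, "letters": 0}
    )
    for row in rows:
        group = groups[row[key_index]]
        group["chapters"].add(row[1])  # distinct chapters within the group
        group["verses"] += 1
        group["words"] += row[5]
        group["letters"] += row[6]
    # Replace each chapter set by its size for the final result.
    return {
        key: {**group, "chapters": len(group["chapters"])}
        for key, group in groups.items()
    }


by_page = aggregate(rows, key_index=3)
by_juz = aggregate(rows, key_index=4)
print(by_page)
print(by_juz)
```

Because chapters are counted per group, a chapter spanning several pages contributes to each of those pages, which is why the summed chapter count exceeds 114.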


...

The following Table gives some statistical values for C and V for pages data.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Chapters          3      1.0      1.0      1.0        1      1.1     0.33      604      662
Num. of Verses           42     11.0      8.0      7.0        1    10.32     6.18      604     6236

Pages - Words & Letters

The data can be further viewed at the level of words and letters. The following Figure gives the number of words and letters for each page.


...

The following Table gives some statistical values for W and L for pages data.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words           161    137.0    129.0    121.0       29    128.2    14.92      604    77430
Num. of Letters         693    573.0    543.0    514.0      139   539.18    58.83      604   325666

Juz - Pages, Chapters & Verses

The following two Figures illustrate the number of pages (P), chapters (C) and verses (V) for each Juz.

... ...

The plots of P and C are given in the first Figure, while the number of V is given in the second one. As might be expected, most Juz consist of 20 pages.

The following Table gives some statistical values for P, C and V for Juz data.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Pages            23     20.0     20.0     20.0       20    20.27     0.64       30      608
Num. of Chapters         37      4.0      3.0      2.0        1      4.5     6.55       30      135
Num. of Verses          564   220.75    170.5    143.5      110   207.87   107.57       30     6236

Note that a chapter can span Juz boundaries, and a page can contain the end of one Juz and the start of the next (i.e. pages 62, 121, 201, 502).
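This note also accounts for the page sum in the table: the pages add up to 608 rather than 604. A small check, assuming the four pages listed above are the only ones shared between two Juz:

```python
printed_pages = 604                 # pages in most printed editions
juz_count = 30
shared_pages = [62, 121, 201, 502]  # pages holding the end of one Juz and the start of the next

# Each shared page is counted once for each of the two Juz it belongs to.
page_sum_over_juz = printed_pages + len(shared_pages)
mean_pages = round(page_sum_over_juz / juz_count, 2)

print(page_sum_over_juz, mean_pages)
```

Both values agree with the sum and mean reported for the number of pages per Juz.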


Juz - Words & Letters

The following Figure gives the number of words and letters for each Juz.


...

The following Table gives some statistical values for W and L for Juz data.

Parameter               max      75%      50%      25%      min     mean      std    count      sum
Num. of Words          2774  2640.25   2596.5   2520.5     2308   2581.0   100.27       30    77430
Num. of Letters       11497  11050.0  10900.0 10727.75     9704 10855.53   349.25       30   325666

Resources

Data: https://github.com/eueung/rds-q/tree/master/data
PDF: https://github.com/eueung/rds-q/tree/master/PDF
Project (All): https://github.com/eueung/rds-q/
Web: https://quran.telematika.org/00004/quran-statistics-6236.html
Web (All): https://quran.telematika.org/

Sample Data

ve_no_g ch_no ve_no page juz t_w_nb t_c_nb cav cavoe
3791 37 3 446 23 2 11 40 even
3381 29 41 401 20 19 88 70 even
2091 17 62 288 15 15 63 79 odd
2485 21 2 322 17 11 42 23 odd
545 4 52 87 5 11 43 56 even
2457 20 109 319 16 12 43 129 odd
3384 29 44 401 20 10 44 73 odd
5091 57 16 539 27 28 117 73 odd
4640 50 10 518 26 5 21 60 even
2180 18 40 298 15 15 62 58 even
5504 74 9 575 29 4 16 83 odd
2725 23 52 345 18 8 32 75 odd
839 6 50 133 7 28 103 56 even
1262 9 27 191 10 12 40 36 even
2465 20 117 320 16 12 48 137 odd
20 2 13 3 1 19 80 15 odd
3500 31 31 414 21 19 68 62 even
4593 48 10 512 26 25 104 58 even
5392 70 17 569 29 4 16 87 odd
1541 11 68 229 12 12 45 79 odd
829 6 40 132 7 15 61 46 even
4525 46 15 504 26 45 186 61 odd
358 3 65 58 3 15 67 68 even
4596 48 13 512 26 9 42 61 odd
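The derived columns can be recomputed from the basic ones. The sketch below parses a few of the sample rows and checks that cav equals ch_no + ve_no and that cavoe is the parity of cav. The reading of the column names (ve_no_g as the global verse number, t_w_nb and t_c_nb as word and letter counts) is inferred from the values and is an assumption, as are the helper names `parse` and `parity`.

```python
# A few rows copied from the sample above, with the original header.
SAMPLE = """\
ve_no_g ch_no ve_no page juz t_w_nb t_c_nb cav cavoe
3791 37 3 446 23 2 11 40 even
2091 17 62 288 15 15 63 79 odd
20 2 13 3 1 19 80 15 odd
4525 46 15 504 26 45 186 61 odd
"""


def parse(text):
    """Turn whitespace-separated rows into dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        fields = line.split()
        # Every column is an integer except the odd/even label.
        records.append(
            {
                name: (value if name == "cavoe" else int(value))
                for name, value in zip(header, fields)
            }
        )
    return records


def parity(n):
    return "even" if n % 2 == 0 else "odd"


records = parse(SAMPLE)

# Rows where the derived columns disagree with the basic ones.
mismatches = [
    r for r in records
    if r["ch_no"] + r["ve_no"] != r["cav"] or parity(r["cav"]) != r["cavoe"]
]
print(len(records), "rows checked,", len(mismatches), "mismatches")
```

The same check can be run over the full data set to split it into the odd and even segments used in the earlier sections.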