procrastination = analysis of Google chars
I just did a fast-and-lousy analysis of the URLs of 78 of our GoogleDocs. In particular, I looked at the 44-character random-gobbledy-gook looking part after “/document/d/”. Call it “rgg”.

Interestingly (to me, at least), each one is an NMTOKEN (in SGML parlance; in XML that is an Nmtoken, production [7]). In W3C regex terms, each would match "\c{44}", or more precisely "[A-Za-z0-9_-]{44}". None are NAMES (or Names, production [5]), because they *all* start with the character ‘1’. Thus they all match the regular expression "1[A-Za-z0-9_-]{43}".

Using Unicode general category symbols, the frequency of character occurrences (3432 in all, or 44 per rgg) is as follows.

    1409  Lu
    1358  Ll
     564  Nd
      54  Pd
      47  Pc

Remember that each rgg starts with ‘1’. So if we factor those out, we get a frequency set of

    1409  Lu  [A-Z]
    1358  Ll  [a-z]
     486  Nd  [0-9] - ^1
      78  I1  ^1          # I made that symbol up; it's for Initial 1
      54  Pd  -
      47  Pc  _

If characters are assigned randomly, we would expect the frequencies of Lu and Ll to be roughly equal, and roughly 2.6 times that of Nd; further, we would expect Pd and Pc (as single characters each) to occur 1/10th as often as Nd, and 1/26th as often as Lu or Ll.

And, on eyeballing them, those values are not far from expected. (For example, 54 × 26 is 1404, which falls in the 1358–1409 range.) But not all that close, either. (For example, 2.6 × the higher number for Nd is 1466, somewhat higher than Lu; but 2.6 × the lower number for Nd is 1264, somewhat lower than Ll.) My guess is that another few hundred URLs would be needed to be more confident. (I am not a statistician, though.)

Raw frequencies (count, then character; read across):

    135 1    75 E    67 J    66 C    65 e    64 s    61 V    60 Z
     58 4    58 B    58 D    58 h    58 M    58 n    57 d    57 g
     57 G    57 k    57 l    56 b    56 P    56 Q    55 6    55 K
     55 X    55 y    54 -    54 a    54 H    54 w    54 Y    53 A
     53 j    53 t    53 v    53 x    52 L    52 N    51 f    49 0
     49 8    49 q    49 W    48 7    48 F    48 p    48 U    47 _
     47 3    47 I    47 u    46 O    46 r    45 2    45 o    45 R
     44 c    43 T    42 z    41 9    41 i    41 m    37 5    36 S

OK.
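For anyone who wants to repeat the tally or go a step past eyeballing, here is a minimal Python sketch. It is an illustration under stated assumptions, not the script actually used for the numbers above. The function names are made up for this example. The pattern leaves the length open and includes digits in the character class, since the tallies contain Nd characters. The goodness-of-fit check assumes every character after the initial ‘1’ is drawn independently and uniformly from the 64-symbol alphabet [A-Za-z0-9_-], which is only a guess about how the ids are generated.

```python
import re
import unicodedata
from collections import Counter

# Shape of an id as described in the post: a leading '1', then letters,
# digits, '_' or '-'. The length is deliberately not hard-coded here.
RGG_PATTERN = re.compile(r"1[A-Za-z0-9_-]*")

# Assumed alphabet sizes per Unicode general category (64 symbols in all).
ALPHABET = {"Lu": 26, "Ll": 26, "Nd": 10, "Pd": 1, "Pc": 1}

# Counts reported in the post after factoring out the 78 initial '1's.
REPORTED = {"Lu": 1409, "Ll": 1358, "Nd": 486, "Pd": 54, "Pc": 47}


def category_counts(ids):
    """Tally general categories, counting each leading '1' as 'I1'."""
    counts = Counter()
    for rgg in ids:
        if not RGG_PATTERN.fullmatch(rgg):
            raise ValueError(f"unexpected id shape: {rgg!r}")
        counts["I1"] += 1
        for ch in rgg[1:]:
            counts[unicodedata.category(ch)] += 1
    return counts


def expected_counts(total_chars):
    """Expected count per category if all 64 symbols are equally likely."""
    size = sum(ALPHABET.values())
    return {cat: total_chars * n / size for cat, n in ALPHABET.items()}


def chi_square(observed, expected):
    """Pearson chi-square statistic over the categories in `expected`."""
    return sum((observed[c] - expected[c]) ** 2 / expected[c] for c in expected)


# Statistic for the reported counts (5 categories, so 4 degrees of freedom).
stat = chi_square(REPORTED, expected_counts(sum(REPORTED.values())))
```

Under these assumptions the statistic for the reported counts comes out at roughly 5 on 4 degrees of freedom, below the usual 5% critical value of about 9.49. That is consistent with the eyeball impression that the counts are not far from uniform, although it does not show that the ids really are generated that way.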
Back to what I should have been doing, which is reviewing the minutes Martina asked me to look at yesterday.
participants (1)
-
Syd Bauman