| > |
| > |
===========================================
In this Worksheet, I discuss the derivation of the
Importance weights as function of the city population,
such that the label density on the screen reamins constant
at all distances.
===========================================
| > |
| > | with(plots): |
Warning, the name changecoords has been redefined
| > | with(stats): with(stats[statplots]): |
The file 'people.txt' contains a 1-dim array of populations of the 41245 cities/towns in my huge data base (Gazetteer). Read it into Maple:
| > | data:=readdata("people.txt",integer): |
Calculate log base 10 for each element:
| > | people:=evalf(map(log10,data)): |
histogram plot:
| > | ph:=histogram(people,area=count,axes=boxed,labels=["log10(population)","number of cities"],labeldirections=[horizontal,vertical]): |
| > | display(ph); |
Extract the numerical values
| > | lprint(ph); |
PLOT(POLYGONS([[0., 0], [0., 6.000000000], [.4433019370, 6.000000000], [.4433019370, 0]],[[.4433019370, 0], [.4433019370, 11.00000000], [.8866038740, 11.00000000], [.8866038740, 0]],[[.8866038740, 0], [.8866038740, 31.99999999], [1.329905811, 31.99999999], [1.329905811, 0]],[[1.329905811, 0], [1.329905811, 31.00000001], [1.773207748, 31.00000001], [1.773207748, 0]],[[1.773207748, 0], [1.773207748, 104.0000000], [2.216509685, 104.0000000], [2.216509685, 0]],[[2.216509685, 0], [2.216509685, 449.9999999], [2.659811622, 449.9999999], [2.659811622, 0]],[[2.659811622, 0], [2.659811622, 911.9999999], [3.103113559, 911.9999999], [3.103113559, 0]],[[3.103113559, 0], [3.103113559, 3465.000001], [3.546415496, 3465.000001], [3.546415496, 0]],[[3.546415496, 0], [3.546415496, 7563.999999], [3.989717433, 7563.999999], [3.989717433, 0]],[[3.989717433, 0], [3.989717433, 14508.00000], [4.433019370, 14508.00000], [4.433019370, 0]],[[4.433019370, 0], [4.433019370, 8753.999999], [4.876321307, 8753.999999], [4.876321307, 0]],[[4.876321307, 0], [4.876321307, 3689.000001], [5.319623244, 3689.000001], [5.319623244, 0]],[[5.319623244, 0], [5.319623244, 1188.000000], [5.762925181, 1188.000000], [5.762925181, 0]],[[5.762925181, 0], [5.762925181, 395.0000000], [6.206227118, 395.0000000], [6.206227118, 0]],[[6.206227118, 0], [6.206227118, 105.0000000], [6.649529055, 105.0000000], [6.649529055, 0]],[[6.649529055, 0], [6.649529055, 30.00000000], [7.092830992, 30.00000000], [7.092830992, 0]]),AXESLABELS("log10(population)","number of cities",FONT(DEFAULT),HORIZONTAL,VERTICAL),COLOR(RGB,.7,.7,1.0),AXESSTYLE(BOX),VIEW(-.7092830992 .. 7.802114091,DEFAULT))
List of x-values (log10(population))
| > | dbx:=[.2216509685,.2216509685+.4433019370,.2216509685+2*.4433019370,.2216509685+3*.4433019370,.2216509685+4*.4433019370,.2216509685+5*.4433019370,.2216509685+6*.4433019370,.2216509685+7*.4433019370,.2216509685+8*.4433019370,.2216509685+9*.4433019370,4.654670338,5.097972275,5.541274210,5.984576150,6.427878085,6.871180025]; |
List of corresponding y-values (number of labels, i.e. cities/towns):
| > | dby:=[6.0,11.0,32.0,31.0,104.0,450.0,912.0,3465.0,7564.0,14508.0, 8754.0, 3689.0, 1188.0, 395.0, 105.0, 30.0]; |
Compute the total number of towns in all 16 bins:
| > | GG:=j->sum(dby[i],i=j..16); |
| > | for j from 1 to 16 do |
| > | GG(j); |
| > | od; |
Aha, they sum up to the total count of cities in the data base.
List of [x,y], suitable for pointplots:
| > | dbxy:=[[.2216509685,6.0],[.6649529055,11.0],[1.108254842,32.0],[1.551556780,31.0],[1.994858716,104.0000000],[2.438160654,450.0],[2.881462590,912.0],[3.324764528,3465.0],[3.768066464,7564.0],[4.211368402, 14508.00000], [4.654670338, 8753.999999], [5.097972275, 3689.000001], [5.541274210, 1188.000000], [5.984576150, 395.0000000], [6.427878085, 105.0000000], [6.871180025, 30.00000000]]; |
x-values only up to maximum:
| > | X:=[4.211368402,4.654670338,5.097972275,5.541274210,5.984576150,6.427878085,6.871180025]; |
corresponding y-values:
| > | Y:=[14508.00000,8753.999999,3689.000001,1188.000000,395.0000000,105.0000000,30.00000000]; |
Form suitable for pointplots:
| > | XY:=[[4.211368402,14508.00000],[4.654670338,8753.999999],[5.097972275,3689.000001],[5.541274210,1188.000000],[5.984576150,395.0000000],[6.427878085,105.0000000],[6.871180025,30.00000000]]; |
Next, check for normal distribution for towns right of the peak ! :
================================================
After trial and error: Like in France, 500 people fits best as the true peak value!
| > | X2:=evalf(map(x->(x-log10(500))^2,X)); |
| > | Pop:=map(x->10^x,X); |
form log10 of y-values (counts), to make the fit function linear in parameters:
| > | Yl:=map(x->log10(x),Y); |
Do a leastsquare fit to log(normal distribution):
| > | fit[leastsquare[[x,y],y=a+b*x]]([X2,Yl]); |
| > | p1:=logplot(XY,style=point,symbol=BOX,symbolsize=20,color=blue): |
| > | p2:=plot(4.589613571-.1818895205*(x-2.698970004)^2,x=2..7,color=red,thickness=2): |
| > | display({p1,p2},axes=boxed,labels=["log10( population )","Number of Labels ( Towns )"],labeldirections=[horizontal,vertical], title="Normal Distribution of Labels", titlefont=[HELVETICA,20],OPTS); |
Aha, works very well! Christophe is right in that there is a large sampling bias left of the peak.
| > | pt:=plot(10^(4.589613571-.1818895205*(x-2.698970004)^2),x=4..7,color=red): |
| > | display({ph,pt},labels=["log10( population )","Number of Labels ( Towns )"],labeldirections=[horizontal,vertical], title="Normal Distribution of Labels around 500 people", titlefont=[HELVETICA,20],OPTS); |
Total number of towns on earth would then be the integral:
| > | evalf(Int(10^(4.589613571-.1818895205*(x-2.698970004)^2),x=0..10)); |
-------------------------------------------------------------------------------------
Next, want to derive the Importance weights,
such that the label density on the monitor remains always constant!
-------------------------------------------------------------------------------------
Strategy:
=======
The label density =
, area ~
,
Determine empirically on my 1600x1200 monitor the Importance weights
that just become visible at a range of distances d
It is a linear relation as expected (see below). Thus the requirement is
that the number of labels (nLabels) at distance d,
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Moreover, we know from above that ,
Hence
if we assign the importance weights as:
+++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++
our problem of expressing the weights as function of the known population
such as to keep the label density on the screen constant is solved!
The only constant in the game is adjusted by requiring something like
20 labels visible at 40000km distance from earth, which is a good value.
Let's get quantitative:
| > | distance:=[187,325,520,999,2085,6107,9835,15708,24880,33900]; |
| > | importance:=[2.2,3.84,6.11,11.49,24.08,70.13,112.72,178.9,285.68,391]; |
| > | distimp:=[[2.2,187],[3.84,325],[6.11,520],[11.49,999],[24.08,2085],[70.13,6107],[112.72,9835],[178.9,15708],[285.68,24880],[391,33900]]; |
| > | q0:=pointplot(distimp,symbol=BOX,color=blue,symbolsize=20): |
Again: least square fit of linear relation: min. distance <=> Importance weight
| > | fit[leastsquare[[x,y],y=a+b*x]]([importance,distance]); |
| > | q1:=plot(14.82791092+86.91039073*imp,imp=1..1000,color=red,thickness=2): |
| > | display({q0,q1},axes=boxed,labels=["Importance weight","min. distance [ km ], where visibility starts"], labeldirections=[horizontal,vertical]); |
Aha, an excellent fit!
--------------------------
Calculate the log (base e) of the dby for the fit
| > | dbylog:=map(x->log10(x),dby); |
Shift abszissa to the peak position and take a most simple (integrable) form.
There are better ones, but this one can be analytically integrated
| > | dbx2:=map(x->abs(x-4.211368402),dbx); |
Fitting:
| > | fit[leastsquare[[x,y],y=a+b*x]]([dbx2,dbylog]); |
Plotting the result:
| > | pdb1:=logplot(dbxy,style=point,symbol=BOX,symbolsize=20,color=blue): |
| > | pdb2:=logplot(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=0..8,color=red,thickness=2): |
| > | display({pdb1,pdb2},axes=boxed,labels=["log10( population )","log10( Number of Labels ( Towns ) )"], title=" Empirical Fit of Label Distribution", titlefont=[HELVETICA,16],labeldirections=[horizontal,vertical],labelfont=[HELVETICA,14],OPTS); |
That's globally good enough! Next, since we want the total number of visible labels for a given population xt,
we must integrate from xt to 'infinity' (all labels corresponding to a higher population than xt are also visible!):
| > | Int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=xt..infinity)=int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=xt..infinity); |
Define a function from the result and divide by binwidth:
| > | nLabels:=xt->1/.4433019370*(PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-16547.73353, 4.211368402 < xt])+16547.73353); |
Let's see what the total number of labels becomes? Close to 41245?
| > | nLabels(0); |
OK, given the qualitative nature of the fit, it seems acceptable.
Plot the integrated number of totally visible labels vs. xt=log10(population):
| > | logplot(nLabels(x),x=0..7); |
Next we calculate the Importance weights, as outlined above, from
| > | expand(solve(nLabels=(c/(14.82791092+86.91039073*imp))^2,imp)[1]); |
c is the constant to be determined e.g. from the requirement of seeing 20 labels (10/hemisphere) at a distance of 40000km:
| > | evalf(solve(20=(c/40000)^2,c)); |
For general c, we get:
| > | Imp:=evalf(subs(nLabels=nLabels(log10(pop)),-.1706114861+.1150610407e-1/nLabels^(1/2)*c)); |
| > | evalf(ln(10)*4.211368402); |
| > | solve(nLabs=(C/40000)^2,C); |
| > | w1:=loglogplot(subs(c=40000*sqrt(20),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=red,OPTS): |
| > | w2:=loglogplot(subs(c=40000*sqrt(40),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=blue,OPTS): |
| > | w3:=loglogplot(subs(c=40000*sqrt(10),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=green,OPTS): |
| > | display({w1,w2,w3}); |
| > |
-------------------------------------------------------------------------------------
This solves the problem, the above function is entered into my Perl script
which assigns the Importance weights accordingly!
--------------------------------------------------------------------------------------