earth.mws

>   

libname :=

savelibname :=

>   

===========================================

In this Worksheet, I discuss the derivation of the

Importance weights as function of the city population,

such that the label density on the screen reamins constant

at all distances.

===========================================

>   

>    with(plots):

Warning, the name changecoords has been redefined

>    with(stats): with(stats[statplots]):

The file 'people.txt' contains a 1-dim array of populations of the 41245 cities/towns in my huge data base (Gazetteer). Read it into Maple:

>    data:=readdata("people.txt",integer):

Calculate log base 10 for each element:

>    people:=evalf(map(log10,data)):

histogram plot:

>    ph:=histogram(people,area=count,axes=boxed,labels=["log10(population)","number of cities"],labeldirections=[horizontal,vertical]):

>    display(ph);

[Maple Plot]

Extract the numerical values

>    lprint(ph);

PLOT(POLYGONS([[0., 0], [0., 6.000000000], [.4433019370, 6.000000000], [.4433019370, 0]],[[.4433019370, 0], [.4433019370, 11.00000000], [.8866038740, 11.00000000], [.8866038740, 0]],[[.8866038740, 0], [.8866038740, 31.99999999], [1.329905811, 31.99999999], [1.329905811, 0]],[[1.329905811, 0], [1.329905811, 31.00000001], [1.773207748, 31.00000001], [1.773207748, 0]],[[1.773207748, 0], [1.773207748, 104.0000000], [2.216509685, 104.0000000], [2.216509685, 0]],[[2.216509685, 0], [2.216509685, 449.9999999], [2.659811622, 449.9999999], [2.659811622, 0]],[[2.659811622, 0], [2.659811622, 911.9999999], [3.103113559, 911.9999999], [3.103113559, 0]],[[3.103113559, 0], [3.103113559, 3465.000001], [3.546415496, 3465.000001], [3.546415496, 0]],[[3.546415496, 0], [3.546415496, 7563.999999], [3.989717433, 7563.999999], [3.989717433, 0]],[[3.989717433, 0], [3.989717433, 14508.00000], [4.433019370, 14508.00000], [4.433019370, 0]],[[4.433019370, 0], [4.433019370, 8753.999999], [4.876321307, 8753.999999], [4.876321307, 0]],[[4.876321307, 0], [4.876321307, 3689.000001], [5.319623244, 3689.000001], [5.319623244, 0]],[[5.319623244, 0], [5.319623244, 1188.000000], [5.762925181, 1188.000000], [5.762925181, 0]],[[5.762925181, 0], [5.762925181, 395.0000000], [6.206227118, 395.0000000], [6.206227118, 0]],[[6.206227118, 0], [6.206227118, 105.0000000], [6.649529055, 105.0000000], [6.649529055, 0]],[[6.649529055, 0], [6.649529055, 30.00000000], [7.092830992, 30.00000000], [7.092830992, 0]]),AXESLABELS("log10(population)","number of cities",FONT(DEFAULT),HORIZONTAL,VERTICAL),COLOR(RGB,.7,.7,1.0),AXESSTYLE(BOX),VIEW(-.7092830992 .. 7.802114091,DEFAULT))

List of x-values (log10(population))

>    dbx:=[.2216509685,.2216509685+.4433019370,.2216509685+2*.4433019370,.2216509685+3*.4433019370,.2216509685+4*.4433019370,.2216509685+5*.4433019370,.2216509685+6*.4433019370,.2216509685+7*.4433019370,.2216509685+8*.4433019370,.2216509685+9*.4433019370,4.654670338,5.097972275,5.541274210,5.984576150,6.427878085,6.871180025];

dbx := [.2216509685, .6649529055, 1.108254842, 1.551556780, 1.994858716, 2.438160654, 2.881462590, 3.324764528, 3.768066464, 4.211368402, 4.654670338, 5.097972275, 5.541274210, 5.984576150, 6.427878085...
dbx := [.2216509685, .6649529055, 1.108254842, 1.551556780, 1.994858716, 2.438160654, 2.881462590, 3.324764528, 3.768066464, 4.211368402, 4.654670338, 5.097972275, 5.541274210, 5.984576150, 6.427878085...

List of corresponding y-values (number of labels, i.e. cities/towns):

>    dby:=[6.0,11.0,32.0,31.0,104.0,450.0,912.0,3465.0,7564.0,14508.0, 8754.0, 3689.0, 1188.0, 395.0, 105.0, 30.0];

dby := [6.0, 11.0, 32.0, 31.0, 104.0, 450.0, 912.0, 3465.0, 7564.0, 14508.0, 8754.0, 3689.0, 1188.0, 395.0, 105.0, 30.0]

Compute the total number of towns in all 16 bins:

>    GG:=j->sum(dby[i],i=j..16);

GG := proc (j) options operator, arrow; sum(dby[i],i = j .. 16) end proc

>    for j from 1 to 16 do

>    GG(j);

>    od;

41244.0

41238.0

41227.0

41195.0

41164.0

41060.0

40610.0

39698.0

36233.0

28669.0

14161.0

5407.0

1718.0

530.0

135.0

30.0

Aha, they sum up to the total count of cities in the data base.

List of [x,y], suitable for pointplots:

>    dbxy:=[[.2216509685,6.0],[.6649529055,11.0],[1.108254842,32.0],[1.551556780,31.0],[1.994858716,104.0000000],[2.438160654,450.0],[2.881462590,912.0],[3.324764528,3465.0],[3.768066464,7564.0],[4.211368402, 14508.00000], [4.654670338, 8753.999999], [5.097972275, 3689.000001], [5.541274210, 1188.000000], [5.984576150, 395.0000000], [6.427878085, 105.0000000], [6.871180025, 30.00000000]];

dbxy := [[.2216509685, 6.0], [.6649529055, 11.0], [1.108254842, 32.0], [1.551556780, 31.0], [1.994858716, 104.0000000], [2.438160654, 450.0], [2.881462590, 912.0], [3.324764528, 3465.0], [3.768066464, ...
dbxy := [[.2216509685, 6.0], [.6649529055, 11.0], [1.108254842, 32.0], [1.551556780, 31.0], [1.994858716, 104.0000000], [2.438160654, 450.0], [2.881462590, 912.0], [3.324764528, 3465.0], [3.768066464, ...
dbxy := [[.2216509685, 6.0], [.6649529055, 11.0], [1.108254842, 32.0], [1.551556780, 31.0], [1.994858716, 104.0000000], [2.438160654, 450.0], [2.881462590, 912.0], [3.324764528, 3465.0], [3.768066464, ...
dbxy := [[.2216509685, 6.0], [.6649529055, 11.0], [1.108254842, 32.0], [1.551556780, 31.0], [1.994858716, 104.0000000], [2.438160654, 450.0], [2.881462590, 912.0], [3.324764528, 3465.0], [3.768066464, ...

x-values only up to maximum:

>    X:=[4.211368402,4.654670338,5.097972275,5.541274210,5.984576150,6.427878085,6.871180025];

X := [4.211368402, 4.654670338, 5.097972275, 5.541274210, 5.984576150, 6.427878085, 6.871180025]

corresponding y-values:

>    Y:=[14508.00000,8753.999999,3689.000001,1188.000000,395.0000000,105.0000000,30.00000000];

Y := [14508.00000, 8753.999999, 3689.000001, 1188.000000, 395.0000000, 105.0000000, 30.00000000]

Form suitable for pointplots:

>    XY:=[[4.211368402,14508.00000],[4.654670338,8753.999999],[5.097972275,3689.000001],[5.541274210,1188.000000],[5.984576150,395.0000000],[6.427878085,105.0000000],[6.871180025,30.00000000]];

XY := [[4.211368402, 14508.00000], [4.654670338, 8753.999999], [5.097972275, 3689.000001], [5.541274210, 1188.000000], [5.984576150, 395.0000000], [6.427878085, 105.0000000], [6.871180025, 30.00000000]...
XY := [[4.211368402, 14508.00000], [4.654670338, 8753.999999], [5.097972275, 3689.000001], [5.541274210, 1188.000000], [5.984576150, 395.0000000], [6.427878085, 105.0000000], [6.871180025, 30.00000000]...

Next, check for normal distribution for towns right of the peak ! :

================================================

After trial and error: Like in France, 500 people fits best as the true peak value!

>    X2:=evalf(map(x->(x-log10(500))^2,X));

X2 := [2.287348914, 3.824763796, 5.755211896, 8.078693199, 10.79520775, 13.90475548, 17.40733646]

>    Pop:=map(x->10^x,X);

Pop := [16269.28256, 45151.30821, 125306.1178, 347755.6622, 965108.5209, 2678416.335, 7433272.002]

form log10 of y-values (counts), to make the fit function linear in parameters:

>    Yl:=map(x->log10(x),Y);

Yl := [4.161607547, 3.942206542, 3.566908655, 3.074816441, 2.596597096, 2.021189299, 1.477121255]

Do a leastsquare fit to log(normal distribution):

>    fit[leastsquare[[x,y],y=a+b*x]]([X2,Yl]);

y = 4.589613571-.1818895205*x

>    p1:=logplot(XY,style=point,symbol=BOX,symbolsize=20,color=blue):

>    p2:=plot(4.589613571-.1818895205*(x-2.698970004)^2,x=2..7,color=red,thickness=2):

>    display({p1,p2},axes=boxed,labels=["log10( population )","Number of Labels ( Towns )"],labeldirections=[horizontal,vertical], title="Normal Distribution of Labels", titlefont=[HELVETICA,20],OPTS);

[Maple Plot]

Aha, works very well! Christophe is right in that there is a large sampling bias left of the peak.

>    pt:=plot(10^(4.589613571-.1818895205*(x-2.698970004)^2),x=4..7,color=red):

>    display({ph,pt},labels=["log10( population )","Number of Labels ( Towns )"],labeldirections=[horizontal,vertical], title="Normal Distribution of Labels around 500 people", titlefont=[HELVETICA,20],OPTS);

[Maple Plot]

Total number of towns on earth would then be the integral:

>    evalf(Int(10^(4.589613571-.1818895205*(x-2.698970004)^2),x=0..10));

105738.7643

-------------------------------------------------------------------------------------

Next, want to derive the Importance weights,

such that the label density on the monitor remains always constant!

-------------------------------------------------------------------------------------

Strategy:

=======

The label density = number*of*labels/(area*of*earth*on*screen) ,  area ~ (const/distance)^2 ,

Determine empirically on my 1600x1200 monitor the Importance weights

that just become visible at a range of distances d

It is a linear relation  as expected (see below). Thus the requirement is

that the number of labels (nLabels) at distance d,

------------------------------------------------------------------------------------------------

   nLabels*(importance*weight*just*visible*at*d)^2 = constant

------------------------------------------------------------------------------------------------

Moreover, we know from above that ,

nLabels = function(population)

Hence

if we assign the importance weights as:

+++++++++++++++++++++++++++++++++++++++++

importance*weight = constant/sqrt(nLabels(population))

+++++++++++++++++++++++++++++++++++++++++

our problem of expressing the weights as function of the known population

such as to keep the label density on the screen constant  is solved!

The only constant in the game is adjusted by requiring something like

20 labels visible at 40000km distance from earth, which is a good value.

Let's get quantitative:

>    distance:=[187,325,520,999,2085,6107,9835,15708,24880,33900];

distance := [187, 325, 520, 999, 2085, 6107, 9835, 15708, 24880, 33900]

>    importance:=[2.2,3.84,6.11,11.49,24.08,70.13,112.72,178.9,285.68,391];

importance := [2.2, 3.84, 6.11, 11.49, 24.08, 70.13, 112.72, 178.9, 285.68, 391]

>    distimp:=[[2.2,187],[3.84,325],[6.11,520],[11.49,999],[24.08,2085],[70.13,6107],[112.72,9835],[178.9,15708],[285.68,24880],[391,33900]];

distimp := [[2.2, 187], [3.84, 325], [6.11, 520], [11.49, 999], [24.08, 2085], [70.13, 6107], [112.72, 9835], [178.9, 15708], [285.68, 24880], [391, 33900]]
distimp := [[2.2, 187], [3.84, 325], [6.11, 520], [11.49, 999], [24.08, 2085], [70.13, 6107], [112.72, 9835], [178.9, 15708], [285.68, 24880], [391, 33900]]

>    q0:=pointplot(distimp,symbol=BOX,color=blue,symbolsize=20):

Again: least square fit of linear relation: min. distance <=> Importance weight

>    fit[leastsquare[[x,y],y=a+b*x]]([importance,distance]);

y = 14.82791092+86.91039073*x

>    q1:=plot(14.82791092+86.91039073*imp,imp=1..1000,color=red,thickness=2):

>    display({q0,q1},axes=boxed,labels=["Importance weight","min. distance [ km ], where visibility starts"], labeldirections=[horizontal,vertical]);

[Maple Plot]

Aha, an excellent fit!

 --------------------------

Calculate the log (base e) of the dby for the fit

>    dbylog:=map(x->log10(x),dby);

dbylog := [.7781512504, 1.041392685, 1.505149978, 1.491361694, 2.017033339, 2.653212514, 2.959994838, 3.539703239, 3.878751520, 4.161607547, 3.942206542, 3.566908655, 3.074816441, 2.596597096, 2.021189...
dbylog := [.7781512504, 1.041392685, 1.505149978, 1.491361694, 2.017033339, 2.653212514, 2.959994838, 3.539703239, 3.878751520, 4.161607547, 3.942206542, 3.566908655, 3.074816441, 2.596597096, 2.021189...

Shift abszissa to the peak position and take a most simple (integrable) form.

There are better ones, but this one can be analytically integrated

>    dbx2:=map(x->abs(x-4.211368402),dbx);

dbx2 := [3.989717434, 3.546415496, 3.103113560, 2.659811622, 2.216509686, 1.773207748, 1.329905812, .886603874, .443301938, 0., .443301936, .886603873, 1.329905808, 1.773207748, 2.216509683, 2.65981162...
dbx2 := [3.989717434, 3.546415496, 3.103113560, 2.659811622, 2.216509686, 1.773207748, 1.329905812, .886603874, .443301938, 0., .443301936, .886603873, 1.329905808, 1.773207748, 2.216509683, 2.65981162...

Fitting:

>    fit[leastsquare[[x,y],y=a+b*x]]([dbx2,dbylog]);

y = 4.249678825-.9327271384*x

Plotting the result:

>    pdb1:=logplot(dbxy,style=point,symbol=BOX,symbolsize=20,color=blue):

>    pdb2:=logplot(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=0..8,color=red,thickness=2):

>    display({pdb1,pdb2},axes=boxed,labels=["log10( population )","log10( Number of Labels ( Towns )  )"], title="           Empirical Fit  of Label Distribution", titlefont=[HELVETICA,16],labeldirections=[horizontal,vertical],labelfont=[HELVETICA,14],OPTS);

[Maple Plot]

That's globally good enough! Next, since we want the total number of visible labels for a given population xt,

we must integrate from xt to 'infinity' (all labels corresponding to a higher population than xt are also visible!):

>    Int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=xt..infinity)=int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x=xt..infinity);

Int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x = xt .. infinity) = PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-165...
Int(10^(4.249678825-.9327271384*abs(x-4.211368402)),x = xt .. infinity) = PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-165...

Define a function from the result and divide by binwidth:

>    nLabels:=xt->1/.4433019370*(PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-16547.73353, 4.211368402 < xt])+16547.73353);

nLabels := proc (xt) options operator, arrow; 1/.4433019370*(PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-16547.73353, 4.2...
nLabels := proc (xt) options operator, arrow; 1/.4433019370*(PIECEWISE([-.4656179328*10.^(.9327271384*xt+.3216212267), xt <= 4.211368402],[46561793.26*10.^(-.9327271384*xt+.1777364233)-16547.73353, 4.2...

Let's see what the total number of labels becomes? Close to 41245?

>    nLabels(0);

37326.15561

OK, given the qualitative nature of the fit, it seems acceptable.

Plot the integrated number of totally visible labels vs. xt=log10(population):

>    logplot(nLabels(x),x=0..7);

[Maple Plot]

Next we calculate the Importance weights, as outlined above, from

>    expand(solve(nLabels=(c/(14.82791092+86.91039073*imp))^2,imp)[1]);

-.1706114861+.1150610407e-1/nLabels^(1/2)*c

c is the constant to be determined e.g.  from the requirement of seeing 20 labels (10/hemisphere) at a distance of 40000km:

>    evalf(solve(20=(c/40000)^2,c));

-178885.4382, 178885.4382

For general c, we get:

>    Imp:=evalf(subs(nLabels=nLabels(log10(pop)),-.1706114861+.1150610407e-1/nLabels^(1/2)*c));

Imp := -.1706114861+.1150610407e-1/(2.255798851*PIECEWISE([-.4656179328*10.^(.4050782493*ln(pop)+.3216212267), .4342944819*ln(pop) <= 4.211368402],[46561793.26*10.^(-.4050782493*ln(pop)+.1777364233)-16...
Imp := -.1706114861+.1150610407e-1/(2.255798851*PIECEWISE([-.4656179328*10.^(.4050782493*ln(pop)+.3216212267), .4342944819*ln(pop) <= 4.211368402],[46561793.26*10.^(-.4050782493*ln(pop)+.1777364233)-16...
Imp := -.1706114861+.1150610407e-1/(2.255798851*PIECEWISE([-.4656179328*10.^(.4050782493*ln(pop)+.3216212267), .4342944819*ln(pop) <= 4.211368402],[46561793.26*10.^(-.4050782493*ln(pop)+.1777364233)-16...
Imp := -.1706114861+.1150610407e-1/(2.255798851*PIECEWISE([-.4656179328*10.^(.4050782493*ln(pop)+.3216212267), .4342944819*ln(pop) <= 4.211368402],[46561793.26*10.^(-.4050782493*ln(pop)+.1777364233)-16...

>    evalf(ln(10)*4.211368402);

9.697034104

>    solve(nLabs=(C/40000)^2,C);

40000*nLabs^(1/2), -40000*nLabs^(1/2)

>    w1:=loglogplot(subs(c=40000*sqrt(20),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=red,OPTS):

>    w2:=loglogplot(subs(c=40000*sqrt(40),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=blue,OPTS):

>    w3:=loglogplot(subs(c=40000*sqrt(10),Imp),pop=10..10000000,axes=boxed,labels=["Population","Importance Weight"],color=green,OPTS):

>    display({w1,w2,w3});

[Maple Plot]

>   

-------------------------------------------------------------------------------------

This solves the problem, the above function is entered into my Perl script

which assigns the Importance weights accordingly!

--------------------------------------------------------------------------------------