Mars.mws

file: Mars.mws

===========

7.3.2004

8.3.2004

------------------------------------------------------------------------

In this worksheet, I derive the Importance weights

as a function of the location size,

such that the label density on the screen remains constant

at all distances.

Typical applications are the crowded locations on Mars and Venus.

-------------------------------------------------------------------------

>    with(plots):

Warning, the name changecoords has been redefined

>    with(stats): with(stats[statplots]):

The file 'Mars_sizes.txt' contains a 1-dim array of the (non-vanishing) sizes [km] of the 1327 locations for Mars. Read it into Maple:

>    data:=readdata("Mars_sizes.txt",float):

Calculate log base 10 for each element:

>    size:=evalf(map(log10,data)):

Histogram plot of log10(size) distribution:

>    ph:=histogram(size,area=count,axes=boxed,labels=["log10(size)","number of labels"],labeldirections=[horizontal,vertical],OPTS):

>    display(ph);

[Maple Plot]

Extract the numerical values from histogram plot structure 'ph':

>    Binwidth:=op([1,2],ph)[2][1]-op([1,1],ph)[2][1];

Binwidth := .3546060421

>    dbx:=[];dby:=[];dbxy:=[];

dbx := []

dby := []

dbxy := []

>    for i from 1 to 12 do

>    dbx:=[op(dbx),op([1,i],ph)[2][1]+0.5*Binwidth];

>    dby:=[op(dby),op([1,i],ph)[2][2]];

>    dbxy:=[op(dbxy),[op([1,i],ph)[2][1]+0.5*Binwidth,op([1,i],ph)[2][2]]];

>    end do:

>    dbxy;

[[-.3455757243, 7.000000001], [.90303178e-2, 19.00000000], [.3636363599, 67.00000005], [.7182424017, 105.0000000], [1.072848444, 214.9999999], [1.427454486, 174.9999996], [1.782060529, 170.0000001], [2...

>    dbx;

[-.3455757243, .90303178e-2, .3636363599, .7182424017, 1.072848444, 1.427454486, 1.782060529, 2.136666571, 2.491272613, 2.845878655, 3.200484697, 3.555090739]

>    dby;

[7.000000001, 19.00000000, 67.00000005, 105.0000000, 214.9999999, 174.9999996, 170.0000001, 238.0000001, 139.0000000, 119.0000000, 53.00000001, 20.00000001]

Compute the total number of non-vanishing location sizes in all 12 bins:

>    GG:=j->sum(dby['i'],'i'=j..12);

GG := proc (j) options operator, arrow; sum(dby['i'],('i') = j .. 12) end proc

>    for j from 1 to 12 do

>    GG(j);

>    od:

>    GG(1);

1327.000000

Aha, the bin counts sum up to the total number of (non-vanishing) location sizes in the database.

As a test, set up a Normal distribution of log10(size) around log10(s0), with width parameter v = 1/(2*sigma^2), and check its normalization:

>    assume(v>0);

>    dnLabels:=sqrt(v/Pi)*binwidth*nLabels_tot*exp(-v*(x-log10(s0))^2);

dnLabels := (v/Pi)^(1/2)*binwidth*nLabels_tot*exp(-v*(x-ln(s0)/ln(10))^2)

>    int(dnLabels/binwidth,x=-infinity..infinity);

nLabels_tot

OK, the distribution correctly integrates to the total number nLabels_tot

of Mars locations with non-vanishing size (= 1327).

Take the ln of the y-values (counts), to make the fit function linear in its parameters:

>    YL:=map(x->ln(x),dby);

YL := [1.945910149, 2.944438979, 4.204692620, 4.653960350, 5.370638028, 5.164785972, 5.135798438, 5.472270674, 4.934473933, 4.779123493, 3.970291914, 2.995732274]

Expand the Gaussian exponent and substitute parameters to generate the required linear dependence:

>    combine(collect(simplify(ln(subs(log10(s0)=x0,dnLabels)),symbolic),[x],factor));

-v*x^2+2*v*x*x0-v*x0^2+ln(1/Pi^(1/2)*v^(1/2)*binwidth)+ln(nLabels_tot)

>    Y:=y=subs(v*x0^2=-C+ln(1/Pi^(1/2)*v^(1/2)*binwidth)+ln(nLabels_tot),x0=d/(2*v),combine(collect(simplify(ln(subs(log10(s0)=x0,dnLabels)),symbolic),[x],factor)));

Y := y = -v*x^2+x*d+C

Do a least-squares fit of the binned size data to a Normal distribution:

>    w:=fit[leastsquare[ [x,y],Y] ]([dbx,YL]);

w := y = -.7633314738*x^2+2.711025300*x+3.056731711
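As an independent cross-check of the fit above, here is a minimal Python sketch (standard library only) that fits the same parabola to ln(counts) through the normal equations. The lists are copied from dbx and dby above, with the counts rounded to the integers they represent. The helper names solve3 and fit_parabola are illustrative and not part of the worksheet.

```python
import math

# Bin centers (log10 of size in km) and bin counts, copied from dbx and dby.
dbx = [-0.3455757243, 0.0090303178, 0.3636363599, 0.7182424017,
       1.072848444, 1.427454486, 1.782060529, 2.136666571,
       2.491272613, 2.845878655, 3.200484697, 3.555090739]
dby = [7, 19, 67, 105, 215, 175, 170, 238, 139, 119, 53, 20]


def solve3(m, r):
    """Solve the 3x3 system m*u = r by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, r)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for j in range(col, n + 1):
                a[k][j] -= f * a[col][j]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (a[i][n] - s) / a[i][i]
    return u


def fit_parabola(xs, ys):
    """Least-squares fit y = c0 + c1*x + c2*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve3(m, t)


yl = [math.log(c) for c in dby]          # same role as YL in the worksheet
c0, c1, c2 = fit_parabola(dbx, yl)
print(c0, c1, c2)
```

The three coefficients should agree with the constant, linear and quadratic terms of w to within rounding.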

Solve for the original parameters (x0, nLabels_tot, v):

>    eq0:=coeff(rhs(w),x,2)=-v;

eq0 := -.7633314738 = -v

>    eq1:=coeff(rhs(w),x,1)=2*v*x0;

eq1 := 2.711025300 = 2*v*x0

>    eq2:=subs(binwidth=Binwidth,coeff(rhs(w),x,0)=-v*x0^2+ln(1/Pi^(1/2)*v^(1/2)*binwidth)+ln(nLabels_tot));

eq2 := 3.056731711 = -v*x0^2+ln(.3546060421/Pi^(1/2)*v^(1/2))+ln(nLabels_tot)

>    ccc:=solve({eq0,eq1,eq2},{v,x0,nLabels_tot});

ccc := {nLabels_tot = 1350.155577, x0 = 1.775785090, v = .7633314738}

The peak of the fitted distribution corresponds to a typical (geometric-mean) location size on Mars of [km]:

>    S0:=evalf(10^1.775785090);

S0 := 59.67399176
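The parameter recovery in eq0, eq1 and eq2 can also be done in closed form. The following Python sketch starts from the fitted coefficients and the bin width quoted above. The variable names are illustrative, and sigma (the standard deviation in log10(size)) is an extra quantity not computed in the worksheet.

```python
import math

# Coefficients of w (y = a2*x^2 + a1*x + a0) and the bin width, copied from above.
a2, a1, a0 = -0.7633314738, 2.711025300, 3.056731711
bin_width = 0.3546060421

v = -a2                               # eq0
x0 = a1 / (2.0 * v)                   # eq1
# eq2: a0 = -v*x0^2 + ln(bin_width*sqrt(v/pi)) + ln(n_tot), solved for n_tot
n_tot = math.exp(a0 + v * x0 ** 2) / (bin_width * math.sqrt(v / math.pi))

s0 = 10.0 ** x0                       # peak of the fitted distribution, in km
sigma = 1.0 / math.sqrt(2.0 * v)      # width in units of log10(size)

print(v, x0, n_tot, s0, sigma)
```

The values of v, x0, n_tot and s0 should reproduce ccc and S0 to within rounding.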

>    pt:=plot(exp(rhs(w)),x=-1..4.5,color=red,thickness=2):

>    display({ph,pt},labels=["log10( size )","Number of Labels"],labeldirections=[horizontal,vertical], title="Normal Distribution of log10(size) around log10(s0=59.67)", titlefont=[HELVETICA,20],OPTS);

[Maple Plot]

OK, not perfect, but reasonably compatible with a Normal distribution...

Except for the top bins, the fit goes nicely through the centers of the bins in the

left and right tails of the Normal distribution!

-------------------------------------------------------------------------------------

Next, we want to derive the Importance weights I,

such that the label density on the monitor always remains constant!

-------------------------------------------------------------------------------------

Strategy:

=======

i) Let nLabels = (number of visible labels) at distance d of our object (Mars, Venus,...),

   which has an area A(d) = (const/d)^2 on screen in [pix^2].

 

   =============================================

   Require that the visible label density is about constant

   at all distances d [FoV's] of our object, i.e.

   

    nLabels/A(d) = const*nLabels*d^2  = constant

   =============================================

 

ii) For the given monitor resolution, and a range of 'importance weights I',

    determine empirically the distances d = d_vis(I) of our object, at which

    the associated labels just become visible.

    It is a linear relation, as expected (see below):

   

    d_vis = 14.8 + 86.9*I  [km]

  

   Thus the requirement of a constant label density turns into a formula

   for the importance weights I

  

    I = const/sqrt(nLabels)-14.8/86.9

iii) On Earth I calculated nLabels = nLabels(population) from

    the known data on city populations. For Mars, Venus,...

    we may as well take the number distribution of the

    location sizes.

   Above we obtained approximately a Normal distribution

    nLabels = Normal(log10(size))

   around a typical location size of s0 = 59.67 km.

iv) We may feed this in and determine the only unknown constant

     by requiring a convenient number of visible labels at a certain

     distance of the object. E.g. for Earth, 10/hemisphere at a distance

     of 40000 km.

-------------------------------------------------------------------------
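To make steps i), ii) and iv) concrete, here is a small Python sketch. It uses the rounded coefficients 14.8 and 86.9 quoted in step ii) and, purely as an example calibration, 10 visible labels at 40000 km. The screen scale k and the function names are hypothetical and only serve to show that the density nLabels/A(d) does not depend on d.

```python
# Rounded coefficients of the empirical relation d_vis = A0 + B0*I [km] (step ii).
A0, B0 = 14.8, 86.9


def labels_at_distance(d_km, c):
    """Visible-label count that keeps the screen density constant: n = (c/d)^2."""
    return (c / d_km) ** 2


def importance_for_count(n_labels, c):
    """Weight I whose labels first appear at the distance where n_labels are visible."""
    d_vis = c / n_labels ** 0.5
    return (d_vis - A0) / B0


# Example calibration (step iv): 10 visible labels at 40000 km.
c = 40000.0 * 10 ** 0.5

# Hypothetical screen scale k: the object covers A(d) = (k/d)^2 pix^2.
k = 1.0e6
densities = [labels_at_distance(d, c) / (k / d) ** 2
             for d in (500.0, 5000.0, 40000.0)]
print(densities)                      # the same value at every distance
print(importance_for_count(10.0, c))  # weight of the labels that appear at 40000 km
```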

 

Our problem of expressing the weights as a function of the known location sizes,

so as to keep the label density on the screen constant, is solved!

Let's get quantitative:

>    distance:=[187,325,520,999,2085,6107,9835,15708,24880,33900];

distance := [187, 325, 520, 999, 2085, 6107, 9835, 15708, 24880, 33900]

>    importance:=[2.2,3.84,6.11,11.49,24.08,70.13,112.72,178.9,285.68,391];

importance := [2.2, 3.84, 6.11, 11.49, 24.08, 70.13, 112.72, 178.9, 285.68, 391]

>    distimp:=[[2.2,187],[3.84,325],[6.11,520],[11.49,999],[24.08,2085],[70.13,6107],[112.72,9835],[178.9,15708],[285.68,24880],[391,33900]];

distimp := [[2.2, 187], [3.84, 325], [6.11, 520], [11.49, 999], [24.08, 2085], [70.13, 6107], [112.72, 9835], [178.9, 15708], [285.68, 24880], [391, 33900]]

>    q0:=pointplot(distimp,symbol=BOX,color=blue,symbolsize=20):

Again: least-squares fit of a linear relation: min. distance <=> Importance weight

>    fit[leastsquare[[x,y],y=a+b*x]]([importance,distance]);

y = 14.82791092+86.91039073*x

>    q1:=plot(14.82791092+86.91039073*imp,imp=1..1000,color=red,thickness=2):

>    display({q0,q1},axes=boxed,labels=["Importance weight","min. distance [ km ], where visibility starts"], labeldirections=[horizontal,vertical]);

[Maple Plot]

Aha, an excellent fit!
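The straight-line fit can be reproduced with the textbook formulas for ordinary least squares. This is a minimal Python sketch using the measured pairs listed above; the function name fit_line is illustrative.

```python
# Measured pairs, copied from the lists 'importance' and 'distance' above.
importance = [2.2, 3.84, 6.11, 11.49, 24.08, 70.13, 112.72, 178.9, 285.68, 391]
distance = [187, 325, 520, 999, 2085, 6107, 9835, 15708, 24880, 33900]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


a, b = fit_line(importance, distance)
print(a, b)   # intercept [km] and slope [km per unit weight]
```

The intercept and slope should match the coefficients of the Maple fit to within rounding.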

 --------------------------

Next, since we want the total number of visible labels for a given log10(st) = xt,

we must divide by the binwidth and integrate from xt to 'infinity' (all labels with a size bigger than st are also visible!):

>    Int(exp(rhs(w))/Binwidth,x=xt..infinity)=int(exp(rhs(w))/Binwidth,x=xt..infinity);

Int(2.820030911*exp(-.7633314738*x^2+2.711025300*x+3.056731711),x = xt .. infinity) = -675.0777883*erf(.8736884307*xt-1.551482889)+675.0777883

Define a function from the result:

>    nLabels:=xt->evalf(-675.0777883*erf(.8736884307*xt-1.551482889)+675.0777883);

nLabels := proc (xt) options operator, arrow; evalf(-675.0777883*erf(.8736884307*xt-1.551482889)+675.0777883) end proc

Let's see what the total number of labels becomes. Close to 1327?

>    nLabels(-infinity);

1350.155577

YES, indeed, it's not at all bad, compared to the exact value of 1327!
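The integrated count is just the upper tail of the fitted Gaussian, nLabels(xt) = (nLabels_tot/2)*(1 - erf(sqrt(v)*(xt - x0))). The Python sketch below evaluates it with math.erf, using the fitted values of v, x0 and nLabels_tot from ccc. The name n_labels is illustrative.

```python
import math

# Fitted parameters, copied from ccc above.
V, X0, N_TOT = 0.7633314738, 1.775785090, 1350.155577


def n_labels(xt):
    """Number of labels with log10(size) > xt under the fitted Normal distribution."""
    return 0.5 * N_TOT * (1.0 - math.erf(math.sqrt(V) * (xt - X0)))


print(n_labels(-50.0))   # all labels: approaches N_TOT
print(n_labels(X0))      # half of N_TOT at the peak of the distribution
print(n_labels(3.0))     # labels of locations larger than 1000 km
```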

Plot the total number of visible labels vs. xt=log10(st):

>    plot(nLabels(x),x=-1..4,OPTS,labels=["xt=log10(st)","number of  visible labels for sizes >xt"]);

[Maple Plot]

Next we calculate the Importance weights, as outlined above, from

>    Importance:=expand(solve(nLabels=(c/(14.82791092+86.91039073*imp))^2,imp)[1]);

Importance := -.1706114861+.1150610407e-1/nLabels^(1/2)*c

c is the constant to be determined, e.g. from the requirement of seeing 10 labels (5/hemisphere) at a distance of 40000 km:

For a general number nLabs of visible labels at that distance, we get:

>    solve(nLabs=(C/40000)^2,C)[1];

40000*nLabs^(1/2)

====================================Final Result =================================================

>    Imp:=evalf(subs(nLabels=nLabels(log10(s)),c=40000*nLabs^(1/2),Importance));

Imp := -.1706114861+460.2441628/(-675.0777883*erf(.3794380644*ln(s)-1.551482889)+675.0777883)^(1/2)*nLabs^(1/2)

==============================================================================================
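For readers who want to evaluate the final result outside Maple, here is a hedged Python sketch of the same function. It rebuilds Imp from the fitted constants rather than from the expanded expression. The names importance, n_labels and D_REF are illustrative, and the sketch is only meant for the plotted size range of 0.1 to 10000 km (for extremely large sizes the label count underflows to zero).

```python
import math

A, B = 14.82791092, 86.91039073                  # d_vis = A + B*I [km]
V, X0, N_TOT = 0.7633314738, 1.775785090, 1350.155577
D_REF = 40000.0                                  # calibration distance [km]


def n_labels(xt):
    """Number of labels with log10(size) > xt under the fitted Normal distribution."""
    return 0.5 * N_TOT * (1.0 - math.erf(math.sqrt(V) * (xt - X0)))


def importance(size_km, n_ref=10.0):
    """Importance weight for a location of the given size.

    n_ref plays the role of nLabs: the number of labels visible at D_REF.
    """
    c = D_REF * math.sqrt(n_ref)
    d_vis = c / math.sqrt(n_labels(math.log10(size_km)))
    return (d_vis - A) / B


for s in (1.0, 59.67, 1000.0):
    print(s, importance(s))
```

Larger locations receive larger weights, so their labels become visible from farther away.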

>    w1:=loglogplot(subs(nLabs=10,Imp),s=0.1..10000,axes=boxed,labels=["Size [km]","Importance Weight"],color=red,OPTS,numpoints=5000):

>    w2:=loglogplot(subs(nLabs=20,Imp),s=0.1..10000,axes=boxed,labels=["Size [km]","Importance Weight"],color=blue,numpoints=5000,OPTS):

>    w3:=loglogplot(subs(nLabs=5,Imp),s=0.1..10000,axes=boxed,labels=["Size [km]","Importance Weight"],color=green,numpoints=5000,OPTS):

>    display({w1,w2,w3});

[Maple Plot]

>   

-------------------------------------------------------------------------------------

This solves the problem. The above function is entered into my Perl script,

which assigns the Importance weights accordingly!

--------------------------------------------------------------------------------------