Shallows: 5 - 2 - Poss Redo, Only 98


Shallows preference: 9, 14, 3, 13, 6

Report:
Download papers to write related-work areas
Charge laptop/phone fully

Send Wendy research proposal

Do new loss - one that shows high/100% accurate? - recall - proportion in top layer only?

Explain why it is blobby - local peakiness, std will be less for lines

Shallow9 process quant analysis 148306

Report
Report nets:

Just static images
Run g, b_net_script1_*
Ch xent (alone, sp xent, sp one hot xent, sp min var)

Run d, b_net_script2_*
Ch one hot xent (alone, sp xent, sp one hot xent, sp min var)

Run d, b_net_script5_*
Ch orthogonal (alone, sp xent, sp one hot xent, sp min var)

Movies, movement filter

Run g, movie_net5_*
Ch xent (alone, sp xent, sp one hot xent, sp min var)
5_2 - poss redo, only 98

Run f, movie_net6_*
Ch one hot xent (alone, sp xent, sp one hot xent, sp min var)

Run f, movie_net7_*
Ch orthogonal (alone, sp xent, sp one hot xent, sp min var)

Movies, trans filter

Run h, grey_net1_*
1_4 - poss redo, spatial goes up

Run h, grey_net2_*

Run h, grey_net3_*


Movie
1 - supervised segmentation with 1 conv layer, simply learning whatever moves + sparsity ----

Run a
For 8
Movie_net1: no l2, sum encourage, sh9 gpu3 117845
Movie_net2: no l2, one_hot_xent, sh9 gpu4 117959 (should have worked but didn't, because exploding gradients made filters huge)
Movie_net3: yes l2, sum encourage, sh13 gpu2 78387
Movie_net4: yes l2, one hot xent, sh13 gpu3 78499 (worked - difference: normaliser)

Run b
For 2
Movie_net2: no l2, one_hot_xent, sh9 gpu3 120572
Movie_net4: l2, one_hot_xent, sh9 gpu4 120139

2 - with existing nets, to remove background being learned --------------------------------------
Sparse things that occur frequently -> sparse things that move that occur frequently
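These notes don't spell out the movement filter itself; as a hedged sketch, one common minimal form is frame differencing, which keeps only pixels that changed between consecutive frames so the losses only see "sparse things that move". Function names here are hypothetical, not the names in the scripts.

```python
import numpy as np

def movement_mask(prev_frame, frame, threshold=0.05):
    """Hypothetical movement filter: keep only pixels whose intensity
    changed between consecutive frames, suppressing static background."""
    diff = np.abs(frame - prev_frame)
    return (diff > threshold).astype(frame.dtype)

def apply_movement_filter(prev_frame, frame):
    # Zero out everything that did not move, so the sparsity losses
    # never get a chance to latch onto static background texture.
    return frame * movement_mask(prev_frame, frame)
```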

Run e
For existing net structures
Movie_net5: b_net_script1 but with mvt filter, sh13 gpu2 43597
Movie_net6: b_net_script2 ^, sh13 gpu3 43732
Movie_net7: b_net_script5 ^, sh14 gpu3 81133

Heatmaps became very sparse (y)

Validation data not plotted
Constants for ch_xent, sp_one_hot were too high for sp, so ch_xent never changed - redo all constants?

Run f
With better constants

Channel xent, spatial xent, 2.35e-09
Channel xent -> Channel softmax: 1.2799130e-08
Spatial xent: 5.4544282

Channel xent, spatial one hot xent, 6.66e-12
Channel xent -> Spatial softmax: 1.3924083e-12
Spatial one hot xent: 0.2089325

Channel xent, spatial min var, 1.14e-09
Channel xent -> channel softmax: 1.3007988e-08
Spatial min var: 11.4543428
----
Channel one hot xent, spatial xent, 0.0324
Channel one hot xent: 0.1769506
Spatial xent: 5.4544282

Channel one hot xent, spatial one hot xent, 4.44e-05
Channel one hot xent -> spatial softmax: 6.3367952e-06
Spatial one hot xent: 0.1428607

Channel one hot xent, spatial min var, 0.0154
Channel one hot xent: 0.1767074
Spatial min var: 11.4543428
----
Channel orthogonal, spatial xent, 0.165
Channel orthogonal: 0.9008957
Spatial xent: 5.4544277

Channel orthogonal, spatial one hot xent, 1.34e-04
Channel orthogonal -> spatial softmax: 6.8951454e-06
Spatial one hot xent: 0.0515237

Channel orthogonal, spatial min var, 0.0783
Channel orthogonal: 0.8967892
Spatial min var: 11.4543428

Movie_net5 (channel xent) sh9 gpu3 124473 - Channel xent still not working
Movie_net6 (channel one hot xent) sh12 gpu1, 131275
Movie_net7 (channel orthog) sh13 gpu4 71823

Run g
Make channel loss a fn of m1 instead of m2 (no sp softmax)

Shallow 9 Movie_net5 (channel xent) 35088, 3, 2, 4, 1
Channel xent -> ch softmax: 3.3714298e-07
Sp xent -> sp softmax: 8.8479215e-09 (* 38.1)

Channel xent -> ch softmax: 3.1880083e-07
Sp one hot xent: 0.0571243 (* 5.58e-06)

Channel xent -> ch softmax: 2.3704401e-07
Sp min var -> sp softmax: 0.0032277 (* 7.34e-05)

Shallow 9 B_net_script1 (channel xent) 35010, 3, 2, 4, 1
Channel xent (ch sm) 5.5253104e-06
Sp xent (sp sm) 5.3376596e-08 (* 1.04e+02)

Channel xent (ch sm) 1.6892529e-05
Sp one hot xent 0.0307308 (* 5.50e-04)

Channel xent (ch sm) 3.6484821e-06
Sp min var (sp sm) 0.0032292 (* 0.00113)

Run h
(No spatial before channel)
Channel xent, spatial xent, 48.4
Channel xent -> Channel softmax: 5.0673049e-07
Spatial xent -> sp softmax: 1.0474341e-08

Channel xent, spatial one hot xent, 2.06e-05
Channel xent -> channel softmax: 1.1555122e-06
Spatial one hot xent: 0.0560629

Channel xent, spatial min var, 1.67e-04
Channel xent -> channel softmax: 5.3857474e-07
Spatial min var -> sp softmax: 0.0032294
----
(no ch sm)
Channel one hot xent, spatial xent, 0.0303
Channel one hot xent: 0.1652550
Spatial xent: 5.4539595

Channel one hot xent, spatial one hot xent, 2.94e-04
Channel one hot xent -> spatial softmax: 9.8324863e-06
Spatial one hot xent: 0.0334366

Channel one hot xent, spatial min var, 0.0145
Channel one hot xent: 0.1667462
Spatial min var: 11.4701481
----
(sp before ch)
Channel orthogonal, spatial xent, 0.166
Channel orthogonal: 0.9036126
Spatial xent: 5.4539595

Channel orthogonal, spatial one hot xent, 9.15e-04
Channel orthogonal -> spatial softmax: 1.0179439e-05
Spatial one hot xent: 0.0111206

Channel orthogonal, spatial min var, 0.0783
Channel orthogonal: 0.8980038
Spatial min var: 11.4701481

Grey_net1 sh9 gpu1, done 3 2 4
Grey_net1 sh13 gpu4 85781
Grey_net2 sh9 gpu4 79405, done 3 2 4 1
Grey_net3 sh13 gpu3 68261, done 3 2 4 1
Both

Losses
- Proportion-all%: Measure proportion of inside-blobs (vs all in whole stack), for top n% of pixels
- Proportion-layer%: Measure proportion of inside-blobs (vs all in same layer), for top n% of pixels per layer
- Proportion-squashed%: Sum across all non-bg layers, pick top n% of pixels, proportion of them inside ground truth boxes vs all
- Channel-spread-n%: Sum across all non-bg layers, pick top n% of pixels, entropy
- Top pixel across all layers
- Most spatially/channel sparse pixel across all layers
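The top-n% proportion metrics can be sketched in a few lines. This is one plausible reading of Proportion-squashed% only; the function name is made up, and a binary ground-truth mask stands in for the bounding boxes the notes mention.

```python
import numpy as np

def proportion_squashed(heatmaps, gt_mask, n_percent=1.0):
    """Sketch of 'Proportion-squashed%': sum activation across all
    non-background layers, take the top n% of pixels, and return the
    fraction of those pixels that fall inside ground truth (gt_mask == 1)."""
    summed = heatmaps.sum(axis=0)                # collapse the layer stack
    k = max(1, int(summed.size * n_percent / 100))
    flat = summed.ravel()
    top_idx = np.argpartition(flat, -k)[-k:]     # indices of the top-k pixels
    return gt_mask.ravel()[top_idx].mean()
```

The per-layer and whole-stack variants differ only in whether the top-k selection runs on each layer separately or on the stacked array.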

*Type-per-layer: for each val image, for each layer, for each obj, add sum of bounding box to vote for (layer, obj type). Work down this list in terms of vote intensity, assigning obj to highest layer that has not been taken. For each image, count proportion inside ground truth boxes in greedily determined layers. Pick top n blobs and measure avg count per obj (want low) (coverage/consistency over whole stack of layers) - need to know the classes of the objects! Not objs being learned...
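The greedy vote-then-assign step above can be sketched as follows; the shape of the vote table (a dict keyed by (layer, obj type)) is an assumption.

```python
def assign_objects_to_layers(votes):
    """Sketch of the *Type-per-layer greedy assignment.
    votes[(layer, obj_type)] = summed activation inside that object
    type's bounding boxes on that layer.  Work down the votes by
    intensity, giving each object type the strongest layer not yet taken."""
    assignment = {}
    taken_layers = set()
    for (layer, obj_type), _ in sorted(votes.items(), key=lambda kv: -kv[1]):
        if obj_type in assignment or layer in taken_layers:
            continue  # object already placed, or layer already claimed
        assignment[obj_type] = layer
        taken_layers.add(layer)
    return assignment
```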

A Channel sparsity (non overlap)
B Spatial sparsity (dots)
C Background lines not prominent
D Background layer accurate

What may be a sharp heatmap may not be best... all about ranking? Non-0 cutoff for presence is fine? Yep, use top-n

Non-movie

Run D
With more numerically precise constants as detailed with -> below.
Script1, sh12, gpu2, 24349, done: 1, 2, 3, 4
Script2, sh12, gpu3, 24492, done: 1, 2, 3, 4
Script5, sh9, gpu4, 41321, done: 1, 2, 3, 4

Run C
shallow14
Script5 (orthogonal channel loss), gpu 3, 17366, done: 2, 3, 4
Constants

Ch_orthogonal (no ch_sm)
Alone
Ch_orthogonal 0.9067919, sp_sm level 9.9738660e-05

Sp_xent: (from sp_sm) 1 -> 0.166
Sp_xent 5.4539809 (rounds to 5)
Ch_ortho 0.9074898 (rounds to 1)
(Sp_sm level 9.5911339e-05)

Sp_one_hot: (from m1) 1e-2 -> 0.00722
Sp_one_hot 0.0143866
Ch_ortho 0.9059003
Sp_sm level 1.0393703e-04

Sp_min_var: (from sp_sm) 1e-1 -> 0.0789
Sp_min_var 11.4700451
Ch_ortho 0.9052058
(Sp_sm level 0.0032429)

Lr e-6

Run B
Shallow6
Script1, 1, 28615 (done: 2, 3, 4, 1)
Script2, 2, 66732 (done: 2, 3, 4, 1)
Script2, 4, 16126, 3 for 200 epochs
Script3, 3 (1 but no layer 4) - these constants/lrs may be diff, esp for one hot, 66884 (done: 2, 3, 4, 1)
Script4, 4 (2 but no layer 4) - these constants/lrs may be diff, esp for one hot, 67026 (done: 2, 3, 4, 1)

Most promising:
1_2 (ch xent + sp xent)
1_3 (ch xent + sp one hot xent)
1_4 (ch xent + min var)

2_* seem to be good for C - channel sparsity very important for spatial sparsity!
2_1 (ch one hot xent)
2_2 (ch one hot xent + sp xent)
2_3 (ch one hot xent + sp one hot xent)
2_4 (ch one hot xent + sp min var)

< have spatial loss mean be 1.1 * channel loss mean - channel loss is supposed to be moderating force? No. Channel loss is supposed to be strong too... prevent overlap!>
Smaller lr will help with numerical problems - mini snowball effects averaged out

Constants, determined using average values of derivatives, to equalise effect on m1
Sp factor: rounded up

Ch_xent
Alone
Ch_xent 1.3988582, at ch_sm level 1.5177328e-08, sp_sm level 8.5543109e-12

With sp_xent (from sp_sm): * e-8 -> 2.67e-9
Sp_xent 5.4539776 ( 5.4349769e-08)
Ch_xent 1.3988582, at ch_sm level 1.4580331e-08
(sp_sm level 4.6584454e-08 ( 7.4275889e-12))

With sp_one_hot_xent (from m1) * e-9 -> 5.44e-9
Sp_one_hot_xent 0.0016709 ( 1.6157747e-11)
Ch_xent 1.3988582, at ch_sm level 1.5664805e-08, at sp_sm level 9.0829540e-12

With sp_min_var (from sp_sm) * e-9 -> 1.3e-9
Sp_min_var 11.4700451 ( 1.1472684e-08)
Ch_xent 1.3988582, at ch_sm level 1.4888952e-08
(sp_sm level 0.0032294)

Lr, looking at derivatives for conv1, 2, 3, 4: 1e-3, but 1e-6 otherwise goes to NaNs (even with e-5)
Alone
2.4604301e-09
1.3473258e-09
2.4604301e-09
2.7288369e-10

With sp_xent
1.7000327e-09
1.2616471e-09
1.2098091e-09
1.4244753e-10

With sp_one_hot_xent
5.3798326e-09
1.1697529e-08
1.1376525e-08
1.1719968e-09

With sp_min_var
4.1186414e-09
3.4876761e-09
5.9062066e-09
4.4414297e-10

Ch_one_hot_xent
Alone
Ch_one_hot_xent 0.1747313, sp_sm level 1.0957074e-04

With sp_xent (from sp_sm): e-1 -> 0.0316
Sp_xent 5.4539776
Ch_one_hot_xent 0.1721362
(Sp_sm level 1.0374722e-04)

With sp_one_hot_xent (from m1): e-1 -> 0.0635
Sp_one_hot_xent 0.0016726
Ch_one_hot_xent 0.1695681
Sp_sm level 1.0629114e-04

With sp_min_var (from sp_sm): e-2 -> 0.0149
Sp_min_var 11.4700451
Ch_one_hot_xent 0.1708048
(sp_sm level 0.0032354)

Lr: e-7 -> e-6
Alone
0.0583868
0.1007572
0.1425934
0.0079129

With sp_xent
0.0061232
0.0053580
0.0068137
0.0034154

With sp_one_hot_xent
0.0712916
0.0403903
0.0568874
0.0158062

With sp_min_var
0.0504581
0.0397039
0.1137836
0.0123276
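One way to read the constant derivation above: each auxiliary loss is scaled so its average gradient magnitude on the shared activations matches the primary loss's. A hedged sketch of that recipe (the function name and input shape are assumptions, not from the scripts):

```python
import numpy as np

def balance_constant(primary_grads, auxiliary_grads):
    """Sketch: choose the auxiliary loss's constant so its average
    gradient magnitude on the shared activation map equals the primary
    (channel) loss's.  Inputs are gradient arrays sampled over a few
    training batches."""
    primary = np.mean([np.abs(g).mean() for g in primary_grads])
    auxiliary = np.mean([np.abs(g).mean() for g in auxiliary_grads])
    return primary / auxiliary
```

Rounding the resulting factor up (as noted above for the sp factor) biases slightly toward the auxiliary term.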
Run A

Spatial:
Xent
One hot xent
Min var
Self conv

Channel:
Xent
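The xent and one-hot-xent sparsity losses are never defined in these notes; the sketch below is one plausible reading (entropy of the spatial softmax, and cross-entropy against a one-hot at the argmax), offered purely as an assumption about what the scripts compute.

```python
import numpy as np

def spatial_xent(activation_map):
    """One reading of the 'xent' spatial loss: entropy of the softmax over
    spatial positions.  Low entropy = activation concentrated in few
    pixels, i.e. spatially sparse."""
    logits = activation_map.ravel()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

def spatial_one_hot_xent(activation_map):
    """One reading of 'one hot xent': cross-entropy between the spatial
    softmax and a one-hot target at the argmax, pushing all mass onto
    the peak pixel."""
    logits = activation_map.ravel()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[logits.argmax()] + 1e-12)
```

Under this reading both losses are minimised by a single sharp peak, which matches the "heatmaps became very sparse" observation above.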

5x5, 5x5, 5x5 d2 (9x9) (2*(2 + 2) + 9 = 17)
Compulsory 4th flatness suppression layer
Channel after spatial and spatial separate
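The 17-pixel figure (two plain 5x5 convs plus a 5x5 with dilation 2, i.e. an effective 9x9) follows from the stride-1 receptive-field recurrence; a quick check:

```python
def receptive_field(kernel_sizes, dilations=None):
    """Receptive field of a stack of stride-1 convolutions.
    Each layer with kernel k and dilation d widens the field by (k - 1) * d."""
    dilations = dilations or [1] * len(kernel_sizes)
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# 5x5, 5x5, then 5x5 with dilation 2: 1 + 4 + 4 + 8 = 17,
# matching the 2*(2 + 2) + 9 arithmetic in the note above.
```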

Network1: Adam, 62, channel (l=1) and spatial (l=1) xent (losses take no lambdas)

Network2: Adam, 62, channel (l=1) and spatial (l=2) xent

Network3: Adam, 62, channel (l=2) and spatial (l=2) xent

Network4: Adam, 1, channel (l=1) and one_hot_xent

Network5: Adam, 1, channel xent (l=1), min_var

Network6: Adam, 1, channel xent (l=1), self_conv
