Sunday, February 17, 2013

Historical Photos and Drawings

Historical Photos and Drawings

Fashion 1988 by Michael O'Brien (sitting in my portfolio for 25 years)



Tuesday, February 28, 2012

URLs

HTML5

Very impressed with or ghost car that escorted Canada geese family of 7 across march/queensway N overpass at 0804 on 20130523
https://twitter.com/_mikeobrien/status/337544671268061184
The 5 baby geese made it over the curb with about 100 cars on hold from 3 directions - thank you to the police service.

Sunday, January 15, 2012

Lake Ice Periodic Expansion Wavefronts

I was at my brother in law's place on the shore of the large lake beside Westport, Ontario, Canada.  An absolutely amazing phenomena happened.  We stood on the lake for about 15 minutes when an expansion wave went through the lake and passed under our feet at a speed that felt like the speed of sound.  Everyone has heard cracks when on the ice - but this was something completely different and very rare.  It seems that the ice at this early stage in its development has not developed any major defects or cracks and when it undergoes expansion due to further freezing as the night falls.  The wave of molecular expansion can be heard a mile away approaching and felt as it moves your feet a fraction of a milimeter and then hits the shore.  The sound is sort of like a light sabre sound just under your feet.

Thursday, August 4, 2011

Radar

http://wiki.eclipse.org/EclipseLink/Examples/Radar

R1: Determine layered GIS data for rainfall distribution at 1km resolution @ 200 km range
R2: Provide historical data
R2.1: Provide volumentric data per GPS position
R3: Provide 30 min prediction window
R3.1 Extend prediction window by including surrounding area weather
R4: Provide present status per GPS position
R5: Map noise areas for each radar site




DI1: Database: Derby or Oracle GIS/SDO aware?

DI2: Database Expected Volume
There are 14 levels of rainfall represented by color bands from purple to light blue.  If we include the ground color (green-grey), rivers (navy) and borders (black) we have 17 levels.  We also want to encode (null/unset/no-data) as white or -1 - this gives us 18 levels which fits in a single 8 bit byte.
There are approximately 500x480 pixels at 1km resolution which works to 234Kb decoded.
We expect 24 x 6 = 144 images / site / day, which comes to 234Kb x 144 = 34Mb / day / site.
We therefore need 12 Gb storage / year / site.  A standard 2Tb drive which is around the effective limit of most databases will hold 169 years of data (disregarding compression gains and error handling losses).  We should be able to hold our goal of 10 years of radar data for 16 sites comfortably.

This assumes we store raw data without gps coordinates, we may want to only store colored pixels.
Radar Sites:
http://en.wikipedia.org/wiki/Canadian_weather_radar_network
http://en.wikipedia.org/wiki/Beckwith,_Ontario
http://www.msc-smc.ec.gc.ca/projects/nrp/image_e.cfm?scale=true&s_image_querystring=city%3Dfranktown%26number%3D3&s_image_referrer=franktown%5Fe%2Ecfm&city=franktown&number=3
http://radar.weather.gov/Conus/index_loop.php

DI3: Persistence of Image data in JPA

DI4: Image URL capture is locking after a couple weeks

Stack trace:
 java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
java.io.BufferedInputStream.read(BufferedInputStream.java:317)
   - locked java.io.BufferedInputStream@c32360
sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:695)
sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
   - locked sun.net.www.protocol.http.HttpURLConnection@11b865
org.obrienscience.radar.integration.ResourceManager.captureImage(ResourceManager.java:141)
org.obrienscience.radar.integration.ResourceManager.captureImage(ResourceManager.java:91)
org.obrienscience.radar.integration.ResourceManager.captureImage(ResourceManager.java:456)
org.obrienscience.radar.integration.ResourceManager.captureRadarIndefinitely(ResourceManager.java:699)
org.obrienscience.radar.integration.LiveRadarService.performCapture(LiveRadarService.java:67)
org.obrienscience.radar.integration.ApplicationService.performCapture(ApplicationService.java:326)
org.obrienscience.radar.integration.ApplicationService.performCapture(ApplicationService.java:316)
org.obrienscience.radar.integration.LiveRadarService.performCapture(LiveRadarService.java:57)
org.obrienscience.radar.integration.LiveRadarService.main(LiveRadarService.java:101)




References:
http://www.radar.mcgill.ca/who-we-are/history.html
http://java.sun.com/javase/technologies/desktop/media/
http://en.wikipedia.org/wiki/Numerical_weather_prediction
http://en.wikipedia.org/wiki/Weather_radar
http://www.dtic.mil/dtic/tr/fulltext/u2/a261190.pdf




Scratchpad:

Genetics

DNA Testing, Storage and Interpretation
https://www.23andme.com/

Friday, March 25, 2011

Distributed and Multithreaded Applications

  A problem that maps very well to multiple cores and is easily parallelized is the computation of the Mandelbrot set.
  The following graph is the result of an experiment where I varied the number of cores used to render each frame of a deep zoom to the limit of ''double'' floating point precision. When I run this algorithm as a traditionally single threaded application it takes up to 800 seconds to render a 1024x1024 grid from 1.0 to 1 x 10^-16. However when I start adding threads I see the best speedup when I use the same number of threads as there are hard processors (non-hyperthreaded). The performance increase nears it's maximum 8 times increase for an Intel Corei7-920 when I approach a thread/line of 512 threads.

  As you can see from the graph, we benefit more from a massive number of threads - as long as they are independent. The Mandelbrot calculation however it not homogeneous - computing the central set requires a lot more iteration than outlying areas. This is why each parallel algorithm must be fine tuned to the problem it is solving. If you look at the screen captures of performance during the runs with various thread counts you will see what I mean. The processor is not being exercised at it's maximum capacity when the ''bands'' assigned to particular threads are finished before other threads that are performing more calculations than their peers. If we increase the number of bands - we distribute the unbalanced load among the cores more evenly - at a slight expense of thread coordination/creation/destruction.

Multicore Rendering of Mandelbrot Set
  The following runs are on a 1024x1024 grid starting form 1.0 to 0.0000000000000001 that take from 800 to 67 seconds depending on the number of threads used concurrently. Notice that I have a temporary issue with shared variable access between threads - as some of the pixel coloring is off.
  As you can see - the processor usage goes from 12% for a single core, through 50% for 8 cores - to 99% for 128+ cores. (we need to leave some CPU cycles to the system so our mouse functions)
  Why do we need so many threads? If even one thread takes longer than any other ones that are already completed their work unit - the entire computation is held up. We therefore use more work units than there are threads.
  A better algorithm would be to distribute work units asynchronously instead in the current MapReduce synchronous way we currently use. When a thread is finished, it can work on part of the image that is still waiting processing. We would need to distribute work units more like packets in this case.

  1 thread on an 8-core i7-920 takes 778 sec

  2 threads on an 8-core i7-920 takes 466 sec

 16 threads on an 8-core i7-920 takes 138 sec

  128 threads on an 8-core i7-920 takes 114 sec


Thread Contention for Shared Resources:
  For our multithreaded Mandelbrot application - which currently is not @ThreadSafe - we encounter resource contention specific to the Graphics context. This type of contention is the same for any shared resource such as a database. The issue is that setting a pixel on the screen is not an atomic operation - it consists of setting the current color and then drawing the pixel (The Java2D API may require multiple internal rendering steps as well). The result of this is that another thread may change the color of the graphics context before the current thread actually writes the pixel - resulting in noise - or more accurately - '''Data Corruption'''.


  Note: that no noise or data corruption occurs when we run a single thread. We only get a problem when we run multiple threads concurrently.
color = Mandelbrot.getCurrentColors().get(iterations);
 color2 = color;

 // these 2 lines need to be executed atomically - however we do not control the shared graphics context
 synchronized (color) { // this does not help us with drawRect()
    mandelbrotManager.getgContext().setColor(color);
    // drawRect is not atomic, the color of the context may change before the pixel is written by another thread
    mandelbrotManager.getgContext().drawRect((int)x,(int)y,0,0);
 }
 if(color2 != mandelbrotManager.getgContext().getColor()) {
    System.out.println("_Thread contention: color was changed mid-function: (thread,x,y) " + threadIndex + "," + x + "," + y);
    // The solution may be to rewrite the pixel until the color is no longer modified or only allow a host thread to write to the GUI
 }

_Thread contention: color was changed mid-function: (thread,x,y) 2,298,22
_Thread contention: color was changed mid-function: (thread,x,y) 15,140,155
_Thread contention: color was changed mid-function: (thread,x,y) 15,140,156
_Thread contention: color was changed mid-function: (thread,x,y) 15,140,157
_Thread contention: color was changed mid-function: (thread,x,y) 15,141,151
_Thread contention: color was changed mid-function: (thread,x,y) 2,307,25
_Thread contention: color was changed mid-function: (thread,x,y) 15,143,154
_Thread contention: color was changed mid-function: (thread,x,y) 15,144,152
_Thread contention: color was changed mid-function: (thread,x,y) 13,0,130
_Thread contention: color was changed mid-function: (thread,x,y) 11,0,110

  The better solution would be designate a host thread that coordinates all the unit of work threads and acts as a single proxy to the GUI - only one thread should update AWT or Swing UI elements - as most of them are not thread safe by design. Multithreaded distributed applications need to be very careful when using GUI elements. For example if I do not introduce at least a 1ms sleep between GUI frames - the entire machine may lock up when 100% of the CPU is given to the calculating threads.

Sunday, January 23, 2011

Sequences and Patterns

   Note: 20110218 - See the following distributed application case study that uses Collatz to research concurrency issues.

http://wiki.eclipse.org/EclipseLink/Examples/Distributed

   I ran into hailstone numbers again recently when i was writing a very quick benchmark and volumetric test for JVMs deployed on my server farm recently.  I needed a way to test mixed integer and floating point performance using the unbounded BigInteger java class.  what better way than writing a small (3n +1) algorithm - like i did in my youth in 1984 after i read Brian Hayes' Scientific american column.
   it turns out that you get much better performace by using the following java virtual machines in descending order.  SUN 1.6 server 64bit, Oracle JRockit 32/64bit, the latest 1.6.0_23 SUN and lastly older SUN 1.6 JVM's .

See: http://en.m.wikipedia.org/wiki/Collatz_conjecture?wasRedirected=true

   The scientific part of this exploration would be attempt to verify the Collatz theorem past 100 billion - which would normally result in a Long.MAX_VALUE overflow past 2^63 of the max height reached.  Ideally i would like to find a hainstone number that stays above the 4-2-1 squence indefinitely - this will take a lot of computer power.
   The good news is - we now have access to a lot of computing power (gone are the days of running 3n+1, game of life and mandelbrot work over night on my TRS-80 - we have multicore and RMI for scientific distributed computing.  What we really need is time on a supercomputing architecture or distrubution of the BigInteger ranges on a massive number of distributed processors in parallel.

After 1 week of BigInteger iteration on around 20 JVM's calculating a half trillion sequences - I am currently at the following maximum path.

For n = 404,970,804,222 we reach a path of 1308 and a max of 2,662,567,439,048,656



Problem:
   Every number under the function where odd numbers increase to 3n + 1 and even numbers are reduced by n/2 eventually reach 1 and loop forever in the 4-2-1 sequence.

Optimizations:
1 - Mike Roosendaal: all odd number transformations are always followed by an even transformation - therefore both steps can be done together.
     If odd : n = (3n + 1)/2 = (3/2)n + 1/2 = n + n/2 + 1/2
     or using shifts (my optimization) : (n >> 1) + n + 1
     Notice that the transformation for odd [ (n << 1) + n + 1 ] is nearly identical to the double transformation for odd with an immediate even (n >> 2) - as in [ (n >> 1) + n + 1 ] except for the reversal of a shift left to a shift right.

Architecture:
     We can narrow down our choice of architecture to two opposite examples.  Either we have multiple clients that asynchronously ask a central server for units of work (the SETI@Home model), or we run multiple servers controlled from a single client.

Option 1: @ManyToOne

Option 2: @OneToMany

Hardware:
1 - SunFire Virtual Linux (dual core VM)
1 - Core i7-920 (2.7GHz 4-core/8-thread)
4 - P4-630 (3.0 GHz 1-core/2-thread)
4 - P4-531 (3.0 GHz 1-core/2-thread)
2 - E8400 (3.0 GHz - 2-core/2-thread)
1 - Q6600 (2.4 GHz - 4-core/4-thread)
1 - T4400 (2.2 GHz - 2-core/2-thread)
1 - P4- (2.0 GHz - 1-core/1-thread)
1 - P4 - (1.6 GHz - 1-core/1-thread)
2 - XMOS XS-1/G4 - (400 MHz - 4-core/32-thread)
80 - P8X-32 (20 MHz - 8-core/8-thread)


Software:
For ease of use i am running on various versions of the Java Virtual Machine (which is roughly performant with compiled native C for integer computation) on all Windows and Linux boxes.
For the XMOS chips i run XC (parallel C) and the Parallax chips run SPIN mixed with machine language.

SUN Hotspot JVM 1.6.0
Oracle JRockit JVM 1.6.0

Java client before extreme optimization (object pooling, loop truncation...) and distribution:

public List hailstoneSequence(BigInteger start)  {
  List sequence = new ArrayList();
  if(start.equals(BigInteger.ZERO) || start.equals(BigInteger.ONE)) {
   return sequence;
  }
  BigInteger current = start;
  while (!current.equals(BigInteger.ONE)) {
   if(current.testBit(0)) { // test odd
    current = current.shiftLeft(1).add(current).add(BigInteger.ONE);
   } else {
    current = current.shiftRight(1);//.divideAndRemainder(TWO)[0];    
   }
   sequence.add(current);
  }
  return sequence;  
 }

 public static void main(String[] args) {
  // get parameters
  int sector = Integer.parseInt(args[0]);
  int mult = Integer.parseInt(args[1]);
  BenchMark aBench = new BenchMark();
  StringBuffer buffer = null;
  long x = 0;
  //long lastVal = Long.MAX_VALUE;
  long lastVal = 4294967296L;
  long startTime = GregorianCalendar.getInstance().getTimeInMillis();

  for(int y=0;y<1;y++) {
   startTime = GregorianCalendar.getInstance().getTimeInMillis();
  for(x=0;x< lastVal;x++) {
   x = x + 1 - 1;
   //System.out.println(buffer.toString());
  }
  long endTime = GregorianCalendar.getInstance().getTimeInMillis() - startTime;
  System.out.println(y + ":" + x + "," + endTime + "," + (x/((1 + endTime)/1000)) + " it/sec");
  }

  long maxPath = 0; // path should fit in 64 bits
  BigInteger maxValue = BigInteger.ONE;
  startTime = GregorianCalendar.getInstance().getTimeInMillis();
  BigInteger currentMax = BigInteger.ONE;
  boolean milestone = false;
  String prefix = null;
  long rangeStart = (sector * mult) * 1048576L;//67108864L;//4294967296L;
  long rangeEnd = rangeStart + (mult * 1048576L);//67108864L);//4294967296L);
  long rangeInterval = (mult * 65536);//1048576;//67108864L;//134217728L;//268435456L;
  System.out.println("Proc cores:  " + Runtime.getRuntime().availableProcessors());
  System.out.println("Runtime    : " + Runtime.getRuntime());
  System.out.println("Short max:   " + Short.MAX_VALUE);
  System.out.println("Integer max: " + Integer.MAX_VALUE);
  System.out.println("Long max:    " + Long.MAX_VALUE);  
  System.out.println("Partition:   " + rangeInterval);
  System.out.println("Range:       " + rangeStart + " to: " + rangeEnd);
  
  long currentNumber = rangeStart;
  List list = null;
  /**
   * We use a double loop to show interim progress by splitting up the search space into quadrants
   */
  for(long interval = 0;interval < 16;interval++) { // sectors are dived by two (we only test odd numbers)
   currentNumber = currentNumber + 1;
   for(long index=1;index maxPath) {
      milestone = true;
      maxPath = list.size();
      prefix = "P";
     }
    
     if(currentMax.compareTo(maxValue) > 0) {
      if(milestone) {
       prefix = "PM";
      } else {
       milestone = true;
       prefix = "M";
      }
      maxValue = currentMax;
   }
   if(milestone) {
    buffer = new StringBuffer(prefix);
    buffer.append(",N,");
    buffer.append(interval);
    buffer.append(",");
    buffer.append(index);
    buffer.append(",\t");    
    buffer.append(currentNumber);
    buffer.append(",L,");   
    buffer.append(list.size());   
    buffer.append("\t");
    buffer.append(",M,");
    buffer.append(currentMax); // BigInteger implements Comparable
    //buffer.append(list.toString());
    buffer.append("\t");
    buffer.append(",T,");
    buffer.append(GregorianCalendar.getInstance().getTimeInMillis() - startTime);
    buffer.append("\t");
    if((currentMax.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0)) {
     buffer.append("2^63+,");
     buffer.append(currentMax.subtract(BigInteger.valueOf(Long.MAX_VALUE)));
     buffer.append(",");
    } else {
     buffer.append(",,");
    }
    
    buffer.append(",F,");
    buffer.append(Runtime.getRuntime().freeMemory());
    
    System.out.println(buffer.toString());
    milestone = false;
   }
   }
   // increment
   currentNumber += 2;
  }
  }

Power:
Todo:  I need to get some single crystal PV boards on my roof and join microFIT so i can get $0.80 to offset my 2 KiloWatts (Note to OPA - my usage ranges from .5KWh to 3KWh depending on what simulations are running - i am not a GO).
So far in the basement i am running about 14-18 A at 120V AC =
 about 1.7 to 2 KWh or about $0.50/hour when i have 10-14 boxes running.

It may be more efficient and cheaper to replicate my ECC cloud instance that is running about $
0.14 /h (as long as i remember to stop and detach all the instances)

C++
// collatz_vs10.cpp : Defines the entry point for the console application.
#include "stdafx.h"

__int64 hailstoneMax(__int64 start) {
 __int64 maxNumber = 0;
    __int64 num = start;
 __int64 path = 0;
    while(num > 4) {
  //printf("%I64d ", maxNumber);
     if((num % 2) > 0) {
      num = (num << 1) + num + 1; // odd
     } else {
      num >>= 1; // even
     }
     if(num > maxNumber) {
      maxNumber = num;
     }
  path++;
    }
 return maxNumber;
}

int _tmain(int argc, _TCHAR* argv[]) {
  __int64 num = 27;
  //unsigned long long num = 1;
  __int64 maxNumber = 0;
  __int64 newMax = 0;
  unsigned long long path = 0;
  unsigned long long maxPath = 0;
  __int64 MAX = (1 << 40); // dont use long long
  while(num < MAX) {
 newMax = hailstoneMax(num);
 
 if(newMax > maxNumber) {
 printf("\n%I64d,\t%I64d",num, newMax);
  maxNumber = newMax;
  //printf("\n%d,\t%I64d",num,maxNumber); // or I64u, %llu (do not work properly)
 }
 num += 2;
  }
  return 0;
}
Alternate Platforms
Just as an aside - I wrote some code for the current leaders in multicore microcontrollers - XMOS and Parallax.
The 4-core/32-thread XMOS G4 is easier to develop for because it runs compiled XC (C with parallel and concurrency language additions) instead of SPIN bytecode like the 8-core/8-thread Parallax Propeller (It can run assembly) - we can also use a standard Eclipse.org Eclipse IDE.



Here is some XMOS parallel XC code for 32 bit integers.
#include 
#include  //http://www.xmos.com/discuss/viewtopic.php?f=6&t=255
#define PERIOD 20000000

// Processor 0 = 4 green + 3r/g LED + 16 I/O, 4 push-buttons
//out port cledB0 = PORT_BUTTONLED;//PORT_CLOCKLED_0;
out port cled0 = PORT_BUTTONLED;//PORT_CLOCKLED_0;
// anode 4 bit ports
//out port cled0 = PORT_CLOCKLED_0;
// Processor 1 = 3r/g LED + 16 I/O
out port cled1 = PORT_CLOCKLED_1;
// Processor 1 = 3r/g LED + 16 I/O
out port cled2 = PORT_CLOCKLED_2;
// Processor 1 = 3r/g LED + 32 I/O
out port cled3 = PORT_CLOCKLED_3;
// cathode 1 bit ports
out port cledG = PORT_CLOCKLED_SELG;
out port cledR = PORT_CLOCKLED_SELR;

/**
 * http://en.wikipedia.org/wiki/XSwitch#XS1-G4_Switch
 */
unsigned long number = 27;
unsigned long maximum = 1 << 18;//138367; // 32 bit registers top out at a 4 billion max for 138367

// Compute the hailstone maximum
unsigned long hailstoneMax(unsigned long start) {
 unsigned long maxNumber = 0;
    unsigned long number = start;
    while(number > 1) {
     if((number % 2) > 0) {
      number = (number << 1) + number + 1; // odd
     } else {
      number = number >> 1; // even
     }
     if(number > maxNumber) {
      maxNumber = number;
     }
    }
 return maxNumber;
}

void hailstoneSearch(int coreId, out port led, unsigned long start, unsigned long end) {
  unsigned long number = 27;
  unsigned long maxNumber = 0;
  unsigned long newMax = 0;
  int flip = 0;
  //write_pswitch_reg(get_core_id(), XS1_PSWITCH_PLL_CLK_DIVIDER_NUM, 0x80);
  while(number < end) {
   newMax = hailstoneMax(number);
   if(newMax > maxNumber) {
  maxNumber = newMax;
  // TODO: send message to other cores
  // UART printing really slows down the cores
  /*if(coreId < 1) { // only core 0 prints
   printuint(number);
   printchar(',');
   printchar('\t');
   printuintln(maxNumber);
  }*/
  if(flip > 0) {
   flip = 0;
   led <: 0b1111;
  } else {
   flip = 1;
   led <: 0b0000;
  }
   }
 number = number + 2;
  }
  printint(coreId); // print core id when finished
}

// Search a range of integers for their hailstone maximums
void hailstoneSearch0(int coreId, out port led, out port redCathode, out port greenCathode,
  unsigned long start, unsigned long end) {
   redCathode <: 0b1111;
   //cledB0 <: 0b1111;
   //greenCathode <: 0b1111;
   printuint(start);
   printchar('-');
   printuintln(end);
   hailstoneSearch(coreId, led, start, end);
   // reduce temperature by lowering the PLL multiplier
   write_pswitch_reg(get_core_id(), XS1_PSWITCH_PLL_CLK_DIVIDER_NUM, 0x80);
   while(1);
}

int main() {
 // concurrent threads p.33 http://www.xmos.com//system/files/xcuser_en.pdf
 par {
  on stdcore [0]: hailstoneSearch0(0, cled0,cledR,cledG,number, maximum);
  on stdcore [1]: hailstoneSearch(1, cled1,number, maximum);
  on stdcore [2]: hailstoneSearch(2, cled2,number, maximum);
  on stdcore [3]: hailstoneSearch(3, cled3,number, maximum);
 }
 return 0;
}
For the Propeller wrote a very quick SPIN program that only runs in 1 of the 8 cogs on a parallax propeller microcontroller. This code needs to be parallized and also converted to assembly along with a need for a fixed point unlimited length integer library object. It currently cannot compute sequences past 113381 as this will overload the built in 32 bit register length.
First Assembly language (this is my first routing in Propeller assembly - it is a testament to the tutorial by deSilva from 2007 - I was able to write this function after 2 hours and 14 pages of the PDF.
DAT
              org       0
entry         ' iterate the collatz sequence for _nextVal
              RDLONG    _nextVal, PAR       ' read from shared ram (7-22 cycles)
              MOV       _loops, #1
              SHL       _loops, #30         ' load 2^24 iteration count (16777216)
              MOV       DIRA, _ledDir
              MOV       OUTA, 27'_loops
              SHL       OUTA, #9
:reset
              MOV       _nextVal, #27        ' hardcode start of search at 27
              SUB       _loops, #1
              CMP       _loops, #1 WZ
         IF_E JMP       #:finish
              MOV       OUTA, _loops
              SHL       OUTA, #9
:iterate
              ADD       _path, #1           ' increment path
              MOV       _bit0, #1           ' create mask
              AND       _bit0, _nextVal WZ  ' check bit 0 - affect zero flag
        IF_NE JMP       #:mul3x1
:div2         ' if even we divide by 2
              SHR       _nextVal, #1        ' divide by 2
              CMP       _nextVal, #1 WZ     ' check for 1 value == finished
         IF_E JMP       #:reset             ' sequence returned to 1 - exit              
              JMP       #:iterate           ' return to top of loop
:mul3x1       ' if odd we transform by 3n + 1
              MOV       _3rdVal, _nextVal
              SHL       _nextVal, #1        ' multiply by 2
              ADD       _nextVal, _3rdVal   ' add to multiply by 3
              ADD       _nextVal, #1        ' add 1
:maxValue     ' check for maximum value
              MIN       _maxVal, _nextVal   ' VERY ODD (max is actually min)
              JMP       #:iterate           ' return to top of loop
:finish
              SUB       _path, #1           ' we discount the first path count
              MOV       _nextVal, _path     ' copy path to return val
              WRLONG    _nextVal, PAR       ' write back to hub ram (thank you deSilva for reverse flow explanation)
'              WRLONG    _path, PAR               
:endlessLoop
              MOV       OUTA, #170
              SHL       OUTA, #16
              JMP       #:endlessLoop       ' keep the cog running
_3rdVal       long      $00000000
_nextVal      long      $00000000      
_maxVal       long      $00000000
_path         long      $00000000
_bit0         long      $00000000
_loops        long      $00000000
_ledDir       long      $FFFF<<16
_ledOut       long      $FFFFFFFF
              FIT       496                 ' deSilva (16 I/O registers in 496-511)</source>
Spin (interpreted bytecode)
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

VAR
  ' shared display RAM for the 4 display cogs
  long buffer[32]                                         
  long Stack0[64]                                         ' Stack Space
  byte Cog[7]                                             ' Cog ID
  long  randomNum
  
OBJ
  SER  : "Parallax Serial Terminal.spin"  
  STR  :"STREngine.spin"                      
  
PUB main | milestone,start,number,index, lRec,x,i, mIndex, mValue, path,height, maxPath, maxHeight
  ' wait for user to switch to terminal
  waitcnt((clkfreq / 1_000 * 4) + cnt)
  maxPath := 0
  maxHeight := 0
  milestone := 0 ' track whether we got a path or max height hit

  ser.Start(115_200)'31,30,0,38400)
  ser.Home
  ser.Clear
  ser.Str(string("Collatz Conjecture", ser#NL))

  'Cog[0] := cognew(Push8seg(0,buffer,2,3,4, 24_000,0), @Stack0) + 1  
  ' main loop
  repeat x from 1 to 113381 step 2
     start := x'77031'27
     path := 0
     height := 0
     number := start     
     repeat until number == 1
       ' if odd transform by 3n+1, else n/2 
       if (number // 2) == 0
         number := number >> 1
       else
         number := (number << 1) + number + 1

       path := path + 1  
       if height < number
         height := number

     ' check maximums
     if maxHeight < height
       maxHeight := height
       milestone := 1 ' flag a hit
        
     if maxPath < path
       ser.Str(string(ser#NL))
       maxPath := path
       if milestone > 0
         ser.Str(string("PM: "))
       else
         ser.Str(string(" P: "))
         milestone := 1 ' flag a hit
     else
       if milestone > 0
         ser.Str(string(ser#NL))
         ser.Str(string(" M: "))
           
  '  print out result if a new record
     if milestone > 0 
       ser.Str(STR.numberToDecimal(x,8))
       ser.Str(string(" Path: "))
       ser.Str(STR.numberToDecimal(path,8))
       ser.Str(string(" Max:  "))
       ser.Str(STR.numberToDecimal(height,31))
       milestone := 0  

  ' wait to allow the port to catch up before closing    
  waitcnt((clkfreq / 1_000 * 4) + cnt)    
  ser.stop    
Here, also is a 1992 version of the hailstone number generator in Smalltalk
Results: Full results for the first 360 billion integers are pending at the end of this week, but here are some interim maximum values for the hailstone sequence. 
Short max:   32767
Integer max: 2147483647
Long max:    9223372036854775807
Partition:   1073741824
type,  path,    max,
PM,   2,   2
PM,   3,   7    ,16    ,T,1    ,,,
PM,         7,  16   ,52    ,T,1    ,,,
P,        9,  19   ,52    ,T,1    ,,,
M,              15,  17  ,160   ,T,2    ,,,
P,              19,  20  ,88    ,T,2    ,,,
P,              25,  23  ,88    ,T,2    ,,,
PM,             27, 111        ,9232  ,T,2    ,,,
P,              55, 112        ,9232  ,T,4    ,,,
P,              73, 115        ,9232  ,T,5    ,,,
P,              97, 118        ,9232  ,T,6    ,,,
P,             129, 121       ,9232  ,T,9    ,,,
P,             171, 124       ,9232  ,T,12   ,,,
P,             231, 127       ,9232  ,T,16   ,,,
M,             255,  47        ,13120        ,T,17   ,,,
P,             313, 130       ,9232  ,T,23   ,,,
P,             327, 143       ,9232  ,T,25   ,,,
M,             447,  97        ,39364        ,T,51   ,,,
M,             639, 131       ,41524        ,T,69   ,,,
P,             649, 144       ,9232  ,T,69   ,,,
PM,            703, 170       ,250504       ,T,69   ,,,
P,             871, 178       ,190996       ,T,70   ,,,
P,        1161, 181      ,190996       ,T,72   ,,,
M,            1819, 161      ,1276936      ,T,75   ,,,
P,            2223, 182      ,250504       ,T,79   ,,,
P,            2463, 208      ,250504       ,T,80   ,,,
P,            2919, 216      ,250504       ,T,83   ,,,
P,            3711, 237      ,481624       ,T,87   ,,,
M,            4255, 201      ,6810136      ,T,91   ,,,
M,            4591, 170      ,8153620      ,T,93   ,,,
P,            6171, 261      ,975400       ,T,99   ,,,
M,            9663, 184      ,27114424     ,T,117  ,,,
P,          10971, 267     ,975400       ,T,123  ,,,
P,          13255, 275     ,497176       ,T,133  ,,,
P,          17647, 278     ,11003416     ,T,159  ,,,
M,          20895, 255     ,50143264     ,T,187  ,,,
P,          23529, 281     ,11003416     ,T,241  ,,,
PM,         26623, 307     ,106358020    ,T,254  ,,,
P,          34239, 310     ,18976192     ,T,288  ,,,
P,          35655, 323     ,41163712     ,T,294  ,,,
P,          52527, 339     ,106358020    ,T,376  ,,,
P,          77031, 350     ,21933016     ,T,487  ,,,
P,        106239, 353    ,104674192    ,T,623  ,,,
P,        142587, 374    ,593279152    ,T,811  ,,,
P,        156159, 382    ,41163712     ,T,880  ,,,
P,        216367, 385    ,11843332     ,T,1179 ,,,
P,        230631, 442    ,76778008     ,T,1253 ,,,
P,        410011, 448    ,76778008     ,T,2172 ,,,
P,        511935, 469    ,76778008     ,T,2759 ,,,
P,        626331, 508    ,7222283188   ,T,3384 ,,,
M,        665215, 441    ,52483285312  ,T,3598 ,,,
M,       704511, 242    ,56991483520  ,T,3837 ,,,
P,       837799, 524    ,2974984576   ,T,4582 ,,,
M,      1042431, 439   ,90239155648  ,T,5726 ,,,
P,      1117065, 527   ,2974984576   ,T,6161 ,,,
M,      1212415, 328   ,139646736808 ,T,6701 ,,,
M,      1441407, 367   ,151629574372 ,T,8010 ,,,
P,      1501353, 530   ,90239155648  ,T,8360 ,,,
P,      1723519, 556   ,46571871940  ,T,9639 ,,,
M,      1875711, 370   ,155904349696 ,T,10532        ,,,
M,      1988859, 427   ,156914378224 ,T,11194        ,,,
M,      2643183, 430   ,190459818484 ,T,15091        ,,,
M,     2684647, 399   ,352617812944 ,T,15339        ,,,
M,      3041127, 363   ,622717901620 ,T,17477        ,,,
M,      3873535, 322   ,858555169576 ,T,22541        ,,,
P,      2298025, 559   ,46571871940  ,T,13033        ,,,
P,      3064033, 562   ,46571871940  ,T,17616        ,,,
P,      3542887, 583   ,294475592320 ,T,20526        ,,,
P,      3732423, 596   ,294475592320 ,T,21685        ,,,
M,      4637979, 573   ,1318802294932        ,T,27248        ,,,
P,      5649499, 612   ,1017886660   ,T,33549        ,,,
M,     5656191, 400   ,2412493616608        ,T,33592        ,,,
M,     6416623, 483   ,4799996945368        ,T,38375        ,,,
M,      6631675, 576   ,60342610919632       ,T,39740        ,,,
P,      6649279, 664   ,15208728208  ,T,39857        ,,,
P,      8400511, 685   ,159424614880 ,T,51033        ,,,
P,    11200681, 688  ,159424614880 ,T,69226        ,,,
P,    14934241, 691  ,159424614880 ,T,93871        ,,,
P,    15733191, 704  ,159424614880 ,T,99175        ,,,
M,    19638399, 606  ,306296925203752      ,T,125377       ,,,
P,    31466383, 705  ,159424614880 ,T,206680       ,,,
P,    36791535, 744  ,159424614880 ,T,243861       ,,,
M,    38595583, 483  ,474637698851092      ,T,256522       ,,,
P,   63728127, 949  ,966616035460 ,T,436156       ,,,
M,    80049391, 572  ,2185143829170100     ,T,554782       ,,,
M,       120080895, 438 ,3277901576118580     ,T,850400       ,,,
P,       127456255, 950 ,966616035460 ,T,905504       ,,,
P,       169941673, 953 ,966616035460 ,T,1226118      ,,,
M,       210964383, 475 ,6404797161121264     ,T,1540036      ,,,
P,       226588897, 956 ,966616035460 ,T,1659884      ,,,
P,       268549803, 964 ,966616035460 ,T,1984199      ,,,
M,       319804831, 592 ,1414236446719942480  ,T,2383830      ,,,
P,       537099607, 965 ,966616035460 ,T,4104391      ,,,
P,       670617279, 986 ,966616035460 ,T,5175958      ,,,
P,      1341234558, 987        ,966616035460 ,T,10515479     ,,,
M,      2379584155, 763        ,7125885122794452160   ,T,18815293     ,,,
P,      2384416993, 993        ,966616035460 ,T,18855328     ,,,
P,      2511978395,1006       ,966616035460 ,T,19923962     ,,,
P,      2610744987,1050       ,966616035460 ,T,20751706     ,,,
P,      4578853915,1087       ,966616035460 ,T,37063783     ,,,
P,      4890328815,1131       ,319497287463520      ,T,39763292     ,,,
P,      9780657630,1132       ,319497287463520      ,T,81997346     ,,,
M,     10829712411, 672       ,15781722338690299312  ,T,90904965     2^63+,6558350301835523505,,
M,     11371756681, 729       ,18144594937356598024  ,T,95810690     2^63+,8921222900501822217,,
M,     12987343015, 556       ,20722398914405051728  ,T,109846948    2^63+,11499026877550275921,,
P,     13040876841,1135      ,319497287463520      ,T,110329343    ,,,
P,     13371194527,1210      ,319497287463520      ,T,113344783    ,,,
Range:635655159808 to: 652835028992
P,    639953863463,1319,  ,2662567439048656       ,41245333       ,
Range: 51539607552 to: 68719476736
M,     51739336447, 770      ,114639617141613998440        ,T,3437912      2^63+,105416245104759222633,,









Range:       17179869184 to: 34359738368
P,17828259369,1213      ,319497287463520      ,T,6056083      ,,,
Range:       34359738368 to: 51539607552
P,1,223038545,        35656518738,1214      ,319497287463520      ,T,17634743     ,,,
P,12,297384717, 47542024985,1217 ,319497287463520 ,T,125367559 ,,,F,54935512 
Range: 51539607552 to: 68719476736
P,11,38599019, 63389366646,1220 ,319497287463520 ,T,113845586 ,,,F,66350984 
Range:       68719476736 to: 85899345920
Range:       85899345920 to: 103079215104
Range:       103079215104 to: 120259084288
P,8,1023057667,       112692207371,1226     ,319497287463520      ,T,356397390    ,,,F,4636736
Range:       120259084288 to: 137438953472
P,12,417148475,       133561134663,1234     ,319497287463520      ,T,480482552    ,,,F,4361648
Range:       171798691840 to: 188978561024
Range:       206158430208 to: 223338299392
P,4,606173317,        211059570825,1245     ,319497287463520      ,T,141893594    ,,,F,11744352
P,12,314731323,       219358063431,1289     ,319497287463520      ,T,381152328    ,,,F,15132528

Range:       240518168576 to: 257698037760
Range:       274877906944 to: 292057776128
Range:       292057776128 to: 309237645312
P,10,932908789,       303728103167,1305     ,2662567439048656     ,T,260241794    ,,,F,37314808

Range:       326417514496 to: 343597383680

Range:       395136991232 to: 412316860416
P,9,170136565,404970804222,1308 ,2662567439048656       ,114658773      ,
Range:       635655159808 to: 652835028992
P,4,3736355,639953863463,1319   ,2662567439048656       ,41245333       ,



                                                                           
hold from Current Research:    As I was optimizing my algorithm from a brute-force/naive approach to one that takes advantage of properties of the Lothar Collatz (3n+1) sequence I did some research on what the current state of Collatz investigation is current at.  Here are some interesting results. Main Wiki site http://en.wikipedia.org/wiki/Collatz_conjecture Tom├ís Oliveira e Silva at Universidade de Aveiro  has computed the sequence to http://www.ieeta.pt/~tos/3x+1.html Eric Roosendaal has offered several usefull optimizations http://www.ericr.nl/wondrous/index.html Ken Korrow http://www-personal.ksu.edu/~kconrow/gentrees.html References: Signed Short MAX is 32767 (2^15 - 1)
Signed Integer MAX is (2 billion) 2,147,483,647 (2^31 - 1)
Signed Long MAX is (9 quintillion) 9,223,372,036,854,775,807 (2 ^63 - 1)