Thursday, December 2, 2010

compiler optimization and algorithm synthesis

this is a big topic, especially the synthesis part is very hard problem, compiler is really a translation tool with some optimization of the context information in software and hardware side.

and algorithm synthesis is like a pattern generation process where the correctness of result is critical.

here I will just list some ideas recently been thinking in my head.

both pattern translation and generation request we have a way to learn the patterns (can be from existing code base or human coding process).  here both the pattern in time and  in graph domain are very important.

once we have those patterns, we then do the following,

analysis –> model extraction –> optimization ( convex, genetic algorithm, annealing) –> validation

the pattern can be static hierarchy or data/control flow patterns. in other words, relational or communication patterns.

    

communication patterns

explicit & implicit communication require a distributed memory system with a shared memory space. these means a special high speed cache to cache network is needed for resolving conflicts.

cache system design

cache is a transaction filtering system with a locally self managed memory, internally, cache has a data structure (e.g 4-way, set associative etc) and a read/write algorithm (hit/miss test, replacement policy, external transaction scheduling such as write back).

the aim of the cache system is to learn the transaction pattern of clients and avoid transaction collision or cache miss.

statistical measure and learning maybe needed for better replacement algorithms. things like victim cache is a good example.

for some cases, double miss + random replacement policy rather than just single miss + least recently used policy

cache service protocol is also important part of the design, for example optional caching memory access. (non cacheable address, always miss region, OR special delivery channel for non-cacheable memory space )

transaction model

transaction model should be aware of the memory system and related protocol for better performance. for example, cache geometry aware malloc algorithm.

 

hardware/ software partition

by life time, context switching overhead and communication explicitness.

hardware function cache / fpga bit stream cached in the frabic

cache system support both high speed local transactions and global shared memory space communication, also a local scratch memory for self managed memory for clients.

clients can be cpu cores or fpga frabic.

No comments: