this is a big topic, especially the synthesis part is very hard problem, compiler is really a translation tool with some optimization of the context information in software and hardware side.
and algorithm synthesis is like a pattern generation process where the correctness of result is critical.
here I will just list some ideas recently been thinking in my head.
both pattern translation and generation request we have a way to learn the patterns (can be from existing code base or human coding process). here both the pattern in time and in graph domain are very important.
once we have those patterns, we then do the following,
analysis –> model extraction –> optimization ( convex, genetic algorithm, annealing) –> validation
the pattern can be static hierarchy or data/control flow patterns. in other words, relational or communication patterns.
communication patterns
explicit & implicit communication require a distributed memory system with a shared memory space. these means a special high speed cache to cache network is needed for resolving conflicts.
cache system design
cache is a transaction filtering system with a locally self managed memory, internally, cache has a data structure (e.g 4-way, set associative etc) and a read/write algorithm (hit/miss test, replacement policy, external transaction scheduling such as write back).
the aim of the cache system is to learn the transaction pattern of clients and avoid transaction collision or cache miss.
statistical measure and learning maybe needed for better replacement algorithms. things like victim cache is a good example.
for some cases, double miss + random replacement policy rather than just single miss + least recently used policy
cache service protocol is also important part of the design, for example optional caching memory access. (non cacheable address, always miss region, OR special delivery channel for non-cacheable memory space )
transaction model
transaction model should be aware of the memory system and related protocol for better performance. for example, cache geometry aware malloc algorithm.
hardware/ software partition
by life time, context switching overhead and communication explicitness.
hardware function cache / fpga bit stream cached in the frabic
cache system support both high speed local transactions and global shared memory space communication, also a local scratch memory for self managed memory for clients.
clients can be cpu cores or fpga frabic.
No comments:
Post a Comment