The syntax and text of a Harth program should be familiar to most programmers.
Reified means that all levels of the programming language model (text, lexical/tokens, AST, Semantic Model, Project, Binaries) are processed and available at run-time as ordinary data.
The data where possible immutable (or near enough).
Immutable data gives greater flexibility for shared state, caching, checksum, versioning, undo.
Homoiconic means that the text, lexemes, tokens, AST, Semantic data, Project and binaries have the same shared written syntax (in human and/or binary) as any other language data and/or code.
Thus the distinction between code, data, and functionality blurs.
Text is read from any stream of characters and internally represented
by a Char*
, an abstract sequence of zero or more Unicode characters.
Note that Char
and Sequence<Lexeme>
are immutable data structures.
method Lexer/Read(text : Char*) -> Lexeme*
Method takes some text
of type Char*
; a sequence of Unicode characters of any valid Unicode value.
The method returns Lexeme*
; that is zero or more Lexeme
objects.
/* block */
(allows proper nesting)// line
(for comment to end of line)#!
is allowed (rest of line is special script comment).\n
means a literal newline, raw newlines in source are usually not significant.\r
means a literal carriage return, again these are ignored.\n
and/or \r
pair or single character sequence is counted as “end of line”.BuiltinToken
: Builtin to the language itself (eg. if
, function
, return
, +
, ===
).NameToken
: Any other symbol formed of unrecognized alpha-numeric or symbol characters.ReaderToken
: A special token reader, (eg. "text"
is implemented as a special reader function).CommentTrivia
: A comment span.WhitespaceTrivia
: Any other whitespace.ErrorTrivia
: Some synthetic error detected by the lexer (eg. EOF in unterminated string or comment).Lexeme
and Sequence<Lexeme>
are immutable data structures.The Lexeme*
is further processed by a recursive descent parser (eventually hand written).
method Parser/Read(lexemes : Lexeme*) -> NodeTree
Method takes some lexemes
of type Lexeme*
; a sequence of zero or more Lexeme
.
The method returns a NodeTree
which represents any valid or invalid
AST tree of Node
, essentially a view of the AST tree as a
Homogeneous AST. This view and the underlyig data in immutable.
This means the source text, lexeme and AST syntax tree can be easily versioned, editted, undo using functional style immutable data structures (which are provided by Harth in the standard library).
method Semantic/Analyze(tree : NodeTree) -> SemanticModel
The tree input of type NodeTree
is converted into a static
SemanticModel
instance; from the outside perspective this is
immutable (it’s actually either “frozen” once completed and/or backed
by immutable data such as the Lexeme*
and NodeTree
instances).
Semantic Analysis is useful fairly expensive, so caching these results might be useful both for the user of the library and the provider (which may share some cached/immutable data as is possible).
Again this state can be serialized and deserialized in human or binary forms.
Externally the SemanticModel
allows the programmer (and thus all
tools) to ask any semantic, syntactic, lexical, or textual question.
Typically the analysis phase is responsible for:
Caching the SemanticModel
with the equivalent previous levels allows
for almost instant restarts; the models are all checksummed/hashed for
fast comparison checks. Any tool would start by:
Char*
stored in files, memory, repositories etc).Lexical
and/or NodeTree
and/or SemanticModel
read these,
ensuring they all match each other exactly.This data is also immutable and can thus be safely shared between machines, processes, and/or threads over any connection type.
Essentially all tools wherever they are have the best chance of instantly reading all source code at the highest fidelity - using the least amount of work (at least this time). That is approaching “Read many, write once” levels.
The backend of the Compiler essentially looks like:
method Compiler/Eval(text : Char*) -> Anything
method Compiler/Load(files : Name*) -> SemanticModel
method Compiler/Compile(model : SemanticModel) -> Project
method Compiler/Link(project : Project) -> Executable
method Compiler/Run(exec : Executable) -> Anything
A Project
is the fully reified and homoiconic representation of an
executable program; by default it contains an immutable representation
of all semantic information, code, binaries, libraries, debug
information etc.
The method Eval
is a short cut for:
The method Load
allows for a number of files to be loaded into a SemanticModel
:
method Compiler/Load(files : Name*)
{
let files = assume(IO/ReadFiles(args))
let trees = map(files, _ => Parser/Read(Lexer/Read(_.SourceText)))
let tree = NodeTree.CreateProjectWith(trees)
return Semantic/Analyze(tree)
}
The methods Compile
, Link
and Run
allow for finer control.
A compiler might look like this:
function main(args : String*)
{
let model = Compiler/Load(files)
let project = Compiler/Compile(model)
let executable = Compiler/Link(project)
match (executable)
{
case Error: abort(_)
case Anything: printf("Compilation success\n")
}
}
A script interpreter might look like this:
function main(args : String*)
{
let model = Compiler/Load(files)
let project = Compiler/Compile(model)
let executable = Compiler/Link(project)
let result = Compiler/Run(executable)
match (result)
{
case Int: return(_)
case Error: abort(_)
case Anything: printf("%O\n", result); exit(0)
}
}
A REPL interpreter might look like this:
function main(args : String*)
{
let input = IO/Stdin
while (!input.EOF)
{
let text = input.ReadLine("Harth> ")
let result = Compiler/Eval(text)
match (result)
{
case Error: _.PrintStackTrace()
case Anything: printf("%O\n", result); exit(0)
}
}
}
The debugger essentially provides a few basic helpers on top of semantic name and source text resolution:
method Debugger/Location(functionName : Name) -> Location
method Debugger/Location(fileName : Name, line : Int, column : Int?) -> Location
method Debugger/Break(loc : Location) -> Breakpoint
The Breakpoint
result can be modified for single shot, read/write, conditional etc.
Additionally the JIT and/or compiler can live insert any compiled
expression (actually a parsed NodeTree
) into code:
method Debugger/LiveEdit(loc : Location, code : NodeTree)
Since all internal data can be converted to human string with
ToString()
and/or printf
, a watch point might be implemented by
LiveEdit
insertion of printf()
for each variable and/or state.
Breakpoints are trivial therefore; they stop execution of all tasks under the current process except for the Debugger/Task
instance which continues running under process control so that all other tasks can be monitored.
Implementations of some other fundamental functions such as
Debugger/Continue()
, Run()
, Step()
and Next()
can be
implemented using semantic information.
The backend binary and object generator uses the latest LLVM to
convert a SemanticModel
into an internal set of linkage/executable
LLVM/Module*
. These are either cached and/or written to disk as
appropriate.
The basic program (executable, script or REPL) is to print “Hello World”:
1function main(_ : String*)
2{
3 printf("Hello World!\n")
4}
5
This code can be made into an executable, shell script or typed interactively at a REPL:
bash$ ls -l /usr/local/bin . -rwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/harth lrwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/harth-compiler -> /usr/local/bin/harth lrwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/hc -> /usr/local/bin/harth -rw-r--r-- 1 harveyt staff 1184 Nov 26 11:04 HelloWorld.ha bash$ hc HelloWorld bash$ ls -l . -rwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 HelloWorld bash$ ./HelloWorld Hello World!\n bash$
function
denotes that a function name, type and body will appear.
package
and import
are the defaults.main
takes only one argument, the String
passed to the executable and/or script._
means variable value is unamed and never used. Essentially the arguments are ignored.String*
means a sequence of zero or more String, it’s like a
bounded array, but has a abstract type of Sequence<String>
. See Types.main
has no return type written, it is intuit from the block return type. See below.main
must have one of the following types, and must exist and be named at global scope:
(String*) -> Unit
- arguments are passed as required, always exits success (unless abort()
).(String*) -> Int
- arguments are passed as required, always exits success (unless abort()
in whic case exits failure with error message).() -> Unit
- similar to above.() -> Int
- similar to above.BlockNode
) which contains ExpressionNode*
(in this case one expression).printf()
always returns the unit/identity/void value (of type Unit
).A more correct and portable (to Linux, MacOS, Windows Cygwin) script should be in an executable file of any name.
#!/usr/bin/env hs
function main(_ : String*)
{
printf("Hello World!\n")
}
The key addition is the POSIX magic #!/usr/bin/env hs
which runs the
script text through the hs
(aka. harth-script
) JIT script compiler
and execution engine found on $PATH
. The #!
is always treated as a
comment by Harth.
bash$ ls -l /usr/local/bin $HOME/bin -rwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/harth lrwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/harth-script -> /usr/local/bin/harth lrwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /usr/local/bin/hs -> /usr/local/bin/harth -rwxr-xr-x 1 harveyt staff 1184 Nov 26 11:04 /Users/harveyt/bin/HelloWorld bash$ HelloWorld Hello World! bash$
printf("Hello World!\n")
At an interactive Harth shell (or REPL), one can simply execute any statement, expression or program. So the above will result in:
Harth> printf("Hello World!\n") Hello World!\n Harth>
harth 0.1
package /Harth/Lang/Collection
import /Harth/Lang/Core
/**
* A Sequence is an abstract finite stream of elements which can be counted, accessed by index.
*
* New sequences can be created using Builder, Append(), Prepend() and similar functions.
*
* Operations performance may vary from O(1) to O(n) or more. LinearSequence and IndexedSequence
* provide tighter bounds on performance order.
*
* Typically most classes should internally use concrete types such as List<T> or Array<T>, but
* should return or accept Sequence<T> from methods for maximum usefulness.
*/
@const
class Sequence<out T> extends Stream<T>
{
@abstract
property ReverseIterator : Iterator<T>
@abstract
property Builder : Builder<T>
@abstract
property Size : Int32
@abstract
method Get(i : Int32) -> T%
@abstract @covariant
property First : T%
@abstract @covariant
property Rest : Sequence<T>%
@abstract
property Last : T%
@abstract @covariant
method MakeEmpty() -> Sequence<T>
@abstract @covariant
method Append(item : T) -> Sequence<T>
@abstract @covariant
method AppendAll(items : Iterable<T>) -> Sequence<T>
@abstract @covariant
method Prepend(item : T) -> Sequence<T>
@abstract @covariant
method PrependAll(items : Iterable<T>) -> Sequence<T>
@abstract @covariant
method Reverse() -> Sequence<T>
@abstract @covariant
method Drop(n : Int32) -> Sequence<T>%
}
Some things to note:
@name
.class
can be generic in any number of types, with covariance and contravariance.class
can extend any one other class
(and/or zero or more types
).@abstract
attribute means this class only provides an interface (might turn info a type?)T%
type means essentially “returns either the generic parameter type T
or an Error
.Non-obvious:
{ a; b; }
style block.harth 0.1
package /Harth/Lang/Collection
import /Harth/Lang/Core
/**
* BaseSequence provides basic implementation of Sequence using some concrete SeqT type.
* Sub-classes are LinearSequence and IndexedSequence which provide different performance
* properties for methods; for example Get() and Size() are O(n) and O(1) respectively.
*
* Sub-classes must at minimum must provide:
* - Iterator or ReverseIterator (the opposite has a default implementation)
* - Builder
*
* Default implementations are provided for all other operations, though these may have terrible
* performance up to O(n^2). Sub-classes should really override with better performance
* implementations were possible.
*/
@const
class BaseSequence<out T, out SeqT> extends Sequence<T>
{
@override
property Iterator : Iterator<T>
{
get { return Reverse().ReverseIterator;}
}
@override
property ReverseIterator : Iterator<T>
{
get { return Reverse().Iterator }
}
@override
property Size : Int32
{
get
{
var s = 0
for (_ in self)
s++
return s
}
}
/* ... */
}