\documentclass[a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage{hyperref}
\date{5. February 2000}
\author{Michael~Neumann}
\title{Comparing and introducing Ruby}
\begin{document}
\pagestyle{headings}
\maketitle
\newpage
{\parindent=0cm\thispagestyle{empty}
\textcopyright{} Copyright 2000 Michael~Neumann

\bigskip
This document may be distributed in electronic or printed form, as long as its content, including the author and copyright notices, remains unchanged and the distribution is free of charge apart from a fee for the data carrier, the copying process, etc.

\bigskip
\vfill
If you have questions, discover errors or have suggestions for improvement, you can write me an email: \texttt{}.
}
\newpage
Information about Ruby, the Ruby interpreter and many libraries is available from the official Ruby homepage: \href{http://www.ruby-lang.org}{http://www.ruby-lang.org}.

\bigskip
I will compare Ruby with Perl and Python, because I think they are the most widely used and best-known scripting languages. Ruby has so many advantages over Perl and Python that I will try to mention as many of them as possible here. First, I will briefly explain what Ruby is:\\
Ruby is a modern, interpreted, object-oriented programming language. It has many similarities with Smalltalk (``everything is an object'', single inheritance, metaclass model, code blocks, garbage collection, untyped variables, etc.), but takes much of the well-formed syntax of Eiffel (for those who do not know that great language, it is a little bit like Modula or Ada). In addition, many useful elements from Perl were added (e.\,g. regular expressions, text processing, text substitution, iterators, variables like \$\_{} and \$/). Ruby is therefore a very good alternative to Perl and Python. What distinguishes Ruby from Perl is its much simpler, easier-to-understand syntax and its easy-to-use \emph{``real''} object orientation.
The following expression, which I found on a Ruby page, sums up the power of Ruby:
\begin{quotation}
Ruby \textgreater{} (Smalltalk + Perl) / 2
\end{quotation}
Some time ago I asked the author of Ruby, Yukihiro Matsumoto (aka matz), about the history of Ruby and why he developed a new language. Here is his original answer:
\begin{quotation}
{\it\parindent=0cm
``Well, Ruby was born in Feb. 23 1993. At that day, I was talking with my colleague about the possibility of object-oriented scripting language. I knew Perl (Perl4, not Perl5), but I didn't like it really, because it had smell of toy language (it still has). The object-oriented scripting language seemed very promising.

\medskip
I knew Python then. But I didn't like it, because I didn't think it was a true object-oriented language. OO features are appeared to be add-on to the language. I, as a language mania and OO fan for 15 years, really really wanted a genuine object-oriented, easy-to-use object-oriented scripting language. I looked for, but couldn't find one.

\medskip
So, I decided to make it. It took several months to make the interpreter run. I put it the features I love to have in my language, such as iterators, exception handling, garbage collection.

\medskip
Then, I reorganized the features in Perl into class library, and implemented them. I posted Ruby 0.95 to the Japanese domestic newsgroups in Dec. 1995.

\medskip
Since then, mail lists are established, web pages are formed. Highly active discussion was held in the mail lists. The oldest list ruby-list has 14789 messages until now.

\medskip
Ruby 1.0 was released in Dec. 1996, 1.1 in Aug. 1997, 1.2 (stable version) and 1.3 (development version) were released in Dec. 1998.

\medskip
Next stable version 1.4 will be shipped this months (June 1999), hopefully.''
}
\end{quotation}
As you can see, Ruby was developed with Perl, Python, Smalltalk and Eiffel (as well as some other languages) in mind.
So \emph{matz} took the best from the languages mentioned above to create a new, better, object-oriented scripting language. Unlike Perl and Python, Ruby was designed to be fully object-oriented right from the beginning, so there is no clumsy syntax for declaring a class as in Perl. That is why many people, myself included, say that Perl is not really object-oriented. I agree with Stroustrup, the developer of C++, who once said that a particular programming style (e.\,g. OOP) is sufficiently supported only if the language makes it easy to use. I do not think that Perl supports the object-oriented paradigm sufficiently.

A big advantage of Ruby is that it is very easy to learn, so it could perhaps become a language for introducing people to programming or object orientation (maybe at school, instead of the often-used Pascal). It took me only one day to get into Ruby, and after a few weeks I was nearly an expert! Learning Python took me a little longer, but learning Perl normally takes months, and becoming an expert takes years. The syntax of Ruby is IMHO so easy that even non-Rubyists can read and understand most source code, provided it is written in clean Ruby and not in a Perl-like style.

A good tutorial is very important when starting to learn a new language. Big thanks therefore go to \emph{matz} and the translators of the \emph{Ruby User's Guide}\footnote{ \tt \href{http://www.math.sci.hokudai.ac.jp/\%7Egotoken/ruby/ruby-uguide}{http://www.math.sci.hokudai.ac.jp/\%7Egotoken/ruby/ruby-uguide} }, which is a short introduction to Ruby that nevertheless covers almost everything. The \emph{Ruby Language Reference Manual}\footnote{ \tt \href{http://hydrogen.ruby-lang.org/en/man-1.4}{http://hydrogen.ruby-lang.org/en/man-1.4} } is also very good. Looking at the tutorials for Python and Perl, I noticed the following: either they are very long or they do not cover all aspects (especially those for Perl).
As already mentioned, the syntax of Ruby is very easy: it is clean and readable, yet concise. The code is nearly self-documenting, as in Eiffel and in contrast to Perl. Clean, understandable and short code increases productivity, because it speeds up coding, reduces the need for documentation, is less error-prone and is therefore easier to maintain. In Perl, a great deal of time is wasted on finding errors or documenting code, and some errors are only detected much later. In Ruby you can make your code much more readable by inserting additional keywords or by using words instead of operators. Here is a small example that performs the same task in different styles in Ruby:
\begin{verbatim}
# short form
(1..10).each { |i| print "#{i}\n" if i % 2 == 0 && i > 5 }

# the same more readable
(1..10).each do |i|
  if i % 2 == 0 and i > 5
    print i, "\n"
  end
end

# the same more readable, with syntax-sugar for (1..10).each
for i in 1..10 do
  if i % 2 == 0 and i > 5 then
    print i, "\n"
  end
end
\end{verbatim}
In Perl you would write:
\begin{verbatim}
for (1..10) {
  if($_ % 2 == 0 && $_ > 5) {
    print "$_\n";
  }
}
\end{verbatim}
And in Python:
\begin{verbatim}
for i in range(1,11):
    if(i % 2 == 0 and i > 5):
        print i,"\n"
\end{verbatim}
The {\tt ..} operator in Ruby creates a {\tt Range} object. You can use any object that is comparable with the {\tt <=>} operator and that has a {\tt succ} method, so you can also iterate over a string range (e.\,g. {\tt "a".."ab"}). In Python you need the function {\tt range} to create a list over which you can then iterate. In Ruby you can choose between the keywords {\tt and}, {\tt or}, {\tt not} and the operators {\tt \&\&}, {\tt \textbar\textbar}, {\tt !}. The keyword {\tt then} is optional, and you can choose between {\tt \{}\dots{\tt \}} and {\tt do} \dots{} {\tt end}. Semicolons are only necessary if you want to write more than one statement on one line.
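
The requirement that range endpoints support {\tt <=>} and {\tt succ} can be tried out with a small sketch. The class {\tt Level} below is hypothetical and exists only for this illustration; the sketch assumes a reasonably recent Ruby interpreter.

```ruby
# Hypothetical class, introduced only for this illustration.
class Level
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  # Range uses <=> to decide when the upper bound has been reached.
  def <=>(other)
    n <=> other.n
  end

  # Range#each calls succ to step from one element to the next.
  def succ
    Level.new(n + 1)
  end

  def to_s
    "L#{n}"
  end
end

levels = (Level.new(1)..Level.new(4)).map { |l| l.to_s }
puts levels.join(" ")   # prints "L1 L2 L3 L4"

# String ranges work the same way, because String defines <=> and succ.
letters = ("a".."ab").to_a
puts letters.first, letters.last
```

Nothing in {\tt Range} itself knows about {\tt Level}; the two methods are the whole contract, which is why strings and user-defined classes can be used as range endpoints alike.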
As in Python, and unlike in Perl, you do not have to end every statement with a semicolon. Mandatory semicolons have only disadvantages: they lead to more errors, because they are easily forgotten, and they make code less readable. Another disadvantage of Perl is that it distinguishes between scalar (e.\,g. string, integer, reference) and non-scalar (e.\,g. array, hash) variables, which makes programming much more difficult. A problem arises when you want to create an array that itself contains arrays, because arrays and hashes in Perl can only contain scalar values. The solution is to store references to the arrays in the array, but that is not easy and invites many mistakes. Not so in Ruby and Python, where variables only contain references to objects.

\medskip
An advantage of Ruby is that all constructs have a value. For example, an {\tt if} construct returns a value, so you can also use {\tt if} on the right-hand side of an assignment. The following example shows this:
\begin{verbatim}
txt = "Hello World"
a = "size " + if txt.size > 3 then "greater" else "less" end + " than 3"
print a   # prints "size greater than 3"
\end{verbatim}

\medskip
Functions in Perl do not automatically introduce a new scope, so if you use a variable that was already declared outside the function, it will be overwritten. You need {\tt my} or {\tt local} to declare local variables. Ruby lets you easily distinguish constants (which begin with a capital letter), local variables (which begin with a lowercase letter), global variables (which begin with {\tt \$}) and instance variables (which begin with {\tt @}).

\bigskip
But the most important reason why I use Ruby instead of Python or Perl is its object-oriented features, and the fact that Ruby was designed to be object-oriented right from the beginning, unlike Python and Perl, where object orientation was added later. You can recognize this, e.\,g.,
in Python very clearly, because the first parameter (often named {\tt self}) of every method of a class is the object on which the method is called:
\begin{verbatim}
class A:
    def method_a (self):
        # do what you want
\end{verbatim}

\medskip
The syntax for declaring a class in Ruby could not be easier:
\begin{verbatim}
class X
  ...
end
\end{verbatim}
Now look at how a class is declared in Perl:
\begin{verbatim}
sub new {
  my ($class,@args) = @_;
  bless({@args}, $class);
}
\end{verbatim}
As you can see, Ruby is much more intuitive. Here is an example of declaring a {\tt Point} class in Ruby:
\begin{verbatim}
class Point
  # initialize is called implicitly when Point.new is called
  def initialize (x, y)
    @x = x   # @x is an instance variable
    @y = y   # @y is an instance variable
  end

  # returns x (because instance variables are only
  # visible inside the class)
  def x
    @x       # the same as return @x
  end

  # the same as 'def x'
  def y; @y end

  # the setter-function
  def x= (x)
    @x = x
  end

  def y= (y)
    @y = y
  end
end
\end{verbatim}
To make the code shorter and easier to read, here is the same class using the {\tt attr\_accessor} method of class {\tt Module}, which dynamically creates a getter and a setter method for each argument:
\begin{verbatim}
class Point
  attr_accessor :x, :y

  def initialize (x, y)
    @x = x
    @y = y
  end
end
\end{verbatim}
There are also some other useful methods, such as {\tt attr\_reader} and {\tt attr\_writer}, as well as {\tt attr}. Ruby's instance variables are only accessible from inside the class; you cannot change this. This is \emph{advanced} object orientation. You will find it in Eiffel as well as in Java, but not in Python, where you can access instance variables from outside the class. For example, JavaBeans use method names like \emph{getVarname} and \emph{setVarname} to access instance variables (where \emph{varname} is the name of the variable). These are also called attributes. Ruby, however, lets you access instance variables through methods as if you were reading and assigning them directly.
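
As a rough sketch of how {\tt attr\_reader} and {\tt attr\_writer} differ from {\tt attr\_accessor}, consider the following class. {\tt Account} and its methods are hypothetical names chosen for illustration only.

```ruby
# Hypothetical class, introduced only for this illustration.
class Account
  attr_reader :balance   # creates only the getter  "balance"
  attr_writer :note      # creates only the setter  "note="

  def initialize(balance)
    @balance = balance
    @note = nil
  end

  # the balance can only be changed through this method
  def deposit(amount)
    @balance += amount
    self
  end
end

acct = Account.new(100)
acct.deposit(50)
puts acct.balance                  # prints 150
puts acct.respond_to?(:balance=)   # prints false: no setter was created
puts acct.respond_to?(:note)       # prints false: no getter was created
acct.note = "salary"               # works: the setter exists
```

Choosing the narrowest of the three methods keeps the public interface of a class small: callers can read {\tt balance} but have to go through {\tt deposit} to change it.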
Here is an example of using the {\tt Point} class:
\begin{verbatim}
a = Point.new(1,6)   # create object "a" of class Point
a.x = 5              # calls the method x= with the parameter 5
print a.x,"\n"       # prints 5
print a.y,"\n"       # prints 6
\end{verbatim}
There is no direct access to instance variables. But sometimes you want to use a class the same way as in Python, where you can assign instance variables from outside without declaring them. A short example in Python:
\begin{verbatim}
# empty class
class A:
    pass

x = A()     # create object "x" of class A
x.a = 3     # create new instance variable
print x.a   # prints 3
\end{verbatim}
The same behavior can be achieved in Ruby by using the class {\tt OpenStruct}, defined in the file \emph{ostruct.rb}. Here is the same example in Ruby:
\begin{verbatim}
require 'ostruct'

x = OpenStruct.new   # create object "x" of class OpenStruct
x.a = 3
print x.a            # prints 3
\end{verbatim}
There is almost no difference between the two examples, but in Python you have to declare a class of your own! This behavior is very easy to implement in Ruby, because Ruby calls the method {\tt method\_missing} for every unknown method (in this case the method {\tt a=}). You can then dynamically create the method {\tt a}, which returns the value 3.

\medskip
Now we will extend the {\tt Point} class with an equality operator. You do not need to insert the {\tt ==} method into the {\tt Point} class written above; you can also extend the existing {\tt Point} class by adding a second {\tt Point} class definition (not recommended in this case). The whole {\tt Point} class could now look like this:
\begin{verbatim}
class Point
  attr_accessor :x, :y

  def initialize (x, y)
    @x = x
    @y = y
  end
end

class Point
  def == (aPoint)
    aPoint.x == x and aPoint.y == y
  end
end
\end{verbatim}
or better like this:
\begin{verbatim}
class Point
  attr_accessor :x, :y

  def initialize (x, y)
    @x = x
    @y = y
  end

  def == (aPoint)
    aPoint.x == x and aPoint.y == y
  end
end
\end{verbatim}
As you can see, Ruby is very easy and clean.
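
The {\tt method\_missing} hook described above for {\tt OpenStruct} can be sketched roughly as follows. This is not the actual implementation from \emph{ostruct.rb}: the class {\tt Record} is hypothetical, it stores the values in a hash instead of creating methods dynamically, and it assumes a recent Ruby interpreter.

```ruby
# Hypothetical, much simplified stand-in for OpenStruct.
class Record
  def initialize
    @values = {}
  end

  # Ruby calls this for every method the object does not define.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      # x.a = 3 arrives here as method_missing(:a=, 3)
      @values[key.chomp("=").to_sym] = args.first
    elsif @values.key?(name)
      # x.a arrives here as method_missing(:a)
      @values[name]
    else
      super   # unknown attribute: keep the usual NoMethodError
    end
  end

  # keeps respond_to? consistent with method_missing
  def respond_to_missing?(name, include_private = false)
    key = name.to_s.chomp("=").to_sym
    name.to_s.end_with?("=") || @values.key?(key) || super
  end
end

x = Record.new
x.a = 3
puts x.a   # prints 3
```

Reading an attribute that was never assigned still raises {\tt NoMethodError}, because the sketch passes unknown names on to {\tt super}.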
But there are more features. In Ruby there are some conventions that you should not break. Method names that end with:
\begin{itemize}
\item {\tt=} should be used as \emph{setters} of instance variables
\item {\tt?} should return a boolean (e.\,g. {\tt has\_key?} of class {\tt Hash})
\item {\tt!} signal that the data inside the object itself is changed, rather than a changed copy being returned (e.\,g. {\tt downcase!}, which changes the object's value directly, and {\tt downcase}, which returns the downcased value)
\end{itemize}

\bigskip
Now we will extend the {\tt Array} class, which comes with Ruby, with a missing method {\tt count(val)} that counts the occurrences of \emph{val}. We do not need to change any existing source code or inherit from the class:
\begin{verbatim}
class Array
  def count (val)
    count = 0
    # each iterates over every item and executes the block
    # between 'do' and 'end'.
    # the current element of the iteration is stored in 'i'
    each do |i|
      if i == val then count += 1 end
    end
    count   # returns 'count'
  end
end

# now every Array has the method count(val)
print [1, 5, 3, 5, 5].count(5)   # prints 3
\end{verbatim}
Sometimes you may not want to construct, e.\,g., a {\tt Point} object via {\tt Point.new} but via {\tt Point.new\_cartesian}. Ruby has not only classes but also metaclasses, like Smalltalk, i.\,e. the class definition is available at runtime and can also be changed. In Ruby \emph{every} class is an object constructed from the class {\tt Class}. There are \emph{instance methods} and \emph{class methods}. \emph{Instance methods} do not exist without objects. \emph{Class methods} do exist without objects; they exist as soon as the class is created. \emph{Class methods} are called on classes (e.\,g. {\tt Point.new}), \emph{instance methods} on objects or instances (e.\,g. {\tt "hallo".length}).
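
That classes are themselves objects which can be inspected and changed at runtime can be checked with a few lines. This is only a sketch: a minimal {\tt Point} class is defined here so that the example is self-contained, and {\tt class\_eval} is just one of several ways to add a method later.

```ruby
# Minimal Point class, defined here only to keep the sketch self-contained.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

puts Point.class               # prints Class: the class is itself an object
puts Point.new(1, 2).class     # prints Point
puts Point.respond_to?(:new)   # prints true: "new" is called on the class

# The class definition is still available at runtime and can be changed.
Point.class_eval do
  def to_s
    "(#{@x}, #{@y})"
  end
end

puts Point.new(1, 2)           # prints (1, 2)
```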
Now we will create a \emph{class method} {\tt new\_cartesian} for the class {\tt Point}:
\begin{verbatim}
class Point
  def new_cartesian (x, y)
    aPoint = new(x,y)
    # here you can do what you want
    return aPoint
  end

  # makes new_cartesian a class method
  module_function :new_cartesian
end

# now you can instantiate a Point-object with new_cartesian:
a = Point.new_cartesian(1, 43)
\end{verbatim}
The same can also be done this way:
\begin{verbatim}
class Point
  def Point.new_cartesian (x, y)
    aPoint = new(x,y)
    # here you can do what you want
    return aPoint
  end
end
\end{verbatim}
Now on to iterators, which can be declared very easily:
\begin{verbatim}
# iterates n-times over the given block
def times (n)
  while n > 0 do
    yield n
    n -= 1
  end
end

times(5) {|i| print i," " }
\end{verbatim}
This prints {\tt 5 4 3 2 1} to the screen. Here is a more advanced example of using iterators:
\begin{verbatim}
include FileTest

FILE, DIRECTORY, DIRECTORY_UP = 0, 1, 2
PATH_SEP = "/"

#
# depth: -1 = recurse all
# yield: path, name, type
#
def scan_dir(path, depth=-1)
  # remove PATH_SEP at the end if present
  if path[-1].chr == PATH_SEP then path = path.chop end

  Dir.foreach(path) do |i|
    next if i =~ /^\.\.?$/
    if directory?(path+PATH_SEP+i) then
      yield path, i, DIRECTORY
      scan_dir(path+PATH_SEP+i, depth-1) do |a,b,c|
        yield a,b,c
      end unless depth==0
      yield path, i, DIRECTORY_UP
    elsif file?(path+PATH_SEP+i)
      yield path, i, FILE
    end
  end
end

# prints all files in directory /home and subdirectories
scan_dir("/home",-1) { |path, name, type|
  print path, PATH_SEP, name, "\n" if type==FILE
}
\end{verbatim}
The iterator {\tt scan\_dir} iterates over all files and subdirectories and calls the given block with the path, file name and type as parameters. Using blocks with methods lets you program very flexibly, because you can extend a method from outside.
Here is an example that measures the time taken to execute a piece of code:
\begin{verbatim}
def Time.measure
  start = Time.times.utime
  yield
  Time.times.utime - start
end

# measures the time used by the loop between { and }
print Time.measure {
  for i in 1..100 do
    a = 10
  end
}
\end{verbatim}
Or counting the number of lines in a file:
\begin{verbatim}
def get_num_lines (file)
  # iterates over every line of "file"
  IO.foreach(file){}

  # $. returns the number of lines read since
  # last explicit call of "close"
  $.
end

print get_num_lines("/home/michael/htdocs/index.html")
\end{verbatim}
{\tt IO.foreach(path)} is a short form of:
\begin{verbatim}
port = open(path)
begin
  port.each_line {
    ...
  }
ensure
  port.close
end
\end{verbatim}
So the file will be closed in any case. You do not need to open or close the file explicitly. Another important use of blocks is for synchronizing threads:
\begin{verbatim}
require "thread"

m = Mutex.new

a = Thread.start {
  while true do
    sleep 1
    m.synchronize do
      print "a\n"
    end
  end
}

b = Thread.start {
  while true do
    sleep 2
    m.synchronize do
      print "b\n"
    end
  end
}

sleep 10

a.exit   # kill thread
b.exit   # kill thread
\end{verbatim}
The {\tt Mutex} object makes sure that only one thread at a time can execute the block passed to its {\tt synchronize} method. In Java, for example, the new keyword \emph{synchronized} was introduced (in comparison to C++) to achieve the same effect, but in Ruby this is done with a block construct, so you are much more flexible, because you can extend or change the semantics. Sockets are also very easy to program. The following code connects to a Whois server and gets information about a domain:
\begin{verbatim}
require 'socket'

def raw_whois (send_string, host)
  s = TCPsocket.open(host, 43)
  begin
    s.write(send_string+"\n")
    return s.readlines.to_s
  ensure
    s.close
  end
end

print raw_whois("page-store.de", "whois.ripe.net")
\end{verbatim}
There is also a {\tt TCPserver} class in Ruby, which makes it much easier to build a server.
The following shows a multi-threaded echo server:
\begin{verbatim}
require 'socket'
require 'thread'

server = TCPserver.open(5050)   # build server on port 5050

while true do                   # loop endlessly (until Ctrl-C)
  new_sock = server.accept      # wait for new connection
  print new_sock, " accepted\n"

  Thread.start do               # new thread for connection
    sock = new_sock
    sock.each_line do |ln|      # read line from socket
      sock.print ln             # put line back to socket
    end
    sock.close                  # close connection
    print sock, " closed\n"
  end
end
\end{verbatim}
The exception model of Ruby is very close to that of Eiffel, where you have pre- and post-conditions. In Ruby you have only post-conditions ({\tt ensure})! Here is an overview of the possibilities that Ruby's exception model gives you:
\begin{verbatim}
begin
  # do anything...

  # raise an exception of type RuntimeError
  raise "Error occurred"

  # raise an exception of user-defined class
  raise MyError.new(1,"Error-Text")

rescue
  # is called when an exception occurs, you can
  # access the exception-object through $!, the error-message
  # is accessible through $!.message and the file and
  # line-number where the exception occurred is stored in $@

  # solve problem...and retry the whole block
  retry

  # or re-raise exception
  raise

  # or raise new exception
  raise "error..."

ensure
  # is always called before the block
  # surrounded by "begin" and "end" is left
end
\end{verbatim}
Regular expressions are used as in Perl. To extract the top-level domain from a domain name, you can define the following method:
\begin{verbatim}
def extract_tld (domain)
  domain =~ /\.([^\.]+)$/
  $1
end

print extract_tld("www.coding-zone.de")   # prints "de"
\end{verbatim}
Most of the features of Perl's regular expressions are also available in Ruby. Database access is also available in Ruby, but not all databases are supported yet. Currently only MySQL, mSQL, PostgreSQL, InterBase and Oracle are available. A generalized database-access standard like ODBC, JDBC or DBD/DBI (Perl) would be very nice.
The following code example shows how to print out a whole database table with MySQL:
\begin{verbatim}
require 'mysql'

m = Mysql.new(host, user, passwd, db)   # creates new Mysql-object
res = m.query("select * from table")    # query the database

# gets all fieldnames of the query
fields = res.fetch_fields.filter {|f| f.name}
puts fields.join("\t")                  # prints out all fieldnames

# each row is printed
res.each do |row|
  # row is an array of the columns
  puts row.join("\t")
end
\end{verbatim}
In Ruby, you can very easily implement dynamic argument-type checking. The following code implements this behavior:
\begin{verbatim}
class Object
  def must( *args )
    args.each do |c|
      if c === self then return self end
    end
    raise TypeError,
      "wrong arg type \"#{type}\" for required #{args.join('/')}"
  end
end

# this method requires an Integer as argument
def print_integer( i )
  i.must Integer
  print i
end

# you can also allow more than one type
# "name2s" returns a String, but takes a String or an Integer
def name2s( arg )
  arg.must String, Integer
  case arg
  when String then arg
  when Integer then arg.id2name
  end
end

# :Hello is an Integer representing the string "Hello"
print name2s(:Hello)     # prints "Hello"
print name2s(" World")   # prints " World"
print name2s([1,2,3])    # raises TypeError
\end{verbatim}

\bigskip
Writing C/C++ extensions for Ruby is very easy. Everything that is possible in Ruby can also be done in C. It is also possible to use the SWIG interface generator for Ruby, but using the Ruby functions directly is very easy. The following C program declares a module and one method:
\begin{verbatim}
// filename: str_func.c

#include "ruby.h"
#include // for malloc

//
// Function will add "add" to each character of string "str"
// and return the result. Function does not change "str"!
// "obj" is assigned the class/module-instance on which method is called
//
extern "C" VALUE add_string( VALUE obj, VALUE str, VALUE add )
{
  int len, addval, i;
  char *p, *sptr;
  VALUE retval;

  // checks parameter-types
  // if type is wrong it raises an exception
  Check_Type(str, T_STRING);
  Check_Type(add, T_FIXNUM);

  // length of string "str"
  len = RSTRING(str)->len;

  // convert FixNum to C-integer
  addval = FIX2INT(add);

  // alloc temporary memory for new string
  p = (char*) malloc(len+1);

  // raise an exception if not enough memory available
  if( !p )
    rb_raise(rb_eRuntimeError, "couldn't alloc enough memory for string");

  // get pointer to string-data
  sptr = RSTRING(str)->ptr;

  // iterate over each character, and add "addval" to it
  for(i=0; i