Formal grammars can be a powerful modeling tool for a wide range of scientific fields. They can be used in different areas to understand the structure underlying the functionality of the systems they are applied to. Learning grammars from a collection of data that has arisen from such systems' behavior can help us make decisions about their evolution and, by extension, gain a more sophisticated knowledge of their nature. The usefulness of such a formalism stems from its potential applicability to further multidisciplinary research. The necessity for such a methodology grows in step with the exponential growth of the data produced by those systems. For example, formal grammars can be applied to model end-users' navigation on the web, which is a vast collection of behaviors, or to standardize biological data (for example, DNA structures). The data volumes involved in those settings are very large, and the need to build models for understanding the regularities underlying their function is obvious. Formal grammars can also be used in musical composition, in the sense that they are a collection of descriptive rules for analyzing or generating sequences of symbols which represent musical parameters, such as notes and their attributes. By using a specific set of rules, we can obtain musical sequences that can also be represented by finite state automata. Inverting the process described above, the goal becomes to obtain a formal grammar from a given set of musical pieces. To be more precise, given a set of improvisations - that is, musical streams that are not written in a musical score but recorded during the spontaneous execution of a piece as the musician plays variations on its main themes - we can build a structure that gives us the opportunity to understand them better.
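The correspondence between such rule sets and finite state automata can be made concrete with a minimal sketch. The production rules below are hypothetical illustrations, not drawn from any real corpus; a right-linear grammar of this shape is equivalent to a finite state automaton whose states are the nonterminals.

```python
import random

# Toy right-linear grammar over note names. Each nonterminal maps to
# pairs of (emitted note, next nonterminal); the rules are illustrative
# assumptions, not derived from actual musical data.
RULES = {
    "S": [("C", "A"), ("E", "A")],
    "A": [("G", "B"), ("F", "B")],
    "B": [("E", "A"), ("C", None)],  # None marks an accepting state
}

def generate(symbol="S", max_len=16):
    """Walk the automaton, emitting one note per transition, until an
    accepting state is reached or the length cap is hit."""
    notes = []
    while symbol is not None and len(notes) < max_len:
        note, symbol = random.choice(RULES[symbol])
        notes.append(note)
    return notes
```

Running `generate()` yields note sequences such as C-G-E-F-C, each one a string accepted by the corresponding automaton.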
A formal grammar (or a finite state automaton) built on such large collections can model the behavior of the musicians involved (a fact that can be related to their cultural environment and other ethnomusicological factors) or determine classification schemes over them. A common approach to classification is to consider whole musical passages as individual instances, each described by a set of enumerated attributes, which is the classical setting in machine learning methodologies. Those attributes can be related to various musicological characteristics, such as tempo, timbre, harmonic density, polyphony, etc. As a result, a musical passage can be represented by a multidimensional vector, each component of which holds the value of the corresponding attribute in the enumeration. Using metrics such as the Euclidean distance, we can obtain a "visual" and quantitative representation of their resemblance. The classification methods can be either supervised (using a training set to build the model and then testing the algorithm on an evaluation set of pieces) or unsupervised, using clustering techniques, so that the algorithm can derive its own scheme from the "raw" material. By obtaining those clusters we can gain an overview of the input and make decisions about the special characteristics of those pieces. Instead of using written musical passages as input, we can consider a set of improvisations transformed into vector instances. Modeling improvisation techniques helps us to better understand the idioms, the richness of the musical genres, and the idiosyncrasies of the musicians involved in them. Such differences and resemblances between improvisations can emerge from comprehensive research and can be found across musical genres, locations, and eras. The above can be a starting point for a further, larger anthropological study.
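The vector representation and distance metric described above can be sketched as follows. The passages, attribute choices, and values are hypothetical placeholders for the kind of enumeration the text describes.

```python
import math

# Hypothetical attribute vectors for three passages, in the order
# (tempo in BPM, harmonic density, degree of polyphony). Both the
# attributes and the numbers are illustrative assumptions.
passages = {
    "passage_a": (120.0, 0.8, 3.0),
    "passage_b": (122.0, 0.7, 3.0),
    "passage_c": (60.0, 0.2, 1.0),
}

def euclidean(u, v):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Here `euclidean(passages["passage_a"], passages["passage_b"])` comes out much smaller than the distance from passage_a to passage_c, quantifying the resemblance that a clustering algorithm would exploit. In practice the attributes would be normalized to comparable scales first, so that tempo does not dominate the distance.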
The classification schemes provided by the techniques mentioned above give us an image of the instance space that is somewhat static, in the sense that we have no relation between the shape of the clusters before and after the modeling process. Harmless as this may seem for standard pieces written down in musical scores, it is crucial in the case of improvisations because of their evolutionary and dynamically transforming nature. A lazy approach would be a series of classifications at several time periods, probably using adaptive learning techniques to reduce the search space. Nevertheless, this increases complexity beyond tractable bounds if we want to observe improvisation schemes over a large interval. A solution to this problem is to learn formal grammars from the improvisational material itself. Given sets of improvisations, we can deduce specific rules that capture a kind of normality in those musical corpora and can be applied to explain future improvisations. We thus gain an evolutionary model of the structure of the improvisational process, which helps us understand its nature and its development across time and space. The greatest drawback of this approach is the fact that the process of improvisation - most of the time, one could argue - does not conform to fixed rules, but is mainly characterized by a feeling of spontaneity or uncertainty. This is the recipe for a live, vivid, and inspired improvisation. Nevertheless, it is certain that this “window of freedom” in such solos is bounded by the harmonic and rhythmic structure of the main piece. A musical performer, however inventive and inspired he may be, must first conform to the main structure and only then expand it at will. This “expanding bias” is his personal musical signature on the piece.
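A very naive version of the grammar-learning step can be sketched by collecting, from observed improvised sequences, the note transitions actually used; the result is a set of productions of a right-linear grammar (equivalently, a finite state automaton). The corpus below is a made-up stand-in for real improvisational material, and real induction algorithms are considerably more sophisticated.

```python
from collections import defaultdict

# Hypothetical improvised note sequences standing in for a real corpus.
corpus = [
    ["C", "E", "G", "E", "C"],
    ["C", "G", "E", "C"],
    ["E", "G", "C"],
]

def induce_rules(sequences):
    """Map each note to the set of notes observed to follow it, i.e.
    the transitions licensed by the induced automaton."""
    rules = defaultdict(set)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            rules[cur].add(nxt)
    return dict(rules)
```

The induced rules then over-approximate the "normality" of the corpus: any transition absent from the rule set marks a departure from the observed improvisational habits.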
A second thought that answers the above concerns and makes the study more rigorous is that we can measure the degrees of freedom of the improvisational process by means of entropy. To be more precise, imagine a small piece written in the C major scale (Ionian mode), in the sense that most of the notes played in that piece belong to that scale. If a musician plays the note C on his instrument, the information gained from that musical event is near zero. If, later during his solo, he plays a B flat, the entropy increases, as the Mixolydian mode is implied. In the meantime he can still play all the notes of the C major scale. Given that history of notes, if he now plays an E flat, then, combined with the previous B flat, it implies the Dorian mode and the information gained changes again. Analogous alterations in normality occur if he suddenly makes - let us say - a three-octave leap and plays the next note in a very high frequency range. These simple examples show that a musician's tendency to apply novel ideas in his improvisation results in corresponding entropy fluctuations that can be measured and embedded in the general study, reducing combinatorial explosion and similar chaotic phenomena. In conclusion, formal grammars enriched with entropy measures can provide an X-ray of the backbone of the improvisational process. They can be a powerful tool that helps us understand the underlying characteristics of those spontaneous sequences macroscopically, both in space, i.e. from musician to musician, and in time.
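The information-gain argument can be made quantitative with a small sketch, under an assumed probability model: notes of the prevailing C major scale are likely, chromatic notes outside it are rare. The 0.96 probability mass assigned to scale tones is an arbitrary illustrative choice, not a measured quantity.

```python
import math

SCALE = {"C", "D", "E", "F", "G", "A", "B"}        # C major (Ionian)
CHROMATIC = {"Db", "Eb", "Gb", "Ab", "Bb"}          # out-of-scale pitch classes

def note_probabilities(in_scale_mass=0.96):
    """Distribute probability over the 12 pitch classes: most mass on
    the seven scale tones, the remainder on the five chromatic tones.
    The split is an assumption for illustration."""
    probs = {n: in_scale_mass / len(SCALE) for n in SCALE}
    probs.update({n: (1.0 - in_scale_mass) / len(CHROMATIC) for n in CHROMATIC})
    return probs

def surprisal(note, probs):
    """Information content -log2 p(note): small for an expected note
    such as C, large for an unexpected one such as B flat."""
    return -math.log2(probs[note])
```

Under this model, `surprisal("Bb", note_probabilities())` is several bits larger than `surprisal("C", ...)`, mirroring the entropy jump the text attributes to the implied Mixolydian shift; a fuller treatment would update the probabilities as the note history accumulates.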