Note: This document was made for my own use, and is not fit for any particular purpose.
Despite this, I am claiming the copyright: (c) copyright 2008 by Alexandru Iosup. All rights reserved.
1. Main Idea: A 4D Thesisometer
A Thesisometer is a counter that records the progress of writing the PhD thesis over time.
The counters record any number of the many aspects of thesis writing: the number of words, the number of lines, the number of chapters completed, etc.
The data set that underlies the thesisometer contains all the counted values and their associated timestamps (the time of the measurement, usually the timestamp of the major CVS/SVN commits).
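A minimal sketch of such a data set, assuming one record per measurement. The names `Measurement` and `series` and the sample values are illustrative only and are not taken from the actual thesisometer scripts. Each record pairs a timestamp with named counters, and a single counter plotted over time gives the 2D view.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Measurement:
    """One thesisometer sample: when it was taken and what was counted."""
    timestamp: date
    counters: dict = field(default_factory=dict)  # e.g. {"words": 1200}


def series(data, counter):
    """Return (timestamp, value) pairs for one counter, sorted by time."""
    return sorted((m.timestamp, m.counters[counter])
                  for m in data if counter in m.counters)


# Made-up values, for illustration only.
data = [
    Measurement(date(2008, 3, 1), {"words": 5200, "chapters": 1}),
    Measurement(date(2008, 2, 1), {"words": 3100, "chapters": 1}),
]
```

Calling `series(data, "words")` returns the word counts in chronological order, regardless of the order in which the measurements were stored.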
A 2D thesisometer example that counts the total number of words in the thesis is available online here.
A 3D display is often useful for showing a large amount of closely related data (many 2D graphing purists disagree with this).
If the 3D display presents data that evolve over time, the presentation becomes 4D (time becomes the 4th dimension).
My idea is to build a 4D display for thesisometer data, which can also be used to export snapshots of the
instantaneous 3D display; I intend to use this latter function to produce a background image for the cover
of my thesis.
For the 3D display, I intend to adapt the CodeCity ideas
of code analysis and visualization to visualizing LaTeX documents. For this,
I will use the same city/building metaphor as CodeCity, but
the meaning of the vertical structures will be changed to fit
the structure of a LaTeX document.
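One possible mapping, shown here only as a hypothetical sketch: each chapter becomes a row of buildings (a district) and each section becomes a building whose height is proportional to its word count. The function name `city_layout`, its parameters, and the mapping itself are assumptions made for illustration, not the design used by CodeCity or by the visualizer script.

```python
def city_layout(sections, words_per_unit=100.0, footprint=1.0, gap=0.5):
    """Place one building per section; chapters become rows (districts).

    `sections` is a list of (chapter, section, word_count) tuples.
    Returns a list of dicts with the building position and height.
    """
    chapters = []      # chapter names in first-seen order
    next_slot = {}     # chapter -> next free column in its row
    buildings = []
    for chapter, section, words in sections:
        if chapter not in next_slot:
            chapters.append(chapter)
            next_slot[chapter] = 0
        row = chapters.index(chapter)
        col = next_slot[chapter]
        next_slot[chapter] += 1
        buildings.append({
            "label": f"{chapter}/{section}",
            "x": col * (footprint + gap),        # position inside the row
            "z": row * (footprint + gap),        # one row per chapter
            "height": words / words_per_unit,    # taller = more words
        })
    return buildings
```

Other counters (images, tables, paragraphs) could be mapped to the footprint or the color of a building in the same way.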
A sample CodeCity output (produced by its authors) is displayed below:
2. Components
The following components are needed to build the 4D Thesisometer described in Section 1:
- The thesisometer data collection. The counters need to be extracted from the thesis snapshots.
I assume that the thesis is structured in chapters and sections; the following process can be
easily extended for parts and (sub-)sub-sections.
For each section in the thesis, the following counters are needed: the number of words, the number of paragraphs,
the number of images, the number of tables, and the number of references to parts of the section from any other
section. The data are obtained as follows.
I assume that each major update of the thesis is checked out from CVS/SVN and archived into
the ZIP file MyThesis_<YYYY-MM-DD>.zip. The ZIP file includes a main directory (thesis/)
in which the main .tex files reside. The other .tex files are placed in subdirectories of the main
directory. The thesisometer data are obtained from each ZIP file by scanning its .tex files.
- The visualizer. The data need to be displayed in 3D, with an option to go for 4D.
I have developed a Python script (vis1.py) that uses OpenGL and
seems to fit these tasks perfectly (note that it is still a prototype, and includes many hard-coded values and a fair amount of unused
(or otherwise useless) code). The script needs PyOpenGL, which can be easily installed using the
Python SetupTools (in particular
ez_setup.py,
also available here). The main
advantage of PyOpenGL, besides being a Python binding for OpenGL, is that it comes with
Python ports of several of the NeHe OpenGL Tutorials.
[Windows users: you will need the GLUTils libraries
(also from here) in
the search path (same directory?) of the vis1.py script.]
The visualizer script also creates vis1.pov,
the Persistence of Vision Raytracer (PoV-Ray) script that can be used to render one frame from the animation displayed by vis1.py.
The current vis1.py needs to be modified to display the whole data-set (right now it
only displays the data for one timestamp, albeit in an animation). Here is a screenshot of the visualizer in action:
- The renderer. The vis1.pov
file produced in the previous step needs to be rendered into a raster frame.
PoV-Ray is a free renderer
with distributions for (almost) every major platform, and can fill this task.
I assume that rendering vis1.pov
into vis1.bmp, a 1024x768 BMP with anti-aliasing (AA 0.3), one of the
roughly 10 default rendering modes available in the default PoV-Ray interface,
is not a problem. It was not for me:
- The vectorizer. The final step is to obtain a vectorized image from the raster one, preferably EPS
(good for LaTeX -- remember when in Section 1 I was talking about using the thesisometer output as thesis cover?).
The excellent CAD and Linux: The LUnIx Linux CAD/3d Utilities Links
page lists many useful tools, of which AutoTrace
(also available here)
fulfills this requirement: here is the EPS file. As a nice additional effect, AutoTrace's output looks rather cartoonish.
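Returning to the data-collection component, the scan of one ZIP snapshot could be sketched as follows. This is a rough approximation under stated assumptions, not the actual tool: all function names are hypothetical, sections are detected with a simple regular expression on `\chapter` and `\section`, sub-sections are folded into their parent section, and the cross-section reference counter is left out because it requires resolving labels across files.

```python
import re
import zipfile
from datetime import datetime

SECTION_RE = re.compile(r"\\(chapter|section)\*?\{([^}]*)\}")
# Commands whose brace argument is not prose (labels, file names, ...).
NONPROSE_RE = re.compile(
    r"\\(?:label|ref|cite|includegraphics|begin|end)\*?(?:\[[^\]]*\])?\{[^}]*\}")
COMMAND_RE = re.compile(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?")
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def snapshot_date(zip_name):
    """Extract the date from a name like MyThesis_2008-05-17.zip."""
    match = re.search(r"_(\d{4}-\d{2}-\d{2})\.zip$", str(zip_name))
    if match is None:
        raise ValueError(f"no date in file name: {zip_name}")
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()


def count_section(text):
    """Rough counters for the LaTeX source of one section."""
    no_comments = re.sub(r"(?<!\\)%.*", "", text)   # drop % comments
    prose = COMMAND_RE.sub(" ", NONPROSE_RE.sub(" ", no_comments))
    paragraphs = [p for p in re.split(r"\n\s*\n", prose) if WORD_RE.search(p)]
    return {
        "words": len(WORD_RE.findall(prose)),
        "paragraphs": len(paragraphs),
        "images": len(re.findall(r"\\includegraphics", no_comments)),
        "tables": len(re.findall(r"\\begin\{table\*?\}", no_comments)),
    }


def split_sections(tex):
    """Yield (title, body) for each chapter or section heading in one file."""
    matches = list(SECTION_RE.finditer(tex))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tex)
        yield m.group(2), tex[m.end():end]


def scan_snapshot(zip_path):
    """Return (date, {(file, title): counters}) for one thesis snapshot."""
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("thesis/") and name.endswith(".tex"):
                tex = archive.read(name).decode("utf-8", errors="replace")
                for title, body in split_sections(tex):
                    result[(name, title)] = count_section(body)
    return snapshot_date(zip_path), result
```

Running `scan_snapshot` over every archived ZIP file yields one timestamped set of counters per snapshot. The word counts are only approximate, since text inside the brace arguments of commands such as `\emph{...}` is counted while the command names themselves are dropped.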
3. Sample run
The following sample run sequence was used:
- Produce thesis data
a. thesis_structure_complete.csv (using WinEdt::Document->WordCount on selected pieces of text or an automated tool -- TODO)
b. copy rows to vis1.py or add in vis1.py input capabilities -- TODO
- Produce the PoV-Ray input file
$ cd E:\PhD\Tools\PhDThesisVis\
$ vis1.py
--> vis1.pov
- Produce the raster image
Run PoV-Ray to produce vis1.bmp (1024x768, AA 0.3) from vis1.pov
--> vis1.bmp
- Convert raster image to vector image (EPS format)
a. Copy vis1.bmp to AutoTrace dir
b. Raster -> Vector (EPS) using AutoTrace
$ cd E:\PhD\Tools\PhDThesisVis\AutoTrace
$ autotrace.exe --output-format eps --dpi 600 vis1.bmp > vis1.eps
--> vis1.eps
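The rendering and vectorizing steps of this run could be chained from Python, as in the sketch below. It rests on several assumptions: the executable names `povray` and `autotrace` must be adjusted to the local installation (the run above uses the PoV-Ray interface and autotrace.exe on Windows), the functions are hypothetical helpers, and the PoV-Ray output file type is left at the renderer's default. The AutoTrace options are the ones used in the run above; the PoV-Ray switches set the input file, output file, image size, and anti-aliasing threshold.

```python
import subprocess


def povray_command(scene="vis1.pov", image="vis1.bmp",
                   width=1024, height=768, antialias=0.3,
                   executable="povray"):
    """Build the PoV-Ray command line for one frame (executable assumed)."""
    return [executable, f"+I{scene}", f"+O{image}",
            f"+W{width}", f"+H{height}", f"+A{antialias}"]


def autotrace_command(image="vis1.bmp", dpi=600, executable="autotrace"):
    """Build the AutoTrace command line; the EPS is written to stdout."""
    return [executable, "--output-format", "eps", "--dpi", str(dpi), image]


def run_pipeline(eps="vis1.eps"):
    """Render vis1.pov, then vectorize the resulting raster image."""
    subprocess.run(povray_command(), check=True)
    with open(eps, "wb") as out:
        # Same effect as the shell redirection "> vis1.eps".
        subprocess.run(autotrace_command(), stdout=out, check=True)
```

Calling `run_pipeline()` in the directory that contains vis1.pov would then produce vis1.bmp and vis1.eps, provided both tools are installed and on the search path.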