Thoughts on Programming Languages for Scientific Computing

I have a friend who works in chemistry / protein folding and thus does a lot of computational work in that field. Coming from an application/web programming background, I was utterly shocked when she mentioned that FORTRAN is widely used in computational simulations. The great programmer Nietzsche once declared that “FORTRAN is dead”, and I held that belief too, thinking that FORTRAN was only used in the days before C and C++ were developed, when programmers didn’t have anything else. I didn’t know a single person who used FORTRAN.

But I was wrong. It turns out that FORTRAN is still actively used in scientific computing, and it is not an “outdated language” either. FORTRAN has gone through quite a few versions, which have added modern programming language features over the years.

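The biggest draw of FORTRAN is speed. It is one of the lowest-level of the high-level programming languages (like C, C++, Pascal, Java, etc.). The language was intended for scientific computing and designed to be fast; for example, FORTRAN compilers may assume that array arguments passed to a routine do not overlap in memory, which lets them optimize loops more aggressively than a C compiler can by default. In comparison, languages like C and C++ were designed as general-purpose languages that could be used for a variety of tasks.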

But I wasn’t totally sold on FORTRAN. I don’t like the syntax and would rather code in C++ any day. I didn’t believe that C++ was all that much slower than FORTRAN, so I decided to do some research into the matter. The following comes from an email I wrote:

  1. FORTRAN is widely cited to be “20% to a factor of ten” faster than C++. I didn’t find anything for C, but I think the speed of C is about the same as C++.
  2. However, the data in #1 was published in 1997. In the modern day, C and C++ compilers have been heavily optimized to produce speeds comparable to FORTRAN. Some modern benchmark results are here:

    (These two benchmarks show FORTRAN to be about the same as C/C++
    depending on how the code is written and compiled)
    http://dan.corlan.net/bench.html (maybe around the year 2000)
    http://dan.corlan.net/amd64_dual_core_benchmarks.html (newer version)

    (This benchmark shows FORTRAN to be much slower than C/C++. Java is
    even faster!)
    http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

    (This benchmark shows C++ being slightly faster than FORTRAN)
    http://pauloherrera.blogspot.com/2006/12/introduction-ive-read-lot-of-things.html

  3. In addition, there is a library for C++, called Blitz++, which allows C++ programs using the library to approach the speed of FORTRAN. Benchmarks are here:
    http://www.oonumerics.org/blitz/benchmarks/
    (The percentages, e.g. 95.7%, give the speed relative to the
    comparable FORTRAN program, so 100% means the same speed.)

    Also note that the Blitz++ benchmarks were run around the year 2000. I expect that if the same benchmarks were run today with modern compilers, the results would be even better.

  4. Therefore, I conclude that C/C++ is comparable in speed to FORTRAN and, depending on what you are doing and how you write the code, can be faster. So it comes down to your preferred programming language. I would probably always pick C++ over FORTRAN since I am more familiar with C++ and I think C++ code is easier to write and read.
  5. Surprisingly, Python *can* be really fast. See:
    http://scipy.org/PerformancePython
    Of course, it will never quite reach the speed of C++ or FORTRAN, but Python code is very easy to write, and for medium-sized projects you can write the code much faster.
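  6. Surprisingly, Java *can* be faster than C/C++. This is largely attributed to optimizations that Java’s just-in-time compiler can make at run time (using information about how the program actually behaves) that an ahead-of-time C/C++ compiler can’t. See: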

    http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

    http://www.kano.net/javabench/ (see links at bottom of the page for more java vs c++ benchmarks)
    http://www.kano.net/javabench/data (good graphs)

    In my opinion, Java code is easier to write than C++ code. You might also get advantages of portability since Java code can pretty much run on any computer unmodified.

  7. Unsurprisingly, Matlab is slow. Even when Matlab code is optimized, it is at least twice as slow as comparable C/C++ code. Unoptimized, it can be 30 or more times slower! Of course, the good thing about Matlab is that you can write math-heavy programs much faster than with C/C++, Java, Python, or FORTRAN. So Matlab might be good for testing an idea or implementing something on a small scale.
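The usual reason Python “can be fast” (item 5, and the SciPy PerformancePython page it links) is that the inner loops get pushed into compiled code, for example NumPy array operations. As a minimal sketch, assuming NumPy is installed and using hypothetical function names, here is one Jacobi sweep for Laplace’s equation written once with explicit Python loops and once with NumPy slices. Both compute the same result; only where the loop runs differs.

```python
import timeit

import numpy as np


def jacobi_step_loops(u):
    """One Jacobi sweep for Laplace's equation, written with explicit Python loops."""
    old = u.copy()  # read from the previous iterate so every update uses old values
    nx, ny = u.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # Each interior point becomes the average of its four neighbours.
            u[i, j] = 0.25 * (old[i - 1, j] + old[i + 1, j] + old[i, j - 1] + old[i, j + 1])
    return u


def jacobi_step_numpy(u):
    """The same sweep written with array slices, so the loop runs inside NumPy's compiled code."""
    # The right-hand side is evaluated into a temporary array before assignment,
    # so this is also a Jacobi (not Gauss-Seidel) update.
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return u


if __name__ == "__main__":
    grid = np.zeros((100, 100))
    grid[0, :] = 1.0  # hypothetical boundary condition: one hot edge
    assert np.allclose(jacobi_step_loops(grid.copy()), jacobi_step_numpy(grid.copy()))
    for f in (jacobi_step_loops, jacobi_step_numpy):
        seconds = timeit.timeit(lambda: f(grid.copy()), number=10)
        print(f"{f.__name__}: {seconds:.4f} s for 10 sweeps")
```

The sliced version is typically much faster than the loop version; how much depends on the machine and grid size, so the timing loop is there to measure it rather than to promise a number.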

Overall conclusion: C++ speed is about the same as FORTRAN, or only a little bit slower. As C++ compilers become more optimized, the difference decreases. Java seems pretty fast, so I would probably do more research on it. If it *really* is comparable to C/C++ speeds, I would probably use Java. For medium-scale applications, I would use Python. For small-scale applications, I would use Python or Matlab.

Project Update and Computer Irony

Eugene asked me today via email how things were going. I figured I should post the response here so that other people can know what’s up too:

It’s going alright. Efficiency is low, though, since I’m constantly distracted by stuff. Also, since I want to do it the “right way”, it is sometimes a lot slower (as opposed to building it in Rails, at the cost of some extensibility and compatibility; I’m trying to make it so that other people can run it on their own servers). If it were just an app running on one server with no one else using it, then Rails would probably be the best way.

I’m hoping to get a really basic version released (nothing really special) next week so other people can play around with it too. At the same time, I’m going to try to set up the commercial side of it (but not release it to the public yet) so that I can try focusing on security and speed.

But all in all, it’s more difficult than I imagined. Most of the time is spent debating the best way to do it and looking at how other scripts do it.

There have also been some setbacks. For example, over the past few days, Windows suddenly began corrupting my files after each reboot or hibernation for no good reason. Actually, I think it’s a program called RollBack Rx, which is supposed to be data recovery software that runs in the background (yeah, the irony). So I went to uninstall the program, but when I uninstalled it, it took a chunk of core Windows files with it, which rendered Windows useless.

So I used a Feisty live CD to mount the C drive, shared it over Samba, and used my other computer to copy over files from the Windows CD. Unfortunately, that didn’t help at all.

So I went to reinstall Windows. But I discovered that my DVD/CD drive suddenly couldn’t read CD-Rs (I’m working with Toshiba to get it exchanged, although they want me to send in my whole laptop). My two Windows CDs were both CD-Rs (legit versions, might I add).

Finally, I decided to install Feisty, since it was on a pressed CD (I got it from the Ship-It service) that my CD/DVD drive could read. Since I don’t use the tablet functionality much at home, I figured I could live with Ubuntu for a while until I get my DVD/CD drive fixed.

It’s surprisingly good. It set up my digitizer pen out of the box (although right-click doesn’t work). Wireless also works out of the box. Video and audio are good (although I had to tweak the video a bit). It even tells me that the battery on my Logitech wireless mouse is only at 14%!

Well, in truth, it’s not like I didn’t know that Feisty was good. Three of Avery’s computers run Ubuntu, so I had quite a bit of experience with it. But compared to when I tried out Ubuntu two years ago, Feisty does much better hardware detection. So I’ll probably be using Ubuntu until I can get Windows back on (mainly for MS OneNote).

Obligatory ‘It’s so close..’ Post

Every time I’m about to take a bunch of finals, my brain somehow wants to write a blog post instead of actually studying for them. I usually write something like: “I leave for home in a week, but I have to get through all of these finals first. The end is so close, yet so far!”

Well, yeah, I just wrote it. This week: 4 finals (1 in class) + wrapping up research. Luckily, I spent last weekend writing an essay due this week so I don’t have that extra assignment hanging over my head.

Every week has been so busy that I never even mentioned that I was doing research in the Goddard Group, which focuses on computational chemistry and materials science. The project that I am working on (and currently wrapping up) simulates ruthenium-based polypyridine dyes on a titanium dioxide surface using the ReaxFF force field (which was developed by one of my mentors here at Caltech). I really enjoy the computational side of it since it means I don’t have to be in lab running experiments. I can also see atoms and molecules and how they interact :). Also, the project relates directly to creating better solar cells, an area that I am very interested in.

I decided to eschew a potential SURF (summer research) this summer in order to focus 100% on creating an internet startup. Although I wanted to do both chemistry research and work on my startup, I knew that splitting my focus would lead to neither project being very successful. My startup idea is to develop a new type of wiki while providing services around it. I’m really excited about working on this project, so it is kind of difficult to concentrate on my finals when I have this idea lurking in the back of my mind all the time. Summer is going to be awesome :).

Created Wikipedia Article: N-Version Programming

I’ve been trying to contribute to Wikipedia more often these days, particularly in the sciences, since I’m supposed to be knowledgeable in that field. Today, while writing a paper on electronic voting systems for a humanities class (PS 12), I stumbled upon a paper about an e-voting system that used N-Version Programming (NVP). Now, what is NVP, you ask? That’s a good question. I went over to Wikipedia to learn about it, but Wikipedia didn’t have an article on N-Version Programming! No way!

Well, I found out more about NVP by reading a few papers (in short, it means independently developing several versions of the same program from one specification and comparing their outputs, so that a fault in one version can be outvoted by the others) and decided to give back to Wikipedia by creating the article: N-Version Programming. That’s right, everything on that page right now should be my work (check the history to see if other people have modified it). It took me a little over an hour to write the article and reference the heck out of it. Overall, I’m glad I finally created an article on Wikipedia.

Most of the time, however, I’m making small edits here and there. You can view my list of contributions here.

Edit: Maybe in my paper, I should cite the wikipedia article and see if the grader thinks I’m plagiarizing off of the article :).

Analyzing the Effectiveness of Brita® Water Filters

This was my final paper for my independent research project in a laboratory class, Ch 15, at Caltech. I’m currently in finals week so I have about 4 finals to take and 1 more assignment to finish.

Abstract:

Brita® water filters are popular personal and home water filtration systems that claim to reduce, among many impurities, the concentration of copper in tap water by at least 74% and chlorine by at least 94%. These claims and the performance of the filters were tested by filtering residential tap water and collecting filtered samples at various points along the filtration process, up to 40 gallons (Brita®’s claimed filter lifespan). Using mercury electrode differential pulse cyclic voltammetry and UV-Vis spectrometry (on the DPD colorimetric reaction), the concentrations of copper and chlorine, respectively, were determined in unfiltered and filtered tap water. The results showed an average reduction of 36.10% +/- 2.43% for copper and 72.49% +/- 0.03% for chlorine over the 151 L (40 gallons) filtered through the Brita® water filter. There was also a detectable, albeit small, difference (5.48% change for copper and 17.12% change for chlorine) between new filters and filters due to be replaced after 151 L. Under real-world usage of the Brita® water filter (as opposed to prepared laboratory samples), Brita®’s claims about percent reduction are overestimated by ~43% for copper and 13% for chlorine.

Full Paper: Analyzing the Effectiveness of Brita® Water Filters (PDF, 565 KB).

My SURF Presentation in Video Form

I’m still alive…just in finals week at Caltech. I managed to obtain a digital copy of my SURF presentation from last month thanks to Eric Tai so I uploaded it to Google Video:
Enhancing conductivities in (porphyrinato)metal-based materials with alkyl-ethylene glycol substituents, SURF 2006 (Caltech) Presentation

Chemistry Research at UPenn

I’m about 6-7 weeks into my SURF (Summer Undergraduate Research Fellowship) at the University of Pennsylvania (although the SURF itself is through Caltech), so I figured I should post something.

The project I’m working on is slightly different from the project proposal that I posted a few months ago. Essentially, I’m now focused on making porphyrins for conducting polymers instead of for supercapacitors. In short, a conductive polymer is like a wire made from organic elements (like carbon, oxygen, and nitrogen atoms) instead of metals (like copper, silver, or iron, although a little bit of metal can be added). Conductive polymers are cool because they can be used to create molecular electronics (imagine wire being constructed piece by piece on a computer chip), organic electronics (such as OLEDs), supercapacitors, and generally cheaper electronics.

However, one problem they suffer from is that their conductivities are not very high. For instance, the electrical conductivity of copper is 59.6 × 10^6 S/m, whereas polyacetylene, a conducting polymer, has a conductivity of around 10^-7 S/m, which is roughly 15 orders of magnitude smaller. A better comparison, however, would be with silicon (which conducting polymers have a good chance of replacing), which has a conductivity of around 10^-5 S/m (I’m not too sure about this number, so don’t quote me on it).
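As a quick check of the copper versus polyacetylene gap, using the two rounded conductivities quoted in the paragraph above (and taking polyacetylene as exactly 10^-7 S/m, which is an approximation since real samples vary):

```latex
\frac{\sigma_{\mathrm{Cu}}}{\sigma_{\mathrm{PA}}}
  \approx \frac{5.96\times 10^{7}\ \mathrm{S\,m^{-1}}}{1\times 10^{-7}\ \mathrm{S\,m^{-1}}}
  \approx 6\times 10^{14},
\qquad
\log_{10}\!\left(6\times 10^{14}\right) \approx 14.8
```

So the gap is about 15 orders of magnitude.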

Porphyrins can be made into conductive polymers (although we are working with oligomers, really short polymers only a few units long), and recent research (by my mentor Paul and the group) has shown that they have the potential to reach conductivities at or even above those of amorphous silicon!

My project consists of working with a certain class of porphyrins and synthesizing a new variant of them. This new variant will have a flatter chemical structure that will allow the porphyrin oligomers to pack more closely together. When they are closer to each other, they can interact better (pi-pi stacking increases) and pass electrons (and holes) more effectively.

Well, at least that’s what my mentor and I think so far. I’m still in the process of making the porphyrin. The synthesis is proving to be really tedious since it has around 12-15 steps, and each step requires one or more purifications afterward (mostly column chromatography). In total, I have around 24-28 steps to complete. Each step takes around a whole day, which is pretty much why I’m still in the synthesis stage.

The yield is also very low (around 15-20%). The amount of material I’m currently working with is about 1 gram (I started with around 4.5 g). By the end of all the steps, I’ll probably end up with maybe 0.2 grams of porphyrin. But those 0.2 grams are the result of over 6 weeks of labor and hundreds of dollars of materials. This is partly why my mentor keeps reminding me lately to guard these samples (not fully synthesized yet, but very close) with my life. Although he’s kind of joking, if I screw up (which I’m very prone to do), I should pretty much just give up and go into computer science or something.

The idea of manipulating matter at will (a.k.a. chemistry) is tremendously awesome, but it can be so tedious at times. For instance, if one comes up with an idea in chemistry, it might take weeks just to test it. That’s too long! I’m used to the computer science approach: programming things all at once and making modifications on the fly, with instant feedback.

I actually started writing this post last week or so and didn’t get around to finishing it. As of today, my mentor (who helps me with a lot of the synthesis work) and I are close to completion. We have connected two and three of these new porphyrin variants. Our goal is to make a pentamer (five connected porphyrins), and the next reaction should give us one. After purification and such, we should have the final product by the end of this week.

Backup (File) to Email

About a year ago, I started backing up the SQL databases on my accounts weekly so that if anything happened, I would have a recent copy. This has served me pretty well a few times (i.e. saved my a__). However, the backup files soon started to clutter up server space, so I had to delete them from time to time.

Now, this is all fine and all except that I’m lazy, and I want an archive of all my backup files. Okay, okay, the other reason is that I want to fill up a Gmail account at least once in my lifetime. So why not? Let’s send the backup files to Gmail!

So I wrote two scripts: a file splitter that splits the backup files into pieces of a specified size (10MB is the attachment limit for Gmail), and a PHP emailer script that encodes the attachments in base64 and generates correctly formatted MIME emails. It’s easy to send mail with PHP using the mail() function, but it’s a lot more difficult to send attachments.
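The emailer script itself is PHP and isn’t reproduced here. As a rough illustration of what “base64 plus MIME” involves, here is a Python sketch (a swapped-in language, not the author’s code) using the standard library’s email package; the helper name build_backup_email and the addresses are hypothetical. add_attachment base64-encodes the bytes and turns the message into a multipart/mixed email, which is exactly the bookkeeping that makes attachments harder than a plain mail() call.

```python
from email.message import EmailMessage
from pathlib import Path


def build_backup_email(path, sender, recipient):
    """Build a MIME email with one backup file attached (hypothetical helper)."""
    path = Path(path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Backup: {path.name}"
    msg.set_content("Automated database backup attached.")
    # For bytes content, add_attachment uses base64 transfer encoding and
    # converts the message to multipart/mixed (text body + attachment).
    msg.add_attachment(
        path.read_bytes(),
        maintype="application",
        subtype="gzip",
        filename=path.name,
    )
    return msg
```

To actually send it, something like `smtplib.SMTP("localhost").send_message(msg)` would work, assuming a mail server is listening locally.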

I combined this with my original shell script (.sh), run from a crontab, that made backups of my databases every week.

It’s working pretty well right now so I thought other people might be interested in the source:

Download (3KB): BackupToEmail_1.0.zip. Written in PHP and 1 shell (SH) script.

Purpose
The three included files work together to back up a database every so often (as specified by a crontab), tar and gzip the .sql dump, and send the .tgz file to an email address for backup. If the .tgz file is too large, a file splitter is used to keep each piece within attachment size limits.
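The archive-and-split part of that pipeline can be sketched in Python as follows. This is an illustration under assumptions, not the downloadable scripts (which use a shell script and PHP): make_tgz and split_file are hypothetical names, and the 7 MB default chunk size is chosen because base64 inflates data by about a third, so each encoded piece stays under a 10 MB attachment limit.

```python
import tarfile
from pathlib import Path

# Assumed default: 7 MB raw becomes roughly 9.5 MB once base64-encoded.
CHUNK_BYTES = 7 * 1024 * 1024


def make_tgz(sql_path, tgz_path):
    """Tar and gzip a single .sql dump, storing it under its bare file name."""
    with tarfile.open(tgz_path, "w:gz") as tar:
        tar.add(sql_path, arcname=Path(sql_path).name)
    return tgz_path


def split_file(path, chunk_bytes=CHUNK_BYTES):
    """Write path.001, path.002, ... each at most chunk_bytes long; return the part paths."""
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part = Path(f"{path}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```

Because the suffixes are zero-padded, the pieces sort in order, so after downloading them `cat dump.tgz.* > dump.tgz` reassembles the original archive.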

Usage

  1. Place all files in a specified directory (e.g. /home/yourname/www/backup).
  2. Chmod that directory 777.
  3. Chmod backup.sh 777 (give it executing permissions +x).
  4. Edit backup.sh to reflect the database you wish to backup.
  5. Edit backupToEmail.php to reflect the email address you wish to send the files to. A Gmail account is a good choice since it has over 2GB of storage. Also change the from address to reflect the domain you are sending emails from.
  6. Set a crontab to execute backup.sh periodically. This can be weekly, monthly, or even daily. You can do this in cPanel if you have access to one, or issue the command ‘crontab -e’ at the command line and add something like this to the crontab file:

    0 3 * * 0 /home/yourname/www/backup/backup.sh

    In the above example, the script runs at 3 AM every Sunday (once a week).

No support is offered for this script. Use at your own risk!

Credits: PHP Manual

License: GPL

Simple Asides Wordpress Plugin

The official Simple Asides page can be found here. Please do not use this page anymore.

The newer versions of WordPress (from 1.2 on) have cool plugin functionality. I used to have asides on my blog, but since upgrading, I didn’t want to hack through a lot of code just to create the “asides” effect. Although there are some plugins already written for WP that create asides functionality, they require the use of special fields, which I dislike. I would rather just write a post, place it in a designated category (like Asides), and have it be formatted as an aside!

Therefore, I worked on my first WP plugin today: Simple Asides, which packages roughly what Matt describes on his blog into plugin form. Usage is very simple:

  1. Place the simpleasides.php file in wp-content/plugins/
  2. Activate the plugin through the control panel
  3. Create a category called ‘Asides’
  4. Place all posts intended to be an aside into the ‘Asides’ category

Since WP had limited filters, I had to use an HTML comment trick to mask a lot of the unneeded HTML. Although writing the plugin was fairly easy, the WP Codex could benefit from more plugin API documentation. Overall, I liked the WP plugin system and hope to work with it more.

Download Simple Asides: [zip] (3KB)
(WARNING: There are bugs with this plugin and conflicting issues with other plugins. Please do not use this plugin seriously. Feel free to take the code and modify it for your own uses (and even re-release it as your own plugin).)

Latest project: A (better) LaTeX to HTML converter

So I procrastinated today and worked on a spontaneous idea I had. You see, sometimes I want to convert my documents (in LaTeX; I barely use Word anymore) into HTML. A few months ago I searched the internet for a long time, but the best I could come up with was TeX4ht, and I just found pyLaTeX, which is good, but not what I want.

So I began working on my own version of a LaTeX to HTML converter. I spent today setting up how pages would look, and I have a few demos to show you.

I essentially took Adobe Acrobat 6 output (at 118% magnification at 1024×768 screen resolution) and tried to design an HTML layout around the “page” format. I purposely constrained LaTeX output to the Times font instead of Computer Modern so I could better compare my version with the real LaTeX generated version.

If you are astute, compare the TeX4ht XHTML code with my XHTML code. Notice how the TeX4ht version seems a lot more bloated? My goal is to convert from LaTeX to HTML while keeping the HTML logical and clean.
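No converter code is shown yet, so here is a hedged sketch of what “logical and clean” output could mean, written in Python even though the project itself was planned in PHP. The function latex_to_html and the small set of supported commands are hypothetical choices: blank-line-separated paragraphs become `<p>` elements, `\section`/`\subsection` become headings, and a few inline commands map to semantic tags, with no layout markup at all.

```python
import html
import re

# Hypothetical command-to-tag mappings; nested braces, math and environments are not handled.
INLINE = {"textbf": "strong", "emph": "em", "textit": "em"}
HEADINGS = {"section": "h2", "subsection": "h3"}


def latex_to_html(src):
    """Convert a tiny subset of LaTeX into minimal semantic HTML."""
    out = []
    # LaTeX paragraphs are separated by blank lines.
    for block in re.split(r"\n\s*\n", src.strip()):
        text = html.escape(block.strip(), quote=False)  # escape &, <, > first
        m = re.fullmatch(r"\\(section|subsection)\{([^{}]*)\}", text)
        if m:
            tag = HEADINGS[m.group(1)]
            out.append(f"<{tag}>{m.group(2)}</{tag}>")
            continue
        for cmd, tag in INLINE.items():
            text = re.sub(r"\\%s\{([^{}]*)\}" % cmd, rf"<{tag}>\1</{tag}>", text)
        # Collapse the source's line breaks inside a paragraph into single spaces.
        out.append(f"<p>{' '.join(text.split())}</p>")
    return "\n".join(out)
```

For example, `latex_to_html("\\section{Intro}\n\nThis is \\emph{clean} text.")` yields an `<h2>` followed by a single `<p>` containing an `<em>`, which is the kind of lean output the comparison with TeX4ht is about. Appearance would then come entirely from CSS.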

Now, since I barely have time to be doing things like this, would anyone like to help me? We will be doing PHP work and writing some CSS to account for all of the different LaTeX environments and formatters. Let me know what you think.