In computational chemistry, the calculations involve very complex mathematics. The level of theory chosen indicates how much knowledge about a given molecule you are assuming; ab initio, the level we will use, assumes no prior knowledge about the molecule. All results come from solving the equations that describe the molecule. These equations cannot be solved exactly, so approximate versions are set up and solved instead. The "basis set" defines one of the main approximations: the set of mathematical functions from which the molecule's orbitals are built. A basis set is therefore a starting assumption about the form the answer can take for the molecule in question.
Basis sets approximate the solutions of these equations (the molecule's orbitals) by adding up several simpler curves, called basis functions. The basis set chosen determines how many basis functions are used and how each one is constructed. The name of a basis set encodes this: it describes how many Gaussian curves make up each basis function and how many basis functions represent the inner (core) and outer (valence) electrons of each atom.
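As a rough illustration of "adding up curves", the Python sketch below builds one basis function as a fixed, weighted sum of Gaussian curves of the form exp(-αr²), which is the kind of function Gaussian-type basis sets such as 3-21G and 6-31G* are made of. The exponents and coefficients are made-up values chosen only for illustration (not taken from any published basis set), and the Gaussians are left unnormalized for simplicity.

```python
import math


def primitive(alpha: float, r: float) -> float:
    """One unnormalized Gaussian curve, exp(-alpha * r**2)."""
    return math.exp(-alpha * r * r)


def basis_function(r: float, exponents: list[float], coefficients: list[float]) -> float:
    """A single basis function: a fixed, weighted sum of Gaussian curves."""
    return sum(c * primitive(a, r) for a, c in zip(exponents, coefficients))


# Made-up exponents and coefficients for illustration only.
EXPONENTS = [5.0, 1.0, 0.2]
COEFFICIENTS = [0.2, 0.5, 0.4]

for r in (0.0, 0.5, 1.0, 2.0):
    value = basis_function(r, EXPONENTS, COEFFICIENTS)
    print(f"r = {r:.1f}: phi(r) = {value:.4f}")
```

Tight Gaussians (large exponents) shape the function near the nucleus, while diffuse ones (small exponents) shape its tail.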
When working at the ab initio level of theory in computational quantum chemistry, the choice of basis set is important. It affects both the accuracy of your results and the computational cost of obtaining them. A compromise must be reached between accuracy and computational time, since the time involved can be large, even on the best computers.
Behind the issue of computation time is the idea of basis functions. The number of functions used in a calculation largely determines the computation time required. The choice of basis set determines how many functions (each built from Gaussian curves) will be used to describe the molecule, depending on the number of atoms of each type present. In this activity, you will determine the relationship between basis set, number of atoms, and number of functions, and the relationship between number of functions and CPU time, for several alkanes.
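To see why the number of functions matters so much, the sketch below assumes the calculations are Hartree-Fock (the usual ab initio method paired with 3-21G and 6-31G*), in which a major cost is computing two-electron integrals (ij|kl) over the basis functions. Counting only the distinct integrals, using their 8-fold permutational symmetry, shows the count growing roughly as N^4/8 for N basis functions. This is a worst-case count; real programs typically skip negligible integrals and use other shortcuts, so measured CPU time may grow more slowly.

```python
def unique_two_electron_integrals(n_bf: int) -> int:
    """Number of distinct (ij|kl) integrals for n_bf real basis functions,
    using the 8-fold permutational symmetry of the integrals."""
    pairs = n_bf * (n_bf + 1) // 2   # distinct (ij) index pairs
    return pairs * (pairs + 1) // 2  # distinct pairs of pairs


for n_bf in (10, 20, 40, 80):
    count = unique_two_electron_integrals(n_bf)
    print(f"{n_bf:>3} basis functions -> {count:>10,} integrals "
          f"(N^4/8 = {n_bf ** 4 / 8:>12,.0f})")
```

Doubling the number of basis functions multiplies the count by roughly 16.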
Your computational tools for this study will be Mac/PCSpartan and spreadsheets. To gather the necessary data, each group will run a few of the calculations, and the data will be pooled.
Each group should run a geometry optimization for two molecules in Mac/PCSpartan and record the number of atoms of each type and the total number of atoms. Throughout this activity, make sure that Job Speed (under Actions on the Mac/PCSpartan Monitor menu bar) is set to the maximum. When the calculation has finished, examine the output file: the number of basis functions used is listed near the top, and the timing report is at the very bottom. Record the number of basis functions and the CPU time. Be sure to record the total CPU time, not the wall time. "Wall" time is the elapsed clock time of the calculation, from when you clicked "Submit" until the report appeared; CPU time is the amount of processor time the calculation actually used. Record CPU time in whole seconds.
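The difference between wall time and CPU time can be seen on any computer. The Python sketch below is only an analogy, not how Spartan measures its timings: it uses Python's `time.perf_counter` (a wall-clock timer) and `time.process_time` (CPU time used by the current process) around a period of waiting followed by a period of computing.

```python
import time


def busy_work(n: int) -> int:
    """Keep the processor busy with simple arithmetic."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()  # wall-clock timer (like "Wall" time)
cpu_start = time.process_time()   # processor time used by this process (like CPU time)

time.sleep(0.5)       # waiting: wall time passes, but almost no CPU time is used
busy_work(2_000_000)  # computing: wall time and CPU time both increase

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall time: {wall_elapsed:.2f} s, CPU time: {cpu_elapsed:.2f} s")
```

The wall time includes the half-second of waiting, while the CPU time counts only the computing, which is why CPU time is the better measure of a calculation's cost.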
The molecules we will work with are called alkanes, which have the formula CnH2n+2. The alkanes used here are made up of n carbons linked together in an unbranched chain, with hydrogens attached to every carbon. All of the bonds in alkanes are single bonds.
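Because every alkane follows CnH2n+2, the numbers of carbons, hydrogens, and total atoms follow directly from n. A small Python sketch of that bookkeeping (the function name `alkane_atoms` is just an illustrative choice):

```python
def alkane_atoms(n: int) -> dict[str, int]:
    """Numbers of carbon, hydrogen, and total atoms in the alkane CnH(2n+2)."""
    if n < 1:
        raise ValueError("an alkane needs at least one carbon")
    n_c = n
    n_h = 2 * n + 2
    return {"C": n_c, "H": n_h, "total": n_c + n_h}


NAMES = {1: "methane", 2: "ethane", 3: "propane", 4: "butane", 5: "pentane"}
for n, name in NAMES.items():
    atoms = alkane_atoms(n)
    print(f"{name:<8} C{n}H{atoms['H']}: {atoms['total']} atoms")
```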
Run each molecule TWICE: once with the 3-21G basis set (level) and once with the 6-31G* basis set. Put all of your data into a spreadsheet and share it with the class.
Throughout this section, discuss your results with your group and classmates as you proceed. Make sure you understand and have discussed each section before you go on!
Do all of the analysis in this section for the two levels of theory separately. For example, when you make a graph of number of functions versus the number of carbons, you should have two graphs, one for 3-21G calculations and one for 6-31G* calculations. Make predictions for results at both levels of theory!
Once the spreadsheet is set up, make a graph of the number of basis functions versus the number of carbons in each molecule. What do you observe about the shape of this graph? Try to write an equation that gives the number of basis functions (y) for a given number of carbons (x). What is the coefficient of x in this equation, and what is the constant term?
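The coefficient of x (the slope) and the constant term (the intercept) can be read off a spreadsheet trendline, or computed with the SLOPE and INTERCEPT functions found in common spreadsheet programs. For anyone working in Python instead, here is a minimal least-squares sketch; the numbers in the usage lines are toy values, not real basis-function counts.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy numbers for illustration only; use the class data for ONE basis set.
n_carbons = [1, 2, 3, 4]
n_basis_functions = [5, 7, 9, 11]
slope, intercept = fit_line(n_carbons, n_basis_functions)
print(f"Nbf = {slope:g} * Nc + {intercept:g}")  # prints "Nbf = 2 * Nc + 3"
```

Run the fit once per basis set, since the 3-21G and 6-31G* data should be analyzed separately.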
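Now compare the number of basis functions with the numbers of both kinds of atoms. The number of basis functions (Nbf) is related to the number of carbons (Nc) and hydrogens (Nh) through the equation Nbf = a*Nc + b*Nh. Note that the slope you found above is not (a) by itself: every alkane has Nh = 2Nc + 2, so substituting gives Nbf = (a + 2b)*Nc + 2b. The slope of your graph is therefore a + 2b, and the intercept is 2b. Use them to find values of (a) and (b) that fit your data. Both constants are integers; determine them separately for 3-21G and 6-31G*, since they need not be the same for the two basis sets.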
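Since slope = a + 2b and intercept = 2b for alkanes, (a) and (b) follow from the fitted line with two lines of arithmetic. The sketch below does that and checks that both come out close to whole numbers; the 0.1 tolerance and the toy slope and intercept in the usage line are illustrative assumptions, not real basis-set values.

```python
def atom_constants(slope: float, intercept: float) -> tuple[int, int]:
    """Recover a and b in Nbf = a*Nc + b*Nh from a line fitted to alkane data.

    For CnH(2n+2), Nh = 2*Nc + 2, so Nbf = (a + 2b)*Nc + 2b.
    Hence slope = a + 2b and intercept = 2b.
    """
    b = intercept / 2
    a = slope - 2 * b
    a_int, b_int = round(a), round(b)
    # Both constants should be whole numbers; a large mismatch suggests a data error.
    if abs(a - a_int) > 0.1 or abs(b - b_int) > 0.1:
        raise ValueError(f"a = {a:.2f}, b = {b:.2f} are not close to integers")
    return a_int, b_int


# Toy slope and intercept for illustration only (not real basis-set values).
a, b = atom_constants(slope=7.0, intercept=2.0)
print(f"a = {a}, b = {b}")  # prints "a = 5, b = 1"
```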
Do you think that this equation will accurately predict the number of basis functions needed for alkanes we haven't done yet? Record your predictions for these values of n.
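Once (a) and (b) are known for a basis set, predicting Nbf for a larger alkane only requires substituting Nc = n and Nh = 2n + 2. In the sketch below the constants and the n values are toy placeholders; substitute your own constants and the n values assigned to your group.

```python
def predict_alkane_nbf(a: int, b: int, n: int) -> int:
    """Predicted number of basis functions for CnH(2n+2) from Nbf = a*Nc + b*Nh."""
    n_c, n_h = n, 2 * n + 2
    return a * n_c + b * n_h


# Toy constants and arbitrary n values; substitute your own a, b and assigned n.
A_TOY, B_TOY = 5, 1
for n in (6, 8, 10):
    print(f"n = {n:>2}: predicted Nbf = {predict_alkane_nbf(A_TOY, B_TOY, n)}")
```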
Now graph the CPU time, in seconds, versus the number of basis functions. What sort of relationship do you see? Describe what happens to the CPU time required as the molecule gets extremely large.
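If the graph curves upward, one common way to describe it (an assumption; other curve shapes are possible) is a power law, cpu ≈ k·Nbf^p. Taking logarithms turns this into a straight line, log(cpu) = log(k) + p·log(Nbf), so an ordinary line fit on the logarithms gives the exponent p as the slope. The usage line below uses toy data generated from cpu = 0.001·Nbf^3, not real timings.

```python
import math


def fit_power_law(n_bf: list[float], cpu_s: list[float]) -> tuple[float, float]:
    """Fit cpu = k * n_bf**p via a least-squares line through (log n_bf, log cpu).

    Returns (k, p). Every value must be positive, so drop runs whose CPU time
    was recorded as 0 s before fitting.
    """
    if len(n_bf) != len(cpu_s) or len(n_bf) < 2:
        raise ValueError("need at least two (Nbf, CPU time) pairs of equal length")
    if min(n_bf) <= 0 or min(cpu_s) <= 0:
        raise ValueError("all values must be positive to take logarithms")
    xs = [math.log(x) for x in n_bf]
    ys = [math.log(y) for y in cpu_s]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the Nbf values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the log-log line = exponent
    k = math.exp(mean_y - p * mean_x)  # intercept of the log-log line = log k
    return k, p


# Toy data generated from cpu = 0.001 * Nbf**3 (not real timings).
k, p = fit_power_law([20, 40, 80], [8, 64, 512])
print(f"cpu ~ {k:.3g} * Nbf^{p:.2f}")
```

Plotting log(CPU time) against log(Nbf) in the spreadsheet is a quick visual check: if the points fall near a straight line, the power-law description is reasonable.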
Try to predict the CPU time for molecules that have not been calculated yet. How accurate do you think your predictions are? Is a prediction for a molecule whose size falls between two molecules we calculated likely to be more or less accurate than a prediction for a molecule outside that range? If so, why? Discuss the difference between these predictions with your group. Record your predictions.
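A sketch of making these predictions while keeping track of which kind each one is. It assumes a power-law fit cpu = k·Nbf^p like the one described for the previous graph; the fit parameters and Nbf values below are toy numbers for illustration only.

```python
def predict_cpu(k: float, p: float, n_bf: float) -> float:
    """Predicted CPU time (s) from a fitted power law cpu = k * n_bf**p."""
    return k * n_bf ** p


def prediction_kind(n_bf: float, measured_n_bf: list[float]) -> str:
    """Label a prediction by whether n_bf lies within the measured range."""
    if min(measured_n_bf) <= n_bf <= max(measured_n_bf):
        return "inside measured range"
    return "outside measured range"


# Toy fit parameters and Nbf values for illustration only.
K_TOY, P_TOY = 0.001, 3.0
measured = [20, 40, 80]
for n_bf in (60, 160):
    t = predict_cpu(K_TOY, P_TOY, n_bf)
    print(f"Nbf = {n_bf}: ~{t:.0f} s ({prediction_kind(n_bf, measured)})")
```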
Run the calculations for the molecules that you made predictions about. See the table below for assignments.
Record the actual data from these runs and share it with the class.
How accurate were your predictions about the number of basis functions? Were all of the predictions of basis functions equally accurate or inaccurate?
How accurate were your predictions about CPU time? Were all of the predictions of CPU time equally accurate or inaccurate? Did the pattern of accuracy/inaccuracy fit what you thought? How so? Why or why not?
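One way to compare accuracy across molecules of very different sizes is the signed percent error, 100 × (predicted − actual) / actual; the sign shows whether a prediction was too high or too low. A minimal sketch with toy numbers (note that the percent error is undefined if a CPU time was recorded as 0 s):

```python
def percent_error(predicted: float, actual: float) -> float:
    """Signed percent error: positive means the prediction was too high."""
    if actual == 0:
        raise ValueError("percent error is undefined when the actual value is 0")
    return 100.0 * (predicted - actual) / actual


# Toy (predicted, actual) pairs for illustration only.
for label, predicted, actual in [("molecule A", 110, 100), ("molecule B", 45, 60)]:
    print(f"{label}: {percent_error(predicted, actual):+.1f} %")
```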
Which predictions of CPU time were most accurate? Why? Which ones were least accurate? Why? How could you have made these predictions more accurate without doing calculations on a molecule for which n is larger than what we have already done?
Predicting values at points inside the range of data you already have is called interpolation. Predicting values at points outside that range is called extrapolation. From your experiment, which one do you think gives more accurate predictions in general? Discuss why the two types of predictions differ in accuracy.
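A small numerical experiment shows one reason for the difference. The sketch below assumes a toy curved relationship, y = x², stands in for real data, and fits a straight line to it over x = 1 to 4. The line stays close to the curve inside that range, but its error grows quickly once it is used far outside it, because any mismatch between the model and the true curve keeps growing with distance from the data.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_value(x: float) -> float:
    """Toy curved relationship standing in for real data (an assumption)."""
    return x ** 2


xs = [1.0, 2.0, 3.0, 4.0]
slope, intercept = fit_line(xs, [true_value(x) for x in xs])  # gives y = 5x - 5

for x in (2.5, 10.0):
    guess = slope * x + intercept
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    error = abs(guess - true_value(x))
    print(f"x = {x:>4}: {kind:<13} predicted {guess:6.2f}, "
          f"true {true_value(x):6.2f}, error {error:6.2f}")
```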
It may seem that a 6-31G* calculation would always be more accurate because it uses a larger basis set, but this is not always true: different kinds of results can be more accurate with either 3-21G or 6-31G*. Now, pretend that a computational chemist has come to you for advice. She is going to run several calculations on alkanes of various sizes using either the 3-21G or the 6-31G* basis set. However, she has a limited amount of computer time. Which basis set would you tell her to use to minimize her computational time? If she can use different basis sets for different molecules, would you recommend a different basis set for large molecules than for small ones? If so, decide what value of n counts as a "large" molecule. Discuss your recommendations and reasoning with your group and classmates.
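One possible way to put a number on "large" is to pick a time limit per calculation and find the smallest n whose predicted CPU time exceeds it. The sketch below combines the Nbf formula with an assumed power-law time model; the constants, the 60 s limit, and the helper names are all hypothetical placeholders to be replaced with your own fitted values for each basis set.

```python
from typing import Optional


def alkane_nbf(a: int, b: int, n: int) -> int:
    """Nbf = a*Nc + b*Nh for CnH(2n+2)."""
    return a * n + b * (2 * n + 2)


def predicted_cpu(k: float, p: float, n_bf: int) -> float:
    """CPU time (s) from an assumed power-law fit cpu = k * Nbf**p."""
    return k * n_bf ** p


def first_large_n(a: int, b: int, k: float, p: float,
                  limit_s: float, n_max: int = 100) -> Optional[int]:
    """Smallest n whose predicted CPU time exceeds limit_s, or None up to n_max."""
    for n in range(1, n_max + 1):
        if predicted_cpu(k, p, alkane_nbf(a, b, n)) > limit_s:
            return n
    return None


# Toy parameters for ONE basis set and an arbitrary 60 s limit per calculation.
print(first_large_n(a=5, b=1, k=0.001, p=3.0, limit_s=60.0))  # prints 6
```

Running it once with each basis set's parameters shows where the two basis sets cross a given time limit; the choice of limit is a judgment call, not something the data decides.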
If time permits, run the same calculations on similar molecules in which other first-row elements replace carbon. For example, build a molecule that contains three nitrogens joined by single bonds, with as many hydrogens as needed. Run the geometry optimization and check whether your formula for the number of basis functions still fits. Replace Nc, the number of carbons, with Nfr, the number of first-row atoms. Does (a) need to be changed?
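To test the generalized formula Nbf = a*Nfr + b*Nh against new molecules, it helps to count first-row atoms and hydrogens straight from a molecular formula. The sketch below handles only simple formulas without parentheses (such as N3H5), treats Li through Ne as first-row atoms, and uses toy values of (a) and (b); replace them with the constants you fitted for each basis set.

```python
import re

FIRST_ROW = {"Li", "Be", "B", "C", "N", "O", "F", "Ne"}


def count_atoms(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'N3H5' (no parentheses)."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def predicted_nbf(formula: str, a: int, b: int) -> int:
    """Nbf = a*Nfr + b*Nh for molecules made only of first-row atoms and hydrogen."""
    counts = count_atoms(formula)
    other = set(counts) - FIRST_ROW - {"H"}
    if other:
        raise ValueError(f"formula contains atoms outside the model: {sorted(other)}")
    n_fr = sum(n for symbol, n in counts.items() if symbol in FIRST_ROW)
    n_h = counts.get("H", 0)
    return a * n_fr + b * n_h


# Toy constants for illustration; use your fitted a and b for each basis set.
for formula in ("C3H8", "N3H5"):
    print(formula, predicted_nbf(formula, a=5, b=1))
```

If the predicted and actual Nbf agree for the nitrogen compound, the same (a) works for both carbon and nitrogen.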