Download Process Explorer from Microsoft TechNet. Process Explorer is similar to the Windows Task Manager, but I find it much more useful. There is no installation routine; simply create a folder on your boot drive and extract the contents of the downloaded zip file into it. The program file is named procexp.exe. I recommend creating a desktop shortcut to it, or pinning it to your Start menu or taskbar, so it is convenient to run.
To see if you need more memory, run the system for a while using the programs that lead to the slowdown. If you run Process Explorer while the slowdown is occurring, you can also see whether the problem is CPU related, what is running, and how much CPU each process is using. While Process Explorer is running, press Ctrl+I to open the System Information panel. Select the Memory tab and look at the section labeled "Commit Charge (K)". In particular, look at the "Peak" value. This is the high-water mark for the combined physical (RAM) and virtual (page file) memory that has been used during the current Windows session. Compare the Peak value to the Total Physical Memory value shown in the same dialog window. Ideally, Peak should be well below Total Physical Memory, which leaves room for the system cache and avoids use of the page file. If Peak is higher than Total Physical Memory, performance is being degraded because the system had to use the page file and the system cache had to be purged. If you consistently see this happening, you will benefit from installing additional RAM.
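If it helps, the comparison above can be written down as a small Python sketch. You type in the two numbers you read off the Process Explorer dialog (both in KB); nothing here reads them from Windows. The function name assess_commit_peak and the 0.8 "headroom" cutoff are my own assumptions for illustration. The only hard rule from the text is that Peak above Total Physical Memory means the page file was used.

```python
def assess_commit_peak(peak_commit_kb, total_physical_kb, headroom=0.8):
    """Compare Peak commit charge with Total Physical Memory (both in KB).

    headroom is a hypothetical cutoff for "well below"; 0.8 is only a guess,
    not a value taken from Process Explorer or Windows.
    """
    if peak_commit_kb <= 0 or total_physical_kb <= 0:
        raise ValueError("both values must be positive")

    ratio = peak_commit_kb / total_physical_kb
    if ratio > 1.0:
        # Peak exceeded physical RAM, so the page file had to be used.
        return "add RAM"
    if ratio > headroom:
        # Still within RAM, but little room is left for the system cache.
        return "borderline"
    return "ok"


if __name__ == "__main__":
    # Made-up readings for a machine with 4 GB (4,194,304 KB) of RAM.
    for peak in (2_000_000, 3_500_000, 5_000_000):
        print(peak, assess_commit_peak(peak, 4_194_304))
```

A single "add RAM" result after one unusually heavy session is not conclusive; it is the consistent pattern that matters.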
For the applications you are using, you probably won't see much difference between DDR3 1333 and 1600 RAM. Also, in general, at the same speed CL9 would be preferred over CL11. For SDRAM, Column Address Strobe latency, commonly known as CAS or CL, is the number of clock cycles from the time the memory address is entered until the value is available on the RAM module's output pins. Because RAM modules contain multiple internal memory banks, data from one bank can be output during the latency phase of the next read. When memory addresses are predictable, such as in sequential reads, the output pins can be kept 100% utilized despite the CAS latency. In this case, bandwidth is essentially limited by the speed of the module. However, when the addresses are random and the next memory address cannot be known in advance, CAS latency becomes the limiting factor. Real-world results will be somewhere between the two.
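Since CL is counted in clock cycles, it only compares directly between modules running at the same speed. As a rough sketch, the cycles can be converted to nanoseconds with the usual DDR relationship: the clock runs at half the rated transfer rate, so one cycle lasts 2000 / (rate in MT/s) nanoseconds. The function name cas_latency_ns is just something I made up for the example, and the calculation covers only the CAS delay, not the other memory timings.

```python
def cas_latency_ns(cl, data_rate_mts):
    """Convert a CAS latency in clock cycles to nanoseconds.

    cl            -- CAS latency in cycles (e.g. 9 for CL9)
    data_rate_mts -- rated transfer rate in MT/s (e.g. 1600 for DDR3 1600)
    """
    if cl <= 0 or data_rate_mts <= 0:
        raise ValueError("cl and data_rate_mts must be positive")

    # DDR transfers data twice per clock, so the clock is half the rated speed.
    clock_mhz = data_rate_mts / 2
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return cl * 1000.0 / clock_mhz


if __name__ == "__main__":
    for name, cl, rate in (("DDR3 1333 CL9", 9, 1333),
                           ("DDR3 1600 CL9", 9, 1600),
                           ("DDR3 1600 CL11", 11, 1600)):
        print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
```

Run as is, this works out to roughly 13.5 ns for 1333 CL9, 11.25 ns for 1600 CL9, and 13.75 ns for 1600 CL11, so a 1600 CL11 module has about the same absolute CAS delay as a 1333 CL9 module.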