This is a combination "tip" and "cry for help" - probably mostly the latter. I feel that logically, with PRE allowing for multiple tracks, there's got to be a better way of matching up tracks than my brute-force method. If there's a simple, elegant, and fast way of doing this, I'd really like to know - it's where I waste at least 25% of my time.
In my particular situation, most of my video projects require me to combine - synchronize - tripod footage with handheld footage of a live event. In my case, it's karate instruction and promotion tests, but it could be a dance recital or a play or anything else that's live - anything where you can't say to the folks onscreen, "Wait - I didn't get that. Let's go back and do it again."
* * * * *
I always put the tripod footage on track 1 (T1) and the handheld on track 2 (T2). After inserting the video on T1, I expand the video and audio lines and change the view of the audio so I can see the waveforms. With the timeline zoom set about halfway, I find the spot close to where my handheld clip needs to sit.
After dragging the handheld clip to T2, I select the front of that clip and reduce the opacity to around 50%. Then I expand that track and change the audio view so I can see the waveforms there as well. This way, when I scrub or play the tracks, I've got both audio and video cues as to how closely they're matched.
I often don't try to match up the exact start of the handheld clip with the tripod clip. Instead, I'll find some action sequence, ideally something with a short, sharp sound: a kiai (spirit yell) or a punch landing. A body hitting the floor actually isn't "sharp" enough, and it's often hard to match up visually as well.
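Incidentally, that "short, sharp sound" rule of thumb can be put in numerical terms: a good sync point is wherever the short-window energy of the waveform jumps the most. Here's a rough sketch in Python, assuming NumPy and that the clip's audio has already been extracted to a mono sample array (the 5 ms window is my own guess at a reasonable value):

```python
import numpy as np

def sharpest_spike(samples: np.ndarray, sample_rate: int, win_ms: float = 5.0) -> float:
    """Return the time (in seconds) of the biggest sudden jump in loudness --
    a rough numerical stand-in for spotting a kiai or a punch landing by eye."""
    win = max(1, int(sample_rate * win_ms / 1000))
    # Short-window energy envelope of the signal.
    energy = np.convolve(samples.astype(float) ** 2, np.ones(win), mode="same")
    # The sharpest transient is where the envelope rises fastest.
    return float(np.argmax(np.diff(energy))) / sample_rate
```

A body hitting the floor shows up as a broad bump in that envelope rather than a steep jump, which matches the observation that it isn't "sharp" enough to sync on.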
This is where the drudgery really begins. I start zooming in on the timeline closer and closer, paying particular attention to spikes in the waveforms. Zooming in also allows me to move the clips in smaller increments, as little as a frame at a time, forward or back. Using the edge of the scrub bar as a reference, I'll nudge a clip until the wave spikes line up. Then I'll scrub or play the combined clips and watch the monitor to make sure the audio is synched.
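For what it's worth, the auto-sync tools do essentially this same waveform matching numerically, typically by cross-correlating the two audio tracks and taking the peak as the offset. A minimal sketch of the idea in Python, assuming NumPy and that each clip's audio is already a mono sample array at the same sample rate:

```python
import numpy as np

def find_offset_samples(ref: np.ndarray, other: np.ndarray) -> int:
    """Samples by which `other` trails `ref` (negative means it leads)."""
    # Cross-correlation peaks at the shift that best lines up the waveforms.
    # np.correlate is O(n^2); for clips of any real length, an FFT-based
    # correlation (e.g., scipy.signal.correlate with method="fft") is far faster.
    corr = np.correlate(ref, other, mode="full")
    return (len(other) - 1) - int(np.argmax(corr))

def offset_in_frames(ref, other, sample_rate: int, fps: float = 30.0) -> float:
    """The same offset expressed in video frames, for nudging the T2 clip."""
    return find_offset_samples(ref, other) / sample_rate * fps
```

Note that this lines up the sound, which, as described below, is not always the same thing as lining up the picture.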
Depending on how the audio for the two cameras was recorded, synching the audio may not synch the video - and the video is what matters more to me, since once the clips are lined up, I usually remove the T2 audio as one of the final steps before completing the project. I'll test the video synch by moving frame by frame and watching the smallest movements - head turns, hand gestures, weight shifts, etc. - to make sure they start and stop in the same frame. If not, I'll use those visual cues to shift the T2 clip frame by frame until they match. The audio disconnect often happens when one recording device is significantly farther from the action than the other: my tripod camera often uses an external shotgun mic, while I record audio through the on-board mic of the handheld. Distances over 100 feet, or a larger auditorium without a sound system, often increase the disconnect between the two recordings.
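The size of that disconnect is easy to estimate, because sound is slow - roughly 1,125 feet per second in air. A quick back-of-the-envelope calculation (assuming 30 fps footage):

```python
SPEED_OF_SOUND_FT_PER_S = 1125.0  # in air at roughly room temperature
FPS = 30.0  # assumed frame rate

def audio_lag_frames(extra_distance_ft: float) -> float:
    """Frames of delay picked up by a mic `extra_distance_ft` farther from the action."""
    return extra_distance_ft / SPEED_OF_SOUND_FT_PER_S * FPS

# At 100 extra feet, the far mic runs almost 3 frames behind the picture:
print(round(audio_lag_frames(100), 1))  # -> 2.7
```

Almost three frames at 100 feet - right in line with the handheld's on-board mic drifting noticeably out of step with the shotgun mic on the tripod camera.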
After synching up an entire clip, I'll zoom the timeline back out to around 50%, then go back and edit jump cuts between the tracks. Wherever I leave footage visible on T2, I'll push the opacity back up to 100% to "cover up" the visuals from T1. I never remove anything from T1, mainly because I don't want to disturb the audio there, and I find that leaving it in place doesn't significantly slow the final render.
* * * * *
Now, at one point I found a utility that basically applied this exact brute-force method to the clips outside of the NLE, then allowed for importing them, locked together, back into the program. Except that it didn't work with PRE (I was using version 9 at that point). Supposedly it only took seconds. I have to admit, just synching two clips together can sometimes take me as much as 10 minutes. So if you've got a simpler, faster way of doing this, I really want to hear it!