It should work to just glue the media items, as you described. One possible variable is the relative length of the items -- by default, if one media item is completely enclosed by the other, the shorter item will mask the longer one. You can work around this by opening the media item properties, and setting the mix behavior for both items to "always play."