The receiver needs three pieces of information for synchronization: the synchronization source, the order of the packets, and the sampling instant of each packet. It gets them from three RTP header fields (SSRC, sequence number, and timestamp), so you need to understand those fields first.
Synchronization Source (SSRC)
The receiver may be receiving data from several sources at once. To arrange the packets properly, it must identify the source of each packet, which it does using the SSRC field.
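To make this concrete, here is a minimal Python sketch that reads the three synchronization fields from the 12-byte fixed RTP header and sorts incoming packets into one list per SSRC. The function names (`parse_rtp_header`, `group_by_ssrc`) and the `make_packet` test helper are made up for illustration; a real receiver would also handle CSRC lists, header extensions, and padding.

```python
import struct
from collections import defaultdict


def parse_rtp_header(packet: bytes) -> dict:
    """Read the fixed 12-byte RTP header (illustrative helper)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def group_by_ssrc(packets):
    """Return {ssrc: [packets from that source, in arrival order]}."""
    streams = defaultdict(list)
    for packet in packets:
        streams[parse_rtp_header(packet)["ssrc"]].append(packet)
    return dict(streams)


def make_packet(seq, timestamp, ssrc, payload=b""):
    """Build a bare RTP packet for testing (hypothetical helper).

    0x80 sets version 2 with no padding, extension, or CSRCs;
    96 is an arbitrary dynamic payload type.
    """
    return struct.pack("!BBHII", 0x80, 96, seq, timestamp, ssrc) + payload
```

With packets from two sources interleaved on the wire, `group_by_ssrc` returns two separate lists, and each list can then be ordered and timed independently.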
Identifying the source is not enough; the order matters too. The sequence number increments by one for each RTP data packet sent, and the receiver may use it to detect packet loss and to restore the packet sequence. Loss and out-of-order delivery occur because of network problems.
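The following Python sketch shows one way a receiver could restore packet order and list the missing sequence numbers. The RTP sequence number is a 16-bit field, so it wraps from 65535 back to 0; the sketch handles that with modular arithmetic. The function names are illustrative, and the sketch assumes all packets in one batch lie within 32768 sequence numbers of the first one received.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a, modulo 2**16 (illustrative helper)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_losses(received):
    """Take sequence numbers in arrival order; return (ordered, missing).

    Assumption: the batch spans fewer than 32768 sequence numbers,
    so the signed distance to the first packet is unambiguous.
    """
    if not received:
        return [], []
    base = received[0]
    # Unwrap every number relative to the first one; the set drops duplicates.
    offsets = sorted({seq_delta(s, base) for s in received})
    present = set(offsets)
    ordered = [(base + o) & 0xFFFF for o in offsets]
    missing = [
        (base + o) & 0xFFFF
        for o in range(offsets[0], offsets[-1] + 1)
        if o not in present
    ]
    return ordered, missing
```

For example, if packets arrive as 65534, 0, 65535, 2, the function returns the order 65534, 65535, 0, 2 and reports sequence number 1 as lost.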
For media delivery, not just the order of the packets but also the sampling instant of each packet is important, and the timestamp field carries it. Read the following paragraph carefully.
Several consecutive RTP packets may have equal timestamps if they are (logically) generated at once, e.g., belong to the same video frame. Consecutive RTP packets may contain timestamps that are not monotonic if the data is not transmitted in the order it was sampled, as in the case of MPEG interpolated video frames. (The sequence numbers of the packets as transmitted will still be monotonic.) So the sequence number is not enough for synchronization.
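A small Python sketch illustrates both cases. Packets are represented here as `(sequence number, timestamp)` pairs; the function name and the sample values (a 90 kHz clock with 3000 ticks per frame) are assumptions chosen for the example, and timestamp or sequence number wraparound is ignored for brevity.

```python
def frames_in_presentation_order(packets):
    """Group (seq, timestamp) pairs into frames, sorted by sampling instant.

    Packets that share a timestamp belong to the same frame. Inside a
    frame, the sequence number gives the packet order.
    Assumption: no timestamp or sequence number wraparound in the batch.
    """
    frames = {}
    for seq, timestamp in packets:
        frames.setdefault(timestamp, []).append(seq)
    return [(ts, sorted(seqs)) for ts, seqs in sorted(frames.items())]


# Transmission order: sequence numbers rise steadily, timestamps do not.
# Packets 10 and 11 carry the same frame; the frame sampled at 12000 is
# sent before the interpolated frames sampled at 6000 and 9000.
packets = [(10, 3000), (11, 3000), (12, 12000), (13, 6000), (14, 9000)]
playback = frames_in_presentation_order(packets)
```

Here `playback` lists the frames at timestamps 3000, 6000, 9000, and 12000, which is not the order in which the sequence numbers were assigned. The sequence number tells the receiver how the packets were sent; the timestamp tells it when the media was sampled.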
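You already know that in an audio/video session, audio and video data are transmitted using separate channels (if you don't know this, please go through applications of RTP). The receiver matches the video data with the corresponding audio data using the timestamps. Because each stream's timestamps start from an independent offset and may use a different clock rate, the receiver first relates them to a common reference clock using RTCP sender reports.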
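As a rough sketch of how this matching can work: each stream's RTCP sender report pairs an RTP timestamp with a wall-clock (NTP) time, so any later RTP timestamp in that stream can be converted to wall-clock time and compared across streams. The function name, parameter names, and sample values below are illustrative assumptions, and the wall-clock time is simplified to seconds as a float.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock, clock_rate):
    """Map an RTP timestamp to wall-clock seconds (illustrative helper).

    sr_rtp_ts and sr_wallclock are the RTP timestamp and the wall-clock
    time taken from the same RTCP sender report for this stream.
    clock_rate is the stream's RTP clock rate in Hz.
    """
    # Signed 32-bit difference, so a timestamp wrap is handled.
    ticks = ((rtp_ts - sr_rtp_ts + 2**31) & 0xFFFFFFFF) - 2**31
    return sr_wallclock + ticks / clock_rate


# Assumed sample values: 8 kHz audio and 90 kHz video, each with its own
# timestamp offset, both described by sender reports taken at t = 100.0 s.
audio_time = rtp_to_wallclock(24000, 16000, 100.0, 8000)
video_time = rtp_to_wallclock(990000, 900000, 100.0, 90000)
```

Both values come out as 101.0 seconds, so this audio packet and this video packet were sampled at the same instant and should be played together, even though their raw RTP timestamps look unrelated.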