Packet Based Equalization in Multipath Environment

[1 0 0 0 0 0 0 0 0 0 1]
The communication is packet-based: each packet consists of a known header plus a payload of known size. I am able to detect the packet. I need a way to decode the payload using the known header, without prior knowledge of the channel. All of the samples (header + payload) are available to be processed at once. Assume the channel is constant over a given packet, but may vary from packet to packet.
What is a good strategy to recover the payload?

I assume your transmissions are not OFDM, so you are looking to calculate the coefficients for some kind of FIR equalizer. In cases like this where there is a known preamble/header (which we want to exploit to estimate/equalize the channel), I can think of two general approaches that are widely used.
The first is the "least squares" type of approach, whereby you arrange your data into a system of linear equations and solve it. This usually involves reformulating the FIR filter convolution as a matrix multiplication, where the matrix is a Toeplitz matrix built from the known header samples. You can either solve to estimate the FIR channel, or you can set the problem up slightly differently to estimate the FIR equalizer directly (without ever explicitly estimating the channel).
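To make the least-squares idea a bit more concrete, here is a rough NumPy sketch of the channel-estimation variant. Everything in it is an assumption for illustration: the function names, the random ±1 header of 64 samples, the 11-tap channel length, the noise level, and the idea that the received samples are already aligned to the start of the header with nothing transmitted before it.

```python
import numpy as np

def convolution_matrix(x, num_taps):
    """Tall Toeplitz matrix X such that X @ h equals np.convolve(x, h)[:len(x)]."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    X = np.zeros((n, num_taps), dtype=complex)
    for k in range(num_taps):
        X[k:, k] = x[:n - k]  # column k is the header delayed by k samples
    return X

def estimate_channel_ls(header_tx, header_rx, num_taps):
    """Least-squares FIR channel estimate from the known and received header."""
    X = convolution_matrix(header_tx, num_taps)
    y = np.asarray(header_rx, dtype=complex)
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

# Toy example (all values below are made up for illustration).
rng = np.random.default_rng(0)
header = rng.choice([-1.0, 1.0], size=64)       # hypothetical known header
h_true = np.zeros(11)
h_true[[0, 10]] = 1.0                           # two-path channel, 10-sample delay
noise = 0.05 * rng.standard_normal(64)
received = np.convolve(header, h_true)[:64] + noise

h_est = estimate_channel_ls(header, received, num_taps=11)
print(np.round(h_est.real, 2))
```

The direct-equalizer variant has the same shape: build the matrix from the received header instead, and solve for the taps that map it back onto a suitably delayed copy of the transmitted header.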
The second type of approach is the spectral approach, whereby the FIR filter convolution is written as a multiplication in the frequency domain: Y = HX + noise. Given that X is known, we can in some sense estimate the channel H based roughly on H ≈ Y/X. Or the equalizer can be estimated based roughly on 1/H ≈ X/Y (but there is some nuance in doing that estimation well/optimally with a finite amount of noisy data).
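A minimal sketch of the spectral approach, again in NumPy, might look like the following. It leans on assumptions that are not in your description: the header is treated as if it were circularly convolved with the channel (for example, because it is preceded by a cyclic prefix), the header is a Zadoff-Chu sequence chosen only because its spectrum is flat, and the regularization constants `reg` and `noise_to_signal` are arbitrary placeholders. The small terms added to the denominators are one simple way to keep the division from blowing up noise in bins where X or H is small.

```python
import numpy as np

def estimate_channel_freq(header_tx, header_rx, reg=1e-3):
    """Per-bin channel estimate H ~ Y/X, written as Y*conj(X)/(|X|^2 + reg)."""
    X = np.fft.fft(header_tx)
    Y = np.fft.fft(header_rx)
    return Y * np.conj(X) / (np.abs(X) ** 2 + reg)

def equalizer_from_channel(H, noise_to_signal=1e-2):
    """Regularized inverse conj(H)/(|H|^2 + c) instead of a plain 1/H."""
    return np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)

# Toy example (all values below are made up for illustration).
rng = np.random.default_rng(1)
N = 63
n = np.arange(N)
header = np.exp(-1j * np.pi * n * (n + 1) / N)  # Zadoff-Chu sequence, odd length
h_true = np.zeros(11)
h_true[[0, 10]] = 1.0                           # two-path channel, 10-sample delay
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Circular convolution of header and channel, done in the frequency domain.
received = np.fft.ifft(np.fft.fft(header) * np.fft.fft(h_true, N)) + noise

H_est = estimate_channel_freq(header, received)
h_est = np.fft.ifft(H_est)[:11]                 # back to the time domain, 11 taps
W = equalizer_from_channel(H_est)               # per-bin equalizer weights
print(np.round(h_est.real, 2))
```

If the header is not cyclic in your system, the FFT relationship only holds approximately, and the least-squares formulation is usually the safer starting point.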
I can't seem to lay my hands on my preferred references right now, but you should be able to find plenty of discussions of those two approaches. Maybe this link or this link could be a possible starting point.

Thank you for your post. I'll look into the links you sent!

I was able to get something mostly working from your links. Thank you.
With regard to your second approach and your comment "(but there is some nuance in doing that estimation well/optimally with a finite amount of noisy data)", would you mind expanding on that and/or pointing me to a source?
Thank you!