How can you identify or predict the promoter sequence of a gene in C. elegans?

I can’t say much about specific instances, such as exactly why 68 aa of ocr-4 were included, but the usual ideas run along the following lines:

  1. How do you identify the correct length of promoter sequence for a gene?
    a) Up to the next upstream gene (i.e. its coding sequence), when that is convenient, i.e. when the interval is neither too short nor too long.
    b) If that’s really, really too short, and the two genes are on the same strand, maybe it’s an operon. Plan accordingly!
    c) If it’s short, but not incredibly short: what counts as too short? Good question! Sadly, there’s no great answer. Most people want at least a couple of kb, even if that means including some of the upstream gene’s coding sequence, especially if doing so also brings in large introns, and especially if those introns contain sequence conserved in other Caenorhabditis species.
    d) On the other hand, “too long” also happens - a few kb is convenient to PCR, more than 7-10 kb really isn’t.
    e) Basically, most people find they’re happy with a few kb because they get expression, and in a pattern that makes sense to them. It helps if you’ve got a loss of function (or overexpression) phenotype and can rescue (or reproduce) it by sticking your promoter construct onto a cDNA of your gene.
    f) There are some other considerations, such as that there might be important transcriptional control elements in introns within your gene, downstream of the start of transcription - or even completely 3’ of your gene, or farther 5’. These things happen, though they don’t seem to be terribly common, or at least not commonly terribly important. If you can find these elements, or if you have a good idea where they are, you can add them, often by moving them 5’ of your current promoter construct. This is why having some sort of assay for successful rescue can be very important - so you have some confidence your expression pattern is good, and so you know whether you’re in one of those cases where you need more than a few kb 5’ of the start of transcription. If your genomic construct rescues and your cDNA expression construct doesn’t, it’s possible you need to add back noncoding pieces from your genomic construct.
    g) On the other hand, there are slightly fancier approaches: you can place the gfp open reading frame into the context of an extrachromosomal genomic fragment (fosmid, cosmid, or cloned PCR-amplified genomic fragment), either as a fusion with your open reading frame or replacing it (if the latter, avoid inducing nonsense-mediated decay). Or in these days of CRISPR you can knock gfp into the actual genomic locus itself, and in theory have all the native control elements (except maybe any intronic elements you replace, or theoretically even functional elements in the coding sequence). These approaches are considerably more work; you may find them more satisfying.
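
The rules of thumb in a) to d) can be written down as a rough sketch in Python. This is only an illustration, not a validated tool: the function name, the PromoterPlan type and the exact thresholds are my own assumptions. In particular the 500 bp operon cutoff is a placeholder I made up, and 2000 / 7000 bp are just one reading of "a couple of kb" and "7-10 kb".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds in bp; none of these are fixed values from the text.
OPERON_GAP_BP = 500      # assumed cutoff for "really, really too short"
MIN_PROMOTER_BP = 2000   # one reading of "at least a couple of kb"
MAX_PROMOTER_BP = 7000   # lower end of the "7-10 kb" PCR comfort limit


@dataclass
class PromoterPlan:
    length_bp: Optional[int]  # suggested 5' length, or None if no suggestion
    note: str


def plan_promoter(intergenic_bp: int, same_strand: bool) -> PromoterPlan:
    """Suggest a 5' promoter length from the distance to the next upstream gene."""
    # b) Tiny gap to an upstream gene on the same strand: suspect an operon.
    if intergenic_bp < OPERON_GAP_BP and same_strand:
        return PromoterPlan(None, "possible operon; check before choosing a promoter")
    # c) Short gap: extend past the upstream gene's coding sequence.
    if intergenic_bp < MIN_PROMOTER_BP:
        return PromoterPlan(MIN_PROMOTER_BP, "short gap; extend into upstream gene")
    # d) Long gap: cap the fragment so it stays convenient to PCR.
    if intergenic_bp > MAX_PROMOTER_BP:
        return PromoterPlan(MAX_PROMOTER_BP, "long gap; truncated for PCR")
    # a) Otherwise take everything up to the next upstream gene.
    return PromoterPlan(intergenic_bp, "use the full intergenic region")


if __name__ == "__main__":
    for gap, strand in [(150, True), (900, False), (3500, True), (15000, False)]:
        print(gap, strand, plan_promoter(gap, strand))
```

Whatever length comes out of something like this is only a starting point; the rescue assay in e) and f) is what tells you whether the fragment is actually sufficient.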

  2. Why was a 68 aa ORF included in the promoter sequence?
To promote good expression from your construct, and maybe even some native regulation, keeping not only the initiator ATG but also the first several amino acids of coding sequence isn’t a bad idea. Why they kept almost the whole first exon in this particular instance, I don’t know - it’s considerably more than I’d normally expect (and I haven’t looked at the paper). If you have a particular interest in this construct, you should contact the first author or the corresponding author directly.