Most brain-computer interface (BCI) systems are trained in controlled settings on a small set of constrained, repetitive, and well-characterized instructed behaviors. While effective in such settings, these systems often fail to generalize to real-world use, where behavior is variable, context-sensitive, and structurally complex. Yet behavior can often be decomposed into reusable, overlapping motifs: sequences of phonemes form words, and in handwriting, strokes form characters. We hypothesize that compositionality can make neural decoders more sample-efficient in capturing this variability, particularly when generalizing to novel combinations of familiar motifs. Motivated by recent brain-to-text BCIs based on attempted speech or handwriting, we design a compositional neural decoder. Although explicit behavior is not directly observed in human BCI studies, we find that neural activity for different instructed behaviors carries a clear signature of motif compositionality: distinct temporal segments across different behavior classes reuse similar neural patterns. Leveraging this structure, we propose a temporal model that jointly predicts both the compositional motifs (strokes/phonemes) and the behavior class (characters/words) from neural activity. The model is trained with a multi-objective loss comprising a motif prediction branch and a behavior prediction branch that integrates the motif outputs. This hierarchical supervision guides the decoder to exploit the compositional structure while maintaining behavior prediction performance. We evaluate the models on intracortical recordings from human participants performing attempted handwriting of single letters or attempted speech of words (Willett et al., 2021, 2023), benchmarking against a capacity-matched baseline trained solely for behavior classification. On a 50-50 train-test split, performance is comparable, although the compositional models exhibit a trade-off between the representations for motif and behavior class prediction. To assess generalization, we conduct a two-shot learning experiment in which only two trials of one behavioral class are included in training (all other trials for that class are held out), while the 50-50 split is maintained for the remaining classes. In this setting, the compositional model consistently outperforms the baseline on the held-out class. These results demonstrate that motif-level decoding enhances performance on infrequent behaviors and that compositionality improves generalization to novel settings.
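To make the described training setup concrete, below is a minimal PyTorch sketch of one way the multi-objective, hierarchically supervised decoder could be implemented. All identifiers (`CompositionalDecoder`, `multi_objective_loss`, `lambda_motif`) and architectural choices (a GRU encoder, soft motif outputs feeding the behavior branch, a weighted sum of cross-entropies) are illustrative assumptions, not the authors' implementation; the sketch assumes per-timestep motif labels and a per-trial behavior label.

```python
import torch
import torch.nn as nn

class CompositionalDecoder(nn.Module):
    """Hypothetical sketch: a recurrent decoder with a per-timestep motif
    branch and a per-trial behavior branch that integrates motif outputs."""

    def __init__(self, n_channels, n_motifs, n_classes, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)
        self.motif_head = nn.Linear(hidden, n_motifs)      # motif branch (strokes/phonemes)
        self.behavior_head = nn.GRU(n_motifs, hidden, batch_first=True)
        self.behavior_out = nn.Linear(hidden, n_classes)   # behavior branch (characters/words)

    def forward(self, x):
        # x: (batch, time, channels) binned neural activity
        h, _ = self.encoder(x)
        motif_logits = self.motif_head(h)                  # (batch, time, n_motifs)
        # The behavior branch consumes the motif branch's soft outputs,
        # so behavior supervision flows back through the motif representation.
        z, _ = self.behavior_head(motif_logits.softmax(dim=-1))
        class_logits = self.behavior_out(z[:, -1])         # (batch, n_classes)
        return motif_logits, class_logits

def multi_objective_loss(motif_logits, class_logits, motif_labels, class_labels,
                         lambda_motif=0.5):
    """Weighted sum of motif- and behavior-level cross-entropies.
    The specific weighting scheme is an assumption, not from the paper."""
    motif_loss = nn.functional.cross_entropy(
        motif_logits.flatten(0, 1), motif_labels.flatten())
    class_loss = nn.functional.cross_entropy(class_logits, class_labels)
    return lambda_motif * motif_loss + (1.0 - lambda_motif) * class_loss

# Usage with placeholder shapes (192 channels, 20 motifs, 31 behavior classes):
model = CompositionalDecoder(n_channels=192, n_motifs=20, n_classes=31)
x = torch.randn(8, 100, 192)                  # 8 trials, 100 time bins
motif_labels = torch.randint(0, 20, (8, 100)) # per-timestep motif labels
class_labels = torch.randint(0, 31, (8,))     # per-trial behavior labels
motif_logits, class_logits = model(x)
loss = multi_objective_loss(motif_logits, class_logits, motif_labels, class_labels)
loss.backward()
```

Routing the behavior branch through the motif outputs, rather than giving it direct access to the encoder, is one way to realize the hierarchical supervision the abstract describes: the behavior objective can only be satisfied through a usable motif representation.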