Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis W Feng, X He, TJ Fu, V Jampani, A Akula, P Narayana, S Basu, XE Wang, ... ICLR 2023, 2023 | 287 | 2023 |
LayoutGPT: Compositional visual planning and generation with large language models W Feng*, W Zhu*, T Fu, V Jampani, A Akula, X He, S Basu, XE Wang, ... Advances in Neural Information Processing Systems 36, 2024 | 176 | 2024 |
VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view R Schumann, W Zhu, W Feng, TJ Fu, S Riezler, WY Wang Proceedings of the AAAI Conference on Artificial Intelligence 38 (17), 18924 …, 2024 | 51 | 2024 |
Neuro-Symbolic Procedural Planning with Commonsense Prompting Y Lu, W Feng, W Zhu, W Xu, XE Wang, M Eckstein, WY Wang ICLR 2023, 2023 | 37* | 2023 |
CPL: Counterfactual Prompt Learning for Vision and Language Models X He, D Yang, W Feng, TJ Fu, A Akula, V Jampani, P Narayana, S Basu, ... EMNLP 2022, 2022 | 25 | 2022 |
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback J Li, W Feng, TJ Fu, X Wang, S Basu, W Chen, WY Wang arXiv preprint arXiv:2405.18750, 2024 | 20 | 2024 |
Reward guided latent consistency distillation J Li, W Feng, W Chen, WY Wang arXiv preprint arXiv:2403.11027, 2024 | 8 | 2024 |
ULN: Towards Underspecified Vision-and-Language Navigation W Feng, TJ Fu, Y Lu, WY Wang EMNLP 2022, 2022 | 7 | 2022 |
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation W Feng, J Li, M Saxon, T Fu, W Chen, WY Wang arXiv preprint arXiv:2406.08656, 2024 | 6 | 2024 |
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners X He, W Feng, TJ Fu, V Jampani, A Akula, P Narayana, S Basu, WY Wang, ... arXiv preprint arXiv:2305.10722, 2023 | 6 | 2023 |
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos X He, W Feng, K Zheng, Y Lu, W Zhu, J Li, Y Fan, J Wang, L Li, Z Yang, ... arXiv preprint arXiv:2406.08407, 2024 | 5 | 2024 |
EDIS: Entity-Driven Image Search over Multimodal Web Content S Liu, W Feng, T Fu, W Chen, WY Wang arXiv preprint arXiv:2305.13631, 2023 | 5 | 2023 |
Anticipating the Unseen Discrepancy for Vision and Language Navigation Y Lu, H Zhang, P Nie, W Feng, W Xu, XE Wang, WY Wang arXiv preprint arXiv:2209.04725, 2022 | 2 | 2022 |
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations W Feng, C Liu, S Liu, WY Wang, A Vahdat, W Nie arXiv preprint arXiv:2501.07647, 2025 | | 2025 |