NVIDIA Generative AI Multimodal NCA-GENM exam questions with solutions:
1. You're building a real-time voice cloning application using NVIDIA Riva. You need to ensure high-quality synthesized speech with minimal latency. Which of the following Riva configurations would provide the BEST trade-off between quality and speed?
A) Using only an open-source implementation, rather than NVIDIA Riva, to build the voice cloning application.
B) Using a large, transformer-based text-to-speech model with aggressive quantization and pruning, deployed on a cloud-based TPU instance.
C) Using a large, high-capacity Tacotron 2 text-to-speech model and a high-resolution WaveGlow vocoder, deployed on a single, low-power GPU.
D) Using a pre-trained, open-source text-to-speech model and a CPU-based vocoder, optimized for minimal memory footprint.
E) Using a smaller, faster FastSpeech text-to-speech model and a parallel WaveGAN vocoder, deployed on a multi-GPU server with TensorRT optimization enabled.
2. You are building a system to translate spoken language into images. You have a large dataset of audio clips and corresponding images. Which of the following is the MOST appropriate architecture?
A) A hidden Markov model (HMM) trained to map audio features to image segments.
B) A CNN for audio feature extraction, followed by a GAN for generating images conditioned on those features.
C) A transformer-based model that attends to both audio features and a learned visual vocabulary to generate images.
D) A Support Vector Machine (SVM) trained on audio features to classify the type of image to generate.
E) A sequence-to-sequence model with an LSTM encoder for the audio and an LSTM decoder for generating image pixels directly.
3. You are fine-tuning a pre-trained large language model (LLM) for a specific text generation task. During training, you observe that the model is overfitting to the training data and not generalizing well to unseen examples. Which of the following techniques could be MOST effective in mitigating overfitting in this scenario?
A) Using a smaller batch size during fine-tuning.
B) Increasing the size of the training dataset.
C) Early stopping based on a validation set.
D) Applying dropout regularization to the LLM's layers.
E) Decreasing the learning rate during fine-tuning.
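To make the term in option C concrete, here is a minimal sketch of early stopping driven by validation loss. It is framework-independent and purely illustrative: the class `EarlyStopping`, the helper `run_with_early_stopping`, and the parameters `patience` and `min_delta` are hypothetical names chosen for this example, not part of any particular library.

```python
from typing import Sequence, Tuple


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` checks."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience      # allowed consecutive checks without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_checks = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def run_with_early_stopping(val_losses: Sequence[float], patience: int = 2) -> Tuple[int, int]:
    """Replay per-epoch validation losses; return (last epoch run, best epoch)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.update(epoch, loss):
            return epoch, stopper.best_epoch
    return len(val_losses) - 1, stopper.best_epoch
```

In a real fine-tuning loop, the replayed list would be replaced by a validation pass after each epoch, and the checkpoint saved at the best epoch would be the one kept.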
4. You're tasked with building a system that can generate realistic images from text descriptions and, conversely, generate accurate text descriptions from images. You decide to use a GAN (Generative Adversarial Network) architecture, but need to handle both modalities effectively. What GAN variant would be MOST suitable for this bi-directional multimodal task?
A) Vanilla GAN
B) Deep Convolutional GAN (DCGAN)
C) Super-Resolution GAN (SRGAN)
D) Conditional GAN (cGAN)
E) CycleGAN
5. When using prompt engineering with text-to-image models, which of the following techniques are most effective in improving the fidelity and relevance of generated images to the input text?
A) Using vague and open-ended prompts to encourage creative variations.
B) Using negative prompts to explicitly exclude undesirable elements from the generated image.
C) Using a combination of highly specific prompts and negative prompts.
D) Using highly specific and detailed prompts, including attributes, style, and composition.
E) Focusing solely on the main subject of the image, omitting any contextual details.
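As an illustration of the terms used in the options above, the following sketch assembles a detailed positive prompt (subject, attributes, style, composition) and a separate negative prompt. The class `ImagePrompt` and its field names are hypothetical, and the dictionary keys `prompt` and `negative_prompt` are an assumption about how a text-to-image API might accept the two strings; check the actual parameter names of whichever model or service is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImagePrompt:
    """Hypothetical container for the parts of a text-to-image prompt."""
    subject: str
    attributes: List[str] = field(default_factory=list)
    style: str = ""
    composition: str = ""
    exclude: List[str] = field(default_factory=list)  # elements to keep out of the image

    def positive(self) -> str:
        # Join the non-empty descriptive parts into one comma-separated prompt.
        parts = [self.subject, *self.attributes, self.style, self.composition]
        return ", ".join(p.strip() for p in parts if p.strip())

    def negative(self) -> str:
        # Join the excluded elements into a negative prompt.
        return ", ".join(e.strip() for e in self.exclude if e.strip())

    def as_request(self) -> Dict[str, str]:
        # Assumed request shape; real APIs may use different key names.
        return {"prompt": self.positive(), "negative_prompt": self.negative()}


example = ImagePrompt(
    subject="a red fox",
    attributes=["sitting in snow"],
    style="watercolor",
    composition="close-up",
    exclude=["blurry", "text"],
)
```

Keeping the descriptive parts and the exclusions in separate fields makes it easy to vary one while holding the other fixed when comparing generated images.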
Questions and answers:
| Question 1 answer: E | Question 2 answer: C | Question 3 answer: C,D,E | Question 4 answer: E | Question 5 answer: B,C,D |







