My goal with this guide is not merely to explain how to sound the letters of the Sanskrit alphabet. There are countless pronunciation guides online for that! Rather, I want to give a deeper understanding of how the sounds are formed in the mouth, and why almost all those pronunciation guides online are wrong. I will also show why the IAST transliteration system is by far the best.

The word “Sanskrit” — or saṃskṛta in IAST transliteration — means “well-made” or “perfected”. And it earns that name. It is well thought out, logically structured, and precise. This makes it relatively easy for non-native speakers like me to understand how to produce the sounds correctly (for the most part — there are a couple of tricky ones!).

Note: For now, don’t worry about how to pronounce the saṃskṛta words. The lines above and the dots below the letters might seem intimidating at first. By the end of this guide you will understand what all those marks represent.

Pronunciation: The Theory

In many European languages, a single letter can represent many different sounds. For example, the “g” in “garage” makes two different sounds within the same word, as does the “c” in the Italian “cucina” (kitchen). This creates the need for extensive memorization of how to pronounce words. This is not the case with saṃskṛta, where every letter makes only one sound, and every sound has only one letter. This allows for easy pronunciation of written words. There are no confusing homographs like “buffet” (buff-ay or buff-it), and no homophones like “through / threw” or “to / too / two”.

Some English words even have several correct pronunciations. The three words “ante”, “anti”, and “auntie” can all be pronounced “ant-ee”, so the listener needs context clues to infer which word was said. They can also be pronounced differently: “ante” as “ant-ee”, “anti” as “ant-eye”, and “auntie” as “awn-tee”. This isn’t possible in saṃskṛta, where every word is spelled and pronounced in only one way.[1] If you hear a well-pronounced word, you’ll know how to spell it. If you see a correctly spelled word, you’ll know how to say it. This logical simplicity comes down to understanding how the mouth forms sounds.

The Five Mouth Positions

To understand saṃskṛta pronunciation, we have to visualize the mouth and how it creates different sounds. There are five distinct mouth positions where the energy of a particular saṃskṛta sound originates. From back to front:

Velar (kaṇṭhya)
Formed with the back of the tongue and the velum (soft palate)
Palatal (tālavya)
Formed with the middle of the tongue and the hard palate
Retroflex (mūrdhanya)
Formed with the tip of the tongue turned back towards the hard palate
Dental (dantya)
Formed with the tip of the tongue against the front upper teeth
Labial (oṣṭhya)
Formed with the lips

The five mouth positions, the focal points of saṃskṛta sounds.

Throughout this guide, I will use the color coding shown above for the mouth positions. Wherever I reference the positions, I will highlight them with these colors. My hope is this will help emphasize the importance of learning and using the mouth positions.

Each of these mouth positions produces both a short and a long vowel sound, along with five stop consonants.[2] Each mouth position except velar also produces a semivowel, and each position except labial produces a fricative (I’ll explain these terms later).

The tongue and lips in each of the five mouth positions


This will sound very obvious, but saṃskṛta is built with syllables. All languages are, of course. But saṃskṛta pays special attention to its syllabic nature by appending a vowel sound to every consonant by default. Every letter, unless otherwise specified, forms a complete syllable. For example, the consonant क is not transliterated or pronounced as “k”, but rather as “ka”, with a short a vowel built right in.[3] Saṃskṛta’s syllabism manifests in its various alphabets used over the past three millennia. From the ancient Brahmi script to the modern-day Devanāgarī, saṃskṛta’s predominant alphabet has been an abugida, or syllabic script. This means every syllable is written and treated as a single unit. Vowels are not written as distinct letters (unless the word begins with a vowel). They are written as diacritical marks appended to consonants.

A syllable is a packet of sounds that contains exactly one vowel: it must have a vowel, and it can have only one. Vowels are the energy that gives syllables life. They are often referred to as the mātṛkā or śakti (“powers” or “energies”) of saṃskṛta. The short and long versions of each vowel sound the same; only the length is different. A long vowel is held for twice the length of a short vowel.

Simple Vowels

The tables below show the five basic vowel sounds in their short and long forms. First is the Devanāgarī “independent” form, used only when the vowel begins a word. Next is the IAST transliteration of the vowel. Last is the Devanāgarī diacritical form attached to the consonant क (ka).

Position     Vowel   IAST   Diacritic (with क)
Velar        अ       a      क
Palatal      इ       i      कि
Retroflex    ऋ       ṛ      कृ
Dental       ऌ       ḷ      कॢ
Labial       उ       u      कु

Position     Vowel   IAST   Diacritic (with क)
Velar        आ       ā      का
Palatal      ई       ī      की
Retroflex    ॠ       ṝ      कॄ
Dental       ॡ       ḹ      कॣ
Labial       ऊ       ū      कू

These five simple vowels are the pure sounds created from each of the five mouth positions. In fact, they define the mouth positions used for the rest of the alphabet.


Diphthongs are compound vowels that combine two simple vowels to create a new vowel. All diphthongs in saṃskṛta are long vowels (dīrgha). They are called saṃdhyakṣara, which means “combined letters”.

Vowel   IAST   Diacritic (with क)
ए       e      के
ऐ       ai     कै
ओ       o      को
औ       au     कौ

Anusvāra and Visarga

The anusvāra (◌ं, transliterated as ṃ) and visarga (◌ः, transliterated as ḥ) are grouped with the vowels, but they are neither vowels nor consonants. They are not vowels because they can never follow a consonant. They are not consonants because they can never begin a syllable. They can only appear immediately after a vowel, and cannot precede a vowel. They serve to “close” the vowel.

Anusvāra (ṃ) closes the vowel with a resonant (nasal) sound from the mouth position of the consonant that follows it.[4]

Visarga (ḥ) closes the vowel with an unvoiced breathy (aspirate) sound through the vowel’s mouth position. If the visarga comes at the end of a sentence, it is common to add a voiced echo of the vowel after the breath. For example, aḥ would be pronounced “aha”. This is not a standard rule, and many traditions end the word with the unvoiced breath sound. As for which method is more correct, I don’t think there is a definitive answer.


Consonants (vyañjana) in saṃskṛta are also called “stops”. This is because they stop the flow of air (and thus the sound) by means of contact within the mouth by either the tongue or lips. Full stops (spṛṣṭa) are made by complete contact that blocks the air. Partial stops (īṣatspṛṣṭa, also called semivowels) are made by partial contact that does not stop the air, but suppresses it through the various mouth positions.

Stop Consonants

There are twenty-five stop consonants, five for each of the mouth positions. The stop consonants have the following three characteristics:

Unvoiced (aghoṣa)
Sound created only with the breath, not the vocal cords.
Voiced (ghoṣa)
Sound created with the vocal cords.
Resonant (anunāsika, “through the nose”)
A nasalized ghoṣa (voiced) sound. The mouth acts as a resonance chamber, then directs the resulting sound through the nose. For example, with your lips closed, pronounce a prolonged “ng” sound by closing the back of the mouth with the base of the tongue. This is the velar mouth position. This sound bypasses the mouth entirely and passes directly into the nose. Now relax the whole tongue and pronounce a prolonged “mmm” sound. This is the labial mouth position. In both cases, all the sound is passing through the nose, but they sound distinctly different. The velar “ng” sound does not resonate in the mouth at all. The labial “mmm” sound resonates through the entire mouth before traveling out the nose.

The voiced (ghoṣa) and unvoiced (aghoṣa) sounds have two variants:

Aspirated (mahāprāṇa, “large breath”)
These sounds are followed by a small puff of air, creating a breathy sound. Take, for example, these two English phrases: “a pill” and “uphill”. They sound almost the same, but the second “p” is aspirated. It is important not to overdo the aspiration. This can result in the insertion of a vestigial vowel sound, especially with the voiced (ghoṣa) consonants, e.g. “bha” becomes “baha”.
Unaspirated (alpaprāṇa, “small breath”)
These sounds are, obviously, not followed by a puff of air.

The table below shows all twenty-five stop consonants, sorted by mouth position and type of sound.

Position     Unvoiced      Unvoiced    Voiced        Voiced      Resonant
             Unaspirated   Aspirated   Unaspirated   Aspirated   (Nasal)
Velar        क ka          ख kha       ग ga          घ gha       ङ ṅa
Palatal      च ca          छ cha       ज ja          झ jha       ञ ña
Retroflex    ट ṭa          ठ ṭha       ड ḍa          ढ ḍha       ण ṇa
Dental       त ta          थ tha       द da          ध dha       न na
Labial       प pa          फ pha       ब ba          भ bha       म ma
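The grid is regular enough to treat as a lookup table. As a minimal sketch (the names POSITIONS, SOUND_TYPES, STOPS, and classify_stop are hypothetical, chosen only for this illustration), the row of a stop consonant gives its mouth position and the column gives its type of sound:

```python
import unicodedata

# Rows run from the back of the mouth to the front; columns follow the table above.
POSITIONS = ["velar", "palatal", "retroflex", "dental", "labial"]
SOUND_TYPES = [
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "resonant",
]
STOPS = [
    ["ka", "kha", "ga", "gha", "ṅa"],
    ["ca", "cha", "ja", "jha", "ña"],
    ["ṭa", "ṭha", "ḍa", "ḍha", "ṇa"],
    ["ta", "tha", "da", "dha", "na"],
    ["pa", "pha", "ba", "bha", "ma"],
]


def classify_stop(syllable):
    """Return (mouth position, sound type) for one of the 25 stop consonants."""
    # Normalize so that precomposed and combining-mark spellings compare equal.
    syllable = unicodedata.normalize("NFC", syllable)
    for position, row in zip(POSITIONS, STOPS):
        for sound_type, candidate in zip(SOUND_TYPES, row):
            if unicodedata.normalize("NFC", candidate) == syllable:
                return position, sound_type
    raise ValueError(f"{syllable!r} is not one of the 25 stop consonants")


print(classify_stop("ḍha"))  # ('retroflex', 'voiced aspirated')
```

Anything outside the grid, such as the semivowel ya, raises a ValueError rather than guessing.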

Semivowels, Sibilants, and Aspirate

Semivowels (antaḥstha, “standing between”)
Also called īṣatspṛṣṭa (“slight touch”). The flow of air is not stopped completely, but is suppressed at the respective mouth positions.
Sibilants (ūṣman, “steam”)
Also called īṣadvivṛta (“slight opening”). An unvoiced (aghoṣa) stream of air passes through a small space at the respective mouth positions, creating a hissing sound.
Aspirate (ūṣman, “steam”)
A pure breath sound, “ha”. Formed with an open throat and mouth.
Position     Semivowel   Sibilant   Aspirate
Velar        -           -          ह ha
Palatal      य ya        श śa       -
Retroflex    र ra        ष ṣa       -
Dental       ल la        स sa       -
Labial       व va        -          -

Transliterating with IAST

Before we move on to the practical application of what we’ve covered so far, I want to discuss transliteration.

The International Alphabet of Sanskrit Transliteration is the only transliteration system I use. It is very easy to learn and understand. It uses strict 1:1 character substitution, making it unambiguous and reversible. This means that IAST can be transliterated back into Devanāgarī or other Indic scripts with 100% accuracy.
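As a rough illustration of that reversibility, here is a minimal Python sketch that converts lowercase IAST into Devanāgarī. It is not a complete or standard implementation: the names tokenize and to_devanagari are hypothetical, and it assumes that two-letter sequences such as kh or ai always stand for a single saṃskṛta letter (a real converter would have to handle the rare cases where they do not). Conjunct consonants are left to the font, which joins a consonant and the following one when the virāma sign sits between them.

```python
import unicodedata

VIRAMA = "\u094d"  # sign that suppresses the built-in short "a"

CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "ṅ": "ङ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ", "ñ": "ञ",
    "ṭ": "ट", "ṭh": "ठ", "ḍ": "ड", "ḍh": "ढ", "ṇ": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "ś": "श", "ṣ": "ष", "s": "स", "h": "ह",
}
# vowel -> (independent form, diacritic attached to a consonant);
# the diacritics are written as escapes so they do not cling to the quotes.
VOWELS = {
    "a": ("अ", ""), "ā": ("आ", "\u093e"),
    "i": ("इ", "\u093f"), "ī": ("ई", "\u0940"),
    "u": ("उ", "\u0941"), "ū": ("ऊ", "\u0942"),
    "ṛ": ("ऋ", "\u0943"), "ṝ": ("ॠ", "\u0944"),
    "ḷ": ("ऌ", "\u0962"), "ḹ": ("ॡ", "\u0963"),
    "e": ("ए", "\u0947"), "ai": ("ऐ", "\u0948"),
    "o": ("ओ", "\u094b"), "au": ("औ", "\u094c"),
}
MARKS = {"ṃ": "\u0902", "ḥ": "\u0903"}  # anusvāra, visarga


def _nfc_keys(table):
    # Make sure every key is in precomposed (NFC) form.
    return {unicodedata.normalize("NFC", k): v for k, v in table.items()}


CONSONANTS, VOWELS, MARKS = map(_nfc_keys, (CONSONANTS, VOWELS, MARKS))


def tokenize(text):
    """Split IAST into letters, preferring two-character letters (kh, ai, ...)."""
    text = unicodedata.normalize("NFC", text.lower())
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CONSONANTS or pair in VOWELS:
            tokens.append(pair)
            i += len(pair)
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def to_devanagari(text):
    tokens = tokenize(text)
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            if nxt not in VOWELS:  # no vowel follows: cancel the built-in "a"
                out.append(VIRAMA)
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if prev in CONSONANTS else independent)
        else:
            out.append(MARKS.get(tok, tok))  # pass spaces etc. through unchanged
    return "".join(out)


print(to_devanagari("kṛṣṇa"))     # कृष्ण
print(to_devanagari("saṃskṛta"))  # संस्कृत
```

Because every IAST letter maps to exactly one Devanāgarī letter or sign, the same tables could be inverted to go in the other direction.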

The Diacritics

IAST employs a handful of diacritical marks.

Macron   ̄
This mark only occurs above the five simple vowels to denote the long version of the vowel: ā ī ṝ ḹ ū. A vowel with a macron is held for twice as long as its short counterpart.

Underdot   ̣
This mark is used for almost all the sounds in the retroflex position (tongue-tip vertical at the roof of the mouth): ṛ ṝ ṭa ṭha ḍa ḍha ṇa ṣa. It is also used for the dental vowel (ḷ and ḹ), the anusvāra (ṃ), and the visarga (ḥ).

Overdot   ̇
This mark typically only occurs with the velar resonant consonant (the “ng” sound): ṅa. It is also used for the anunāsika (◌ँ), which occurs in Vedic saṃskṛta when an anusvāra is followed by a vowel, transliterated as ṁ. Chances are you won’t encounter this, but I include it for the sake of thoroughness.

Tilde   ̃
This mark only occurs once, with the palatal resonant consonant: ña.

Acute   ́
This mark only occurs once, with the palatal sibilant: śa.

Macron below   ̱
This is the rarest mark. It only occurs on the transliteration of the character ळ: ḻa. This letter is not part of the saṃskṛta alphabet, but a grammatical substitution that occurs very rarely, and only in Vedic saṃskṛta. This consonant is an alternate version of ḍa (ड) that occurs when ḍa falls between two vowels. The aspirated version of the consonant is ḻha (ळ्ह), and occurs when ḍha (ढ) falls between two vowels. It is pronounced similarly to la (ल), but with the tongue raised to the retroflex position (tongue-tip vertical at the roof of the mouth). An example of this grammatical substitution occurs in the very first word of the first verse of the ṛgveda:
अग्निमीडे अग्निमीळे
agnimīḍe agnimīḻe
“I praise Lord Agni”
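A practical note for anyone typing or searching IAST on a computer: each of these marked letters can be stored either as a single precomposed character or as a plain letter followed by combining marks, and the two forms look identical on screen. The sketch below (the helper names compose and same_letter are hypothetical) uses Python's standard unicodedata module to bring both forms to the same representation before comparing them.

```python
import unicodedata

# Unicode combining marks for the diacritics described above.
MACRON = "\u0304"
DOT_BELOW = "\u0323"
DOT_ABOVE = "\u0307"
TILDE = "\u0303"
ACUTE = "\u0301"
MACRON_BELOW = "\u0331"


def compose(base, *marks):
    """Attach combining marks to a base letter and return the NFC form."""
    return unicodedata.normalize("NFC", base + "".join(marks))


def same_letter(a, b):
    """Compare two spellings regardless of how their diacritics are stored."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


long_a = compose("a", MACRON)                 # ā
long_r = compose("r", DOT_BELOW, MACRON)      # ṝ
print(len("a" + MACRON), len(long_a))         # 2 1
print(same_letter("r" + DOT_BELOW, compose("r", DOT_BELOW)))  # True
```

Normalizing text this way before searching or sorting avoids the situation where two visually identical words fail to match.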

Comparing Systems

Saṃskṛta does not differentiate between upper- and lower-case letters. Proper nouns are not identified by capitalization, but by grammatical rules (noun declension). IAST uses only lower-case letters, the exception being if a saṃskṛta word comes at the beginning of an English sentence. This makes IAST far more readable than some other transliteration schemes. Some systems use a mixture of upper- and lower-case letters or interior punctuation to represent saṃskṛta characters.

Here is an example using two famous saṃskṛta names. I give them first in an Anglicized spelling (not a transliteration), followed by several popular transliteration systems, beginning with IAST.

System          śaṅkarācārya        kṛṣṇa
Anglicized      Shankaracharya      Krishna
IAST            śaṅkarācārya        kṛṣṇa
Harvard-Kyoto   zaGkarAcArya        kRSNa
ITRANS          sha~NkarAchArya     kRRiShNa
SLP1            SaNkarAcArya        kfzRa
Velthuis[5]     "sa"nkaraacaarya    k.r.s.na

It is immediately clear that IAST is the most concise and legible of these examples. Mixing upper- and lower-case letters, or sprinkling punctuation into the middle of a word, wreaks havoc on readability. When reading a full sentence, or a whole stotram, or conducting a long pūjā ceremony, quick and easy legibility is a must.

There are some very similar systems that use all the same diacritical marks as IAST. These include the National Library at Kolkata romanization and the ISO 15919 standard. These systems were not designed for saṃskṛta alone, but for Indic scripts in general, including Bengali, Tamil, Malayalam, Kannada, Telugu, Oriya, Gujarati, and others. Unlike saṃskṛta, some of these languages have both short and long versions of e and o, so the long versions take a macron: ē and ō. Since I only work with saṃskṛta, these macrons are superfluous, and I would rather not have to type or read them. This is why I prefer IAST, and use it exclusively.

IAST at a Glance

This chart shows the full saṃskṛta alphabet in IAST. It is color-coded by mouth position, and the letters with diacritical marks are outlined. Until you become accustomed to what each mark means, use this chart as a reference.

Short Vowels           a   i   ṛ   ḷ   u
Long Vowels            ā   ī   ṝ   ḹ   ū
Diphthongs             e   ai  o   au
Velar Stops            k   kh  g   gh  ṅ
Palatal Stops          c   ch  j   jh  ñ
Retroflex Stops        ṭ   ṭh  ḍ   ḍh  ṇ
Dental Stops           t   th  d   dh  n
Labial Stops           p   ph  b   bh  m
Semivowels             y   r   l   v
Aspirate / Sibilants   h   ś   ṣ   s

A Brief History of IAST

Modern IAST is derived from a system developed and adopted at the 1894 International Congress of Orientalists in Geneva. The scholars of the time saw a growing need to standardize a system of transliteration. The goal was to eliminate confusion caused by differing systems in use across Europe, as well as to streamline the printing of transliterated texts. They took inspiration from the most prevalent systems of the day, as well as leading saṃskṛta scholars like Monier Monier-Williams.

The system they arrived at is very similar to modern IAST, with a few distinctions. Namely, despite a desire to use a macron for all the long vowels, due to printing limitations of the day, they were not able to place a macron over the letter “l” for the long dental vowel. Since the short vowel had a dot below, they opted for an “l” with two dots below for the long vowel (l̤). Other minor differences include “m with dot above” for anusvāra (ṁ), and “m with candrabindu” (m̐) for anunāsika. There were also two additional variants of visarga: jihvāmūlīya (ẖ), used when ka or kha immediately follows visarga, and upadhmānīya (ḫ), used when pa or pha immediately follows visarga.

Modern IAST is essentially a simplification of the 1894 system. We are now able to put a macron above the letter “l”, so the long dental vowel is ḹ. The candrabindu mark for anunāsika is superfluous, so ṁ is used. This leads to the anusvāra being ṃ. And the two visarga variants occur so rarely in saṃskṛta that modern IAST ignores them, using ḥ in all cases.

If you are so inclined, you can read the Report of the Transliteration Committee from the 1894 Congress. It has some interesting insights into how they decided which diacritical marks to use where.

Pronunciation: The Practice

So, how are all these things actually pronounced? We’ll examine that in detail, but first I want to cover some of the most common mistakes people make.


First, I want to address something that, to me, seems straightforward, yet is the subject of controversy. It has to do with the retroflex and dental vowels. How the heck do you pronounce those things?

Pronouncing ṛ and ḷ

Many saṃskṛta pronunciation guides will tell you to pronounce ṛ like “ri” as in “crisp”, but with a trilled “r”. This leads to the instruction that ṝ is pronounced as “ree” in “creek” (again with a trilled “r”). They will then tell you to pronounce ḷ like ṛ, but starting with an “l” sound, leading to the truly bizarre “lri” (and “lree” for ḹ). Some other guides and teachers will say that ṛ and ṝ are a rolled “r”, but for different lengths.

These guides have lost sight of the fact that these are simple vowels. They are the pure sound created from each mouth position. One could sound these vowels without variation for the length of a full breath. The notion that ṛ sounds like “ri” changes it from a simple vowel to, at best, a diphthong, and at worst, a consonant. And “lri” for ḷ is just wacky.

The simple vowel ṛ sounds very like an English “r” (ṝ being the same, only longer), but with the tip of the tongue pointed straight up (retroflex position). Likewise, ḷ sounds very much like an English “l” (with ḹ being the same, only longer), with the tip of the tongue right behind the teeth (dental position).

Charles Wikner — author of A Practical Sanskrit Introductory — offers the following exercises for how to pronounce these vowels:

To get to the correct pronunciation of ṛ, begin by sounding a prolonged i and slowly raise the tip of the tongue so that it is pointing to the top of the head, approaching but not touching the roof of the mouth. Do not try to hold the back of the tongue in the i position, nor try to move it out of that position: simply have no concern with what is happening at the back of the tongue, just attend to the tip of the tongue and listen. Repeat the exercise a few times until comfortable with the sound of ṛ then practice directly sounding ṛ for a full breath.

Similarly for ḷ start sounding with a prolonged i and slowly raise the tip of the tongue to behind the upper front teeth without touching them. Continue the exercise as for ṛ.

Wikner closes this section with the following explanation of how the mispronunciation of ṛ as “ri” came to be (emphasis added):

In practice when either of these vowels is followed by a consonant whose mouth position requires that the tip of the tongue be at a lower position, a vestigial i will emerge due to the bunching of the muscle at the back of the tongue when moving the tip downwards. For example ṛk tends to produce rik, but a word like kṛṣṇa should produce no i sound at all.

Troubles With jña (ज्ञ)

If you have even a passing familiarity with saṃskṛta, you have likely encountered the compound consonant jña. You’ll have seen words like jñāna (wisdom), yajña (sacrificial worship), and ājñā (the “third-eye” cakra located behind the point between the eyebrows). And someone probably told you to pronounce these letters as “gya”, i.e., gyāna, yagya, and āgyā. Other common pronunciations of this character are “nya” (as in the Spanish word “mañana”), “dnya”, and “gna”. In modern Hindi, the character ज्ञ (jña) is, indeed, pronounced “gya”. In Marathi, it becomes “dnya”. And in South Indian dialects, we have “gna”. But for saṃskṛta these are all incorrect.

Jña is a compound of two consonants: ja and ña. Ja is the voiced, unaspirated palatal consonant, and ña is the palatal resonant. Both sounds originate from the palatal position, so their combined sound must also originate there.

Again citing Wikner:

The pronunciation of this is similar to the French “J” as in “Jean-Jacques”, or as in the “zh” sound in the English words “mirage”, “rouge”, “measure”, or “vision”; but in all cases it is sounded through the tālavya (palatal) mouth position, and is strongly nasalized.

He gives the following exercise to practice this sound:

Now with the tongue in the palatal position, sound a prolonged śa (the palatal sibilant). And then repeat the sound but allowing the vocal cords to vibrate — with some imagination, this is beginning to sound like a prolonged ja (which is of course, impossible to sound). Now repeat this voiced sound allowing it to be strongly nasalized. This is about as close as one can get to describing the sound of jña.

To summarize, jña is pronounced as a strongly nasalized “zha” sound (as in “mirage” and “vision”) that originates from the palatal position. This is a very tricky sound to master, which is most likely why it devolved over centuries to the much simpler “gya”.

Silent Letters

Saṃskṛta has no silent letters. You will often hear Westerners pronounce brahmā as “braw-muh”, treating the “h” as silent. But every letter in saṃskṛta is important and needs proper articulation. Some people, knowing that the “h” is not silent, will pronounce it as “brum-haw”, with the “h” after the “m”. But this is also wrong. Every letter must be sounded in the order in which it is written.

Pronunciation Guide

If you read everything up to this point and you’re still with me, I commend you and your commitment to refining your skills. Now we’re going to get into the practical stuff.

Pronouncing the Vowels

The table below lists the vowels in order by mouth position. An English approximation is given with the relevant sound highlighted.

Mouth Position   Vowel     Sounds Like[6]
Velar            a         but, not bat
                 ā         harm, not ham
Palatal          i         pit, pill
                 ī         peep, peel
Retroflex        ṛ / ṝ     acre
Dental           ḷ / ḹ     table
Labial           u         put, foot
                 ū         boot, shoe
Diphthongs       e         edge, not age; wet, not weight
                 ai        aisle, pie
                 o         told, not tow
                 au        down, hound

The ai and au diphthongs are “moving sounds” (not a technical term). The ai vowel begins with a short a sound and moves into the i position. The au vowel begins with a short a and moves into the u position (this is different from the English sound, which begins with an “a” like in “cat”, and moves to the u position).

The e and o diphthongs are “constant sounds”, so the tongue and lips don’t move during the vowel.

Pronouncing the Anusvāra and Visarga

As explained earlier, even though they are neither vowels nor consonants, the anusvāra (ṃ) and visarga (ḥ) are typically grouped with the vowels. Therefore, I will include them here, as well. These sounds are grouped with the vowels because they can only appear immediately after a vowel, thus “closing” the vowel.

The anusvāra closes its preceding vowel with a resonant (nasal) sound that changes depending on what consonant comes next. An anusvāra followed by any velar consonant would be pronounced as the velar resonant (ṅ). An anusvāra followed by any labial consonant would be pronounced as the labial resonant (m).

Take, for example, this famous mantra for the elephant-headed deity gaṇeśa: oṃ gaṃ gaṇapataye namaḥ. This is often mispronounced as “gum ganapataye”. But because the consonant after the anusvāra is the velar “ga”, the anusvāra must be pronounced as a velar resonant: “gung”.
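The rule is mechanical enough to write down as a short routine. The following is only a sketch of the rule as this guide states it (the names POSITION_OF, RESONANT, and sound_anusvara are hypothetical): it looks at the first letter of the consonant that follows each anusvāra, even across a space, and substitutes the resonant of that mouth position. An anusvāra with no consonant after it is left unchanged.

```python
import unicodedata

# First letter of each consonant, grouped by mouth position as in this guide.
# Aspirated consonants (kh, gh, ...) share the first letter of their plain form.
POSITION_OF = {}
for position, letters in {
    "velar": "k g ṅ h",
    "palatal": "c j ñ y ś",
    "retroflex": "ṭ ḍ ṇ r ṣ",
    "dental": "t d n l s",
    "labial": "p b m v",
}.items():
    for letter in unicodedata.normalize("NFC", letters).split():
        POSITION_OF[letter] = position

RESONANT = {
    "velar": "ṅ",
    "palatal": "ñ",
    "retroflex": "ṇ",
    "dental": "n",
    "labial": "m",
}
ANUSVARA = "\u1e43"  # ṃ


def sound_anusvara(text):
    """Spell out how each anusvāra is sounded, per the rule described above."""
    text = unicodedata.normalize("NFC", text)
    out = []
    for i, ch in enumerate(text):
        if ch != ANUSVARA:
            out.append(ch)
            continue
        following = text[i + 1:].lstrip()          # skip any space between words
        position = POSITION_OF.get(following[:1])  # None if no consonant follows
        out.append(RESONANT[position] if position else ch)
    return unicodedata.normalize("NFC", "".join(out))


print(sound_anusvara("gaṃ gaṇapataye"))  # gaṅ gaṇapataye
print(sound_anusvara("saṃskṛta"))        # sanskṛta
```

The output is a pronunciation aid only; the written word keeps its anusvāra.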

The visarga closes its preceding vowel with a breathy sound resulting from a small puff of air. The visarga uses the same mouth position as its preceding vowel. If the visarga is at the end of a sentence, it is very common to follow the breathy sound with a voiced echo of the vowel. Take, for example, the universal closing mantra oṃ śāntiḥ śāntiḥ śāntiḥ. The last word is often pronounced “shawn-ti-hi”. The gaṇeśa mantra from the previous example can end with either “namah” with a puff of air, or as “namaha” with a voiced a. The correct way is not definitively known. I come down on the side of the unvoiced puff of air, because a voiced echo adds a syllable to the word, which changes the poetic meter (if there is one). That being said, I still often say “namaha”, simply out of decades-long habit.
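The point about the extra syllable can be checked by counting vowels, since every syllable contains exactly one vowel. The sketch below (the name count_syllables is hypothetical) counts the vowels in an IAST word, treating ai and au as single vowels. It assumes that an a followed by i or u is always a diphthong, which holds for ordinary saṃskṛta words.

```python
import unicodedata

SIMPLE_VOWELS = set(unicodedata.normalize("NFC", "aāiīuūṛṝḷḹeo"))
DIPHTHONGS = ("ai", "au")


def count_syllables(word):
    """Count syllables in an IAST word: one syllable per vowel."""
    word = unicodedata.normalize("NFC", word.lower())
    count, i = 0, 0
    while i < len(word):
        if word[i:i + 2] in DIPHTHONGS:
            count += 1
            i += 2          # ai / au count as one vowel
        elif word[i] in SIMPLE_VOWELS:
            count += 1
            i += 1
        else:
            i += 1          # consonant, anusvāra, visarga, or space
    return count


print(count_syllables("namaḥ"))   # 2
print(count_syllables("namaha"))  # 3
```

The voiced echo turns a two-syllable word into a three-syllable one, which is exactly what disturbs a fixed meter.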

Pronouncing the Stop Consonants

The twenty-five stop consonants are separated below by mouth position. Each table gives English approximations with the relevant sound highlighted, followed by specific guidance for pronouncing the sounds of that position.

k kiss, kiln, back
kh bunkhouse
g good, give, bug
gh loghouse
ṅ sing, long, tongue

Velar Mouth Position

The velar consonants are created with the back of the tongue and the soft palate, exactly as in English.

c cello, chair, church
ch coach-horse
j just, jolly, joy
jh hedgehog
ñ enjoy, canyon, pinch

Palatal Mouth Position

The palatal consonants are created with the middle of the tongue against the hard palate. In English, it is common to make these sounds with the tip of the tongue placed at the palatal ridge behind the teeth, but this is incorrect for saṃskṛta.

Say the word “chai”, and pay attention to the tip of your tongue. If the tip of your tongue is raised and / or touching the roof of the mouth, you will have to make some adjustments to properly get to the saṃskṛta sound. Relax the tip of the tongue, and bunch up the middle of the tongue. Now try saying “chai” again, and keep the tip of the tongue relaxed. Practice this tongue position until it becomes very easy, as this applies to all the sounds from this mouth position.

Retroflex / Dental
ṭ / t tub, tap, cart
ṭh / th anthill, hothouse
ḍ / d day, dog, god
ḍh / dh redhead
ṇ / n gentle, hand, gain

Retroflex Mouth Position

Dental Mouth Position


I have grouped the retroflex and dental consonants together because neither of these mouth positions occurs in English.

In English these sounds occur as alveolar consonants, because the tip of the tongue touches the palatal ridge behind the teeth, called the “alveolar process” or “alveolar ridge”. The saṃskṛta versions of these consonants are pronounced similarly to the English, but with the tip of the tongue pointed straight up (touching the roof of the mouth) for the retroflex sounds, or straight forward (touching the teeth) for the dental sounds.

p pill, pat, tap
ph uphill
b be, cab, imbibe
bh clubhouse
m amble, mumble

Labial Mouth Position


The labial consonants are created with the lips, precisely as they are in English.


Pronouncing the Semivowels

Every mouth position besides velar has a semivowel. The semivowels are similar to their simple vowel counterparts, but they act as the boundary of a syllable rather than the heart of the syllable (the “nucleus”, in phonetic terms). If you sound a simple vowel and immediately follow it with “a” to form a syllable, you get the semivowel for that mouth position.

Mouth Position   Semivowel     Sounds Like
Palatal          i + a = ya    yellow, beyond, backyard
Retroflex        ṛ + a = ra    rise, pray, grow
Dental           ḷ + a = la    love, play, blink
Labial           u + a = va    wet, swim, twin

Palatal — as explained above in the “chai” example, this semivowel is pronounced with the tip of the tongue relaxed and the middle of the tongue bunched up. Take care that you are not engaging the tip of your tongue when sounding this letter.

Retroflex — pronounced like an English “R”, but with the tongue-tip vertical at the roof of the mouth.

Dental — pronounced the same as an English “L”, taking care that the tip of the tongue is touching the upper teeth.

Labial — pronounced similarly to an English “W”, but the lips do not protrude as far forward (as when whistling).

For many years, before I truly understood the mouth positions, I made the mistake of pronouncing the labial semivowel (va) as an English “V” sound (as in “very” or “vertical”). But the English “V” is not a labial sound. It, along with its unvoiced counterpart “F”, is called a labiodental fricative: labiodental because it involves both the lips and the teeth (specifically, the lower lip curls under and touches the upper teeth), and fricative because it is created by the friction of air through a very narrow passage (i.e. a hissing sound).[7] But the saṃskṛta “va” is a labial sound, and does not involve the teeth at all. For example, where I used to pronounce viṣṇu as “Vishnu”, I now pronounce it more akin to “Wishnu”. After training myself for years to always say “V”, even in words where it is more awkward like svāmi or tattva (both of which are much easier to say with the correct sound: “swami” and “tattwa”), the habit is deeply ingrained. If I don’t pay close attention I still absentmindedly default to a “V” sound.

Don’t get frustrated by small errors. No matter how much we practice, we will still make simple mistakes. Just keep at it, and get a little better every day.

Pronouncing the Sibilants and Aspirate

The sibilants (hissing sounds) and aspirate (breathy sound) are called ūṣman, meaning “heated”. In English they are called fricatives.

Mouth Position        Fricative   Sounds Like
Velar                 ha          hello, behind
Palatal / Retroflex   śa / ṣa     shine, wish
Dental                sa          sun, peace

Every mouth position except labial has a fricative. They are all unvoiced, and result from passing a stream of air through the mouth.

The aspirate has the same open throat and mouth as the simple vowel a, and a puff of air creates an “H” sound, just as in English.

The palatal and retroflex sibilants are similar to the English “SH”, but distinctly different from it. The English sound is a voiceless postalveolar fricative. The tongue tip is between the teeth and the alveolar ridge, and the teeth are clenched. For both the saṃskṛta sounds the tongue tip is relaxed, and the teeth are open. The palatal sibilant can be especially tricky for an English speaker. It is similar to the German “ich”.

The dental sibilant is the easiest, as it is a simple “S” sound. To quote linguist W. D. Whitney:

…it is the ordinary European S — a hiss expelled between the tongue and the roof of the mouth directly behind the upper front teeth.

In Review

Everything in this section is redundant. We covered it all in the previous sections. But presenting the same information in a different, more compact way can be quite helpful. Think of this section as a quick reference guide if you need a refresher on a particular mouth position.

The Saṃskṛta Alphabet

The table below lists all the sounds created from each of the five mouth positions. Excluded are the diphthong vowels, as they don’t belong to any particular mouth position.[8]

                       Velar   Palatal   Retroflex   Dental   Labial
Simple Vowels          a ā     i ī       ṛ ṝ         ḷ ḹ      u ū
Stop Consonants        ka      ca        ṭa          ta       pa
                       kha     cha       ṭha         tha      pha
                       ga      ja        ḍa          da       ba
                       gha     jha       ḍha         dha      bha
                       ṅa      ña        ṇa          na       ma
Semivowels             -       ya        ra          la       va
Sibilants / Aspirate   ha      śa        ṣa          sa       -

Creating the Sounds

Velar: back of tongue to soft palate

All sounds of this mouth position originate in the back of the mouth. The vowels (a and ā) are wide open sounds. Throat open, tongue relaxed, and lips wide open and relaxed.

Form the consonants by touching the back of the tongue to the soft palate, or velum. Relax the rest of the tongue.

Palatal: middle of tongue to hard palate

These sounds all use the middle of the tongue — the tip of the tongue stays relaxed — with the teeth open (i.e., not clenched). This can be an awkward adjustment for an English speaker.

To form the vowels (i and ī) and the semivowel (ya), raise the middle of the tongue to the hard palate without touching.

The consonants are all formed by touching the middle of the tongue against the hard palate.

Form the sibilant (śa) the same way, but with a small gap to force air through, creating a hissing sound. This is different from the English “SH”, which has the front of the tongue at the palatal ridge and clenched teeth.

For an English speaker, this mouth position can be especially awkward. We tend to form many of these sounds with the front of the tongue, often with the teeth clenched. Always try to keep the tip of the tongue relaxed for these sounds, and the teeth open.

Practice this mouth position with the phrase “jaya śiva śaṃkara”. All three words begin with a palatal sound. Keep the tip of the tongue relaxed until the last syllable (“ra”). Keep the teeth apart, and be sure to form “va” with only your lips (like “wa”).

Retroflex: tip of tongue vertical to hard palate

Form all the sounds of this position with the tip of the tongue pointed straight up towards the roof of the mouth.

The vowels (ṛ and ṝ) are not pronounced as a trilled “ri” or a Spanish-style rolled “r”, as is often taught. They are a pure “r” sound and can be voiced continuously for a full breath.

As with its palatal counterpart, the sibilant (ṣa) is similar to — but different from — the English “SH”.

Dental: tip of tongue to teeth

These sounds are formed by touching the tip of the tongue to the upper teeth.

The vowels (ḷ and ḹ) are not pronounced as “lri”, as is often taught. They are a pure “l” sound, and can be voiced continuously for a full breath.

Form the dental sibilant by passing air through the clenched teeth with the tongue tip directly behind the upper teeth, exactly like the English “S”.

Labial: lips only, tongue relaxed

These sounds are created entirely with the lips. For the most part, sound them exactly like their English analogs.

Note that the semivowel (va) is formed with the lips like an English “W”, and should not involve the teeth, as with an English “V”.

More Guidance

Correctly pronouncing saṃskṛta mantras, whether mentally in silent meditation or out loud during kirtan, has a vibratory effect on the cakras, helping to awaken them. Awakening the cakras helps to purify the nadis, the energy channels in the spine through which prāṇa (life-force energy) flows. Purifying and controlling the nadis is a vital step in the process of prāṇāyāma meditation.

If you found this guide helpful, but would like some more guidance on correctly producing the sounds, feel free to book a Skype session with me.

Sessions are one hour, and we will cover everything in this guide, plus I will check and make sure that you are creating the sounds correctly.

  1. There are, of course, exceptions in cases of compound words formed by various grammatical rules, which are sometimes spelled with an anusvāra (ṃ), and sometimes with the consonant the anusvāra is implying (e.g. śaṃkara and śaṅkara are both correct spellings, though śaṃkara is more correct).

  2. These consonants are called spṛṣṭa, meaning “touched”, because they are formed by complete contact of different parts of the mouth. They are also called “stops” because the contact interrupts the air flow.

  3. In phonetics, this built-in vowel is referred to as “schwa”. Modern descendants of saṃskṛta (such as Hindi, Urdu, Marathi, and Gujarati) have undergone “schwa syncope” or “schwa deletion”, meaning the inherent vowel is dropped. For example, even though spelled the same in both languages, गुरुदेव is pronounced “gurudeva” in saṃskṛta, but “gurudev” in Hindi. Whenever you see a saṃskṛta word without its ending “a” (like Ram instead of rāma, or Ganesh instead of gaṇeśa), this is due to the schwa syncope of modern Hindi, et al.

  4. For example, the anusvāra in śaṃkara (शंकर) is voiced as a velar resonant (ङ ṅ, “ng”) because it is followed by a velar consonant, k. Likewise, the anusvāra in saṃskṛta (संस्कृत) is voiced as a dental resonant (न n) because it is followed by a dental sibilant, s.

  5. Gérard Huet, creator of the fabulous Sanskrit Heritage Site, modified the Velthuis scheme to make it easier to type. This modified Velthuis gives us zafkaraacaarya, which somehow manages to be both better and worse.

  6. The English approximations are only a very rough guide, especially considering the wide variety of accents around the world. Be very conscious of the five mouth positions, and make sure you are using them to shape these vowel sounds.

  7. In phonetics (the study of sounds used in human communication), a fricative is different from a semivowel in that a fricative creates “turbulence”, and a semivowel doesn’t. For this reason, semivowels are also sometimes called glides.

  8. For a more thorough explanation of the diphthongs and the mouth positions used to create them, see Lesson 1.A.5 of A Practical Sanskrit Introductory, a self-taught course created by Charles Wikner, now available online for the first time on this site.