
Close but no cigar

In the longer run, I’m interested in understanding the distinction between German beinahe and fast, as well as other words and multiword expressions that have English translations such as ‘nearly’ and ‘almost’.

The list of items of interest includes at least:

  • annähernd
  • beinahe
  • fast
  • nahezu
  • praktisch
  • schier
  • so gut wie
  • um ein Haar
  • um Haaresbreite
  • ums Haar

fast vs. beinahe

This pair of adverbs seems to be the most vexing pair of words to distinguish. Of the two, fast is clearly much more frequent than beinahe but that doesn’t mean that beinahe has only idiosyncratic uses that are not open to some clear description.

Closeness to a transition

beinahe and fast like to occur with verbal descriptions of events that almost occurred in the past but then didn’t.

fast can also be used in the present tense with resultant states that are still possible to reach. With beinahe, this seems odd.

  • Ich bin fast in München ‘I am almost in Munich’
  • ?#Ich bin beinahe in München ‘I am nearly in Munich’
  • Ich bin fast fertig mit den Hausaufgaben. ‘I am almost done with homework.’
  • ??Ich bin beinah fertig mit den Hausaufgaben. ‘I am nearly done with homework’

  • Das Glas ist schon fast voll. ‘The glass is almost full.’

  • ??Das Glas ist schon beinahe voll. ‘The glass is nearly full.’

beinahe in relation to events strongly prefers counterfactual contexts.

Closeness to category

fast can be used to express that an entity almost could be treated as a member of some category.

  • Du läufst ja fast! ‘You’re almost running’ (said to somebody who is walking so fast that it could be called running)

The availability of this reading with fast results in ambiguity in the perfect:

  • ich bin fast gelaufen ‘I almost ran’
    • I was close to running but didn’t.
    • I was in motion and you might have called it running.
  • ich bin beinahe gelaufen
    • I was close to running but didn’t.
    • ?? I was in motion and you might have called it running.

Nominal compounds

It seems that beinahe is much better and more common as the non-head of nominal compounds.

  • … seit meiner Beinahe-Begegnung mit einem fliegenden Betonobjekt … ‘Since my close encounter with a flying object made of concrete …’
  • ?? … seit meiner Fast-Begegnung mit einem fliegenden Betonobjekt …

Frequency

Looking just at the single words, we see that fast is usually more than 10 times as frequent in a corpus as beinahe.

corpus | fast | beinahe
DEWAC | 409882 | 33784
DEWIKI | 89799 | 7558

This difference might suggest that beinahe is a more specialised word than fast.

However, fast and beinahe do not seem to differ much in how their frequencies are distributed across the two data sets.

Dewiki is a data set consisting of German-language Wikipedia pages. It contains 433,226,106 words. Dewac is a data set consisting of German-language web pages. It contains 1,627,169,557 words.

beinahe has about 4.47 times the number of instances in Dewac as in Dewiki and for fast the ratio is 4.56. An analysis on corpora with better metadata on genres might reveal some interesting differences, though.
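The ratios quoted here can be reproduced from the counts and corpus sizes given above. The following is only a small sketch in Python: the dictionary and function names are made up for illustration, and the per-million normalization is an added convenience for comparing corpora of different sizes, not something the counts above already report.

```python
# Counts and corpus sizes as reported in this section.
COUNTS = {
    "DEWAC": {"fast": 409882, "beinahe": 33784},
    "DEWIKI": {"fast": 89799, "beinahe": 7558},
}
CORPUS_SIZE = {"DEWAC": 1_627_169_557, "DEWIKI": 433_226_106}


def per_million(word: str, corpus: str) -> float:
    """Occurrences of `word` per million words of `corpus`."""
    return COUNTS[corpus][word] / CORPUS_SIZE[corpus] * 1_000_000


def cross_corpus_ratio(word: str) -> float:
    """How many times more instances `word` has in DEWAC than in DEWIKI."""
    return COUNTS["DEWAC"][word] / COUNTS["DEWIKI"][word]


def fast_to_beinahe(corpus: str) -> float:
    """Ratio of fast to beinahe within a single corpus."""
    return COUNTS[corpus]["fast"] / COUNTS[corpus]["beinahe"]


if __name__ == "__main__":
    for word in ("fast", "beinahe"):
        print(word, "DEWAC/DEWIKI:", round(cross_corpus_ratio(word), 2))
    for corpus in COUNTS:
        print(corpus, "fast/beinahe:", round(fast_to_beinahe(corpus), 2))
        for word in ("fast", "beinahe"):
            print(" ", word, "per million:", round(per_million(word, corpus), 1))
```

Within each corpus the fast-to-beinahe ratio comes out at roughly 12 to 1, which is the baseline against which the more specific counts further down can be compared.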

POS of following word in DEWAC

fast

78162 fast ADJD

74851 fast CARD

61189 fast ADV

45277 fast PIAT

34513 fast ART

32181 fast ADJA

23407 fast APPR

22919 fast PIS

18826 fast PIDAT

10120 fast VVPP

8167 fast $(

6531 fast KOKOM

6298 fast APPRART

5070 fast VVFIN

4796 fast NN

4060 fast PTKNEG

3808 fast VVINF

2086 fast VAFIN

2067 fast FM

1966 fast $,

1510 fast PTKVZ

1432 fast PTKA

1423 fast PPOSAT

1394 fast $.

1186 fast NE

1077 fast VMFIN

beinahe

7457 beinahe ADJD

3901 beinahe ADJA

3619 beinahe ART

3334 beinahe ADV

3258 beinahe APPR

2614 beinahe CARD

1930 beinahe VVPP

1432 beinahe PIDAT

1287 beinahe PIAT

999 beinahe APPRART

786 beinahe NN

771 beinahe PIS

757 beinahe VAFIN

718 beinahe KOKOM

680 beinahe VVFIN

516 beinahe VVINF

357 beinahe $(

306 beinahe $,

285 beinahe PPOSAT

271 beinahe PTKNEG

250 beinahe $.

One conspicuous difference is that CARDinal numbers are much more common after fast than after beinahe (just based on comparing the rank of CARD on the lists.)
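The rank comparison can be supplemented by comparing shares. A rough sketch, using the CARD counts from the two lists and the overall DEWAC totals from the Frequency section; the helper name is invented for illustration, and it assumes that the totals and the POS lists come from the same kind of query:

```python
# DEWAC totals (Frequency section) and CARD counts (lists above).
TOTAL = {"fast": 409882, "beinahe": 33784}
FOLLOWED_BY_CARD = {"fast": 74851, "beinahe": 2614}


def card_share(word: str) -> float:
    """Fraction of all occurrences of `word` directly followed by a CARD token."""
    return FOLLOWED_BY_CARD[word] / TOTAL[word]


for word in TOTAL:
    print(f"{word}: {card_share(word):.1%} of occurrences precede a cardinal number")
```

On these figures a cardinal number follows fast a little under a fifth of the time, and beinahe well under a tenth of the time.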

POS of immediately following verbal forms in DEWAC:

fast

POS | count

VVPP | 10120

VVFIN | 5070

VVINF | 3808

VAFIN | 2086

VMFIN | 1077

VVIZU | 116

VVIMP | 35

VAINF | 17

VAPP | 14

VAIMP | 9

beinahe

VVPP 1930

VAFIN 757

VVFIN 680

VVINF 516

VMFIN 51

VVIZU 25

VAPP 6

VVIMP 5

VAINF 2

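The data above show that, among immediately following verb forms, past participles (VVPP) are the most frequent type for both adverbs: about 45% of the cases for fast and about 49% for beinahe.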
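Counts of this kind could be obtained by scanning a POS-tagged and lemmatized token stream and recording what directly follows the target word. The sketch below is not the query that produced the numbers above. It assumes a hypothetical input of (word form, STTS tag, lemma) triples, lowercases word forms so that sentence-initial Fast is included, and uses a tiny made-up sample.

```python
from collections import Counter
from typing import Iterable, Tuple

Token = Tuple[str, str, str]  # (word form, STTS tag, lemma)


def following_counts(tokens: Iterable[Token], target: str):
    """Count POS tags and verb lemmas of the token right after `target`."""
    pos_counts, verb_lemmas = Counter(), Counter()
    previous = None
    for form, tag, lemma in tokens:
        if previous == target:
            pos_counts[tag] += 1
            if tag.startswith("V"):  # STTS verb tags all begin with V
                verb_lemmas[lemma] += 1
        previous = form.lower()
    return pos_counts, verb_lemmas


# Hypothetical toy input; a real run would stream the tagged corpus instead.
sample = [
    ("Ich", "PPER", "ich"), ("hätte", "VAFIN", "haben"), ("es", "PPER", "es"),
    ("fast", "ADV", "fast"), ("vergessen", "VVPP", "vergessen"), (".", "$.", "."),
    ("Fast", "ADV", "fast"), ("alle", "PIS", "alle"), ("kamen", "VVFIN", "kommen"),
    (".", "$.", "."),
]

pos_counts, verb_lemmas = following_counts(sample, "fast")
print(pos_counts.most_common())
print(verb_lemmas.most_common())
```

Because only the immediately following token is inspected, verbs separated from the adverb by other material (as in kamen above) are not counted, which matches the "immediately following" restriction of the tables.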

Lemma of immediately following verbal forms in DEWAC:

fast

verb count
vergessen 1318
haben 1102
verdoppeln 991
sein 956
sagen 852
scheinen 611
können 520
mögen 435
erreichen 384
meinen 360
verdreifachen 297
verschwinden 282
abschließen 277
halbieren 258
schaffen 247
glauben 242
denken 215
verlieren 215
aufgeben 190
sterben 187
aussterben 179
ausrotten 178
schließen 164
ausschließen 152
verhungern 147
einschlafen 131
ersticken 130
umbringen 108
sehen 102
beenden 95
schenken 92
töten 92
erdrücken 91
ausverkaufen 90

beinahe

verb count
haben 421
sein 329
vergessen 195
sagen 81
verdoppeln 79
töten 61
sterben 56
erreichen 47
gelingen 47
umbringen 38
ausrotten 37
ertrinken 37
schaffen 35
scheinen 30
ersticken 28
platzen 28
überfahren 28
abschließen 27
zerstören 26
verlieren 25
scheitern 25
verschwinden 24
glauben 24
aussterben 23
übersehen 21
unterlaufen 21
können 21
verdreifachen 20
untergehen 20

frequency sentence-initially in DEWAC

fast 37369

beinahe 2508

frequency before negative quantifiers in DEWAC

Frequency of fast and beinahe before the negative pronoun niemand:

fast 873 beinahe 21

Frequency of fast and beinahe before the negative determiner/pronoun kein:

fast 4944 beinahe 132

The asymmetries are much larger than the roughly 12:1 ratio we would expect from the overall frequencies in DEWAC. This is probably the same phenomenon as the different affinity for occurring with CARD.
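The size of the asymmetry can be made explicit by dividing each observed fast-to-beinahe ratio by the overall ratio in DEWAC. A minimal sketch with illustrative names, using only the counts reported in this post:

```python
# Overall DEWAC counts (Frequency section) give the baseline ratio.
BASELINE = 409882 / 33784  # fast : beinahe across the whole corpus

# (fast, beinahe) counts before each negative item, as reported above.
BEFORE = {"niemand": (873, 21), "kein": (4944, 132)}


def observed_ratio(item: str) -> float:
    """Observed fast : beinahe ratio directly before `item`."""
    fast_n, beinahe_n = BEFORE[item]
    return fast_n / beinahe_n


for item in BEFORE:
    ratio = observed_ratio(item)
    print(f"{item}: observed {ratio:.1f}:1, "
          f"{ratio / BASELINE:.1f} times the baseline of {BASELINE:.1f}:1")
```

For both items the observed ratio is more than three times the baseline, so fast is overrepresented before negative quantifiers well beyond what its general frequency would predict.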

frequency before end of scale adjectives in DEWAC

voll

fast 213 beinahe 12

leer

fast 557 beinahe 24

fertig

fast 797 beinahe 37

so gut wie

  • Heiligung ist ein Wort , das aus dem allgemeinen Sprachgebrauch so gut wie verschwunden ist . ‘Sanctification is a word that is as good as gone from the general vocabulary’
  • Die Farben Grau und Braun sind in der traditionellen Heraldik so gut wie unbekannt . ‘The colors grey and brown are almost unknown in traditional heraldry’
  • Die Behörden unternahmen so gut wie nichts , um anderen Wohnraum für die Rückkehrer zu finden . ‘The authorities did close to nothing to find other accommodation for the returnees’
  • Er hat so gut wie gewonnen . ‘He has practically won’
  • Die Band ist {so gut wie am} Ende , berichtet die britische ‘ Sun ‘ . ‘The band is practically finished, reports the British Sun [newspaper].’
  • Das Auto war jedoch zum Zeitpunkt der Trennung noch {so gut wie neu}… ‘However, the car was practically new at the time of their separation…’

The four most frequent words following “so gut wie” all have to do with negation:

kein 4040 nie 1949 gar 1937 nichts 1928

However, when so gut wie is used to refer to a transition close at hand, then this can only be in a realis context, where the transition is “practically” real or impending.

  • Ich wäre fast zu spät gekommen. ‘I almost arrived late’
  • Ich wäre beinahe zu spät gekommen. ‘I nearly arrived late.’
  • ??Ich wäre so gut wie zu spät gekommen. ‘I practically arrived late’
  • Er hat so gut wie gewonnen. ‘He has practically won’

praktisch

Examples:

  • Der Riesenstern HD0107-5240 ist {praktisch} frei von schwereren Elementen. ‘The giant star HD0107-5240 is practically free of heavier elements’
  • Die leben ja {praktisch} immer anerob ( ohne O2 ) . ‘They practically always live anaerobically’
  • Selbst die Siedler , die bereit gewesen wären , zu gehen , konnten nicht und waren {praktisch in} ihren Siedlungen gefangen , .. ‘Even the settlers who would have been willing to leave were not able to [do so] and were practically trapped in their settlements, …’
  • Ich wäre praktisch zu spät gekommen ‘I practically would have come late’
  • Diese Insel ist eher hügelig als bergig , aber sehr felsig mit vielen klein en Weiden zwischen den Felsen , aber {praktisch ohne} Wälder . ‘This island is hilly rather than mountainous but very rocky, with many little pastures between the rocks, but practically without forests’

praktisch can be replaced by fast in some cases such as the last example above.

It can be used in cases where one refers to closeness to category:

  • Er ist praktisch ein Deutscher. ‘He’s practically German’
  • Ich bin praktisch gelaufen. ‘I practically ran’

It can also be used in what seem to be closeness to transition contexts:

  • Das Glas ist praktisch voll. ‘The glass is practically full’

However, that doesn’t seem to be generally right: in counterfactual cases, it cannot be used.

  • Fast wäre ich gegangen. ‘I almost left’
  • ?#Praktisch wäre ich gegangen.
  • Sie hätten den Dieb fast erwischt. ‘They almost caught the thief’
  • ?#Sie hätten den Dieb praktisch erwischt. ‘They practically caught the thief’
  • ?#Ich wäre praktisch nicht hingegangen. ‘I almost/nearly/practically didn’t go.’

nahezu

Frequency:

DEWAC 61909 DEWIKI 23518

Historically, nahezu seems to come from a transparent combination “nahe zu” (close to). [English “close to” in fact seems to have very similar semantics.]

One interesting observation, possibly owed to these origins, is that nahezu doesn’t like to occur with prepositions in their concrete spatial senses.

So while there are 2344 instances where nahezu is followed by a preposition, these are usually cases where the PP is metaphoric/idiomatic:

  • Dabei fällt die Körpertemperatur {nahezu auf} die Umgebungstemperatur ab , … ‘During this the body temperature drops to close to ambient temperature, …’

nahezu can very often be replaced by fast. However, the reverse is not true. The uses described above as “close to transition” could not be expressed using nahezu.

  • ?#Ich bin nahezu in München ‘I am close to in Munich’

annähernd

Frequency: DEWAC 16085

Like nahezu, annähernd does not occur in closeness to transition contexts.

  • ??Ich bin annähernd in München. ‘I am close to in Munich’
  • ?Das Projekt ist annähernd abgeschlossen. ‘The project is close to finished’

The two most frequent lemmas preceding annähernd are

nur ‘only’ 2874 nicht ‘not’ 2093

These are interesting. “Nur annähernd” in the main seems to be a negative polarity item (in which use it is frequently also accompanied by auch ‘too’):

  • Aber kein ander er ist auch {nur annähernd} so groß wie Herschel . ‘But no other one is even close to being as tall as Herschel.’
  • Die große Zahl der in der Praxis vorkommenden steuerlichen Probleme macht es unmöglich , diese {nur annähernd} komplett zu behandeln bzw. eingehend zu erörtern . ‘The great number of tax problems arising in real life makes it impossible to give anything like a complete treatment or discussion for any of them.’

The uses of “(auch) nicht annähernd” have the same idiomatic interpretation that English “not nearly” has: the difference under discussion is not simply “not close” but in fact large, i.e. “far from”.

  • Sicher ist das die eigene Traubenproduktion bislang {nicht annähernd} ausreicht , um auch nur den Bedarf des einheimischen Marktes zu befriedigen . ‘What’s certain is that their own grape production so far is not nearly sufficient to satisfy the demand of the domestic market’

This points to a difference from nahezu: that adverb does not seem to like negative contexts, and in fact may be a positive polarity item:

  • ?#Dabei fällt die Körpertemperatur nicht {nahezu auf} die Umgebungstemperatur ab , … ‘During this the body temperature doesn’t drop to close to ambient temperature, …’
  • ?# * Aber kein ander er ist auch {nur nahezu} so groß wie Herschel . ‘But no other one is even close to being as tall as Herschel.’

um ein Haar, um Haaresbreite, ums Haar

These items seem very similar, given that they all involve the notion of (a) hair. However, there seem to be some differences in use. I will only talk about um ein Haar and um Haaresbreite here because I don’t have enough data bearing on the third item “ums Haar”.

Frequencies in Dewac

  • um Haaresbreite 349
  • um ein Haar 602
  • ums Haar 35 (many of these are compositional)

“um Haaresbreite” has a counterfactual semantics: something was very close to happening, and the item indicates metaphorically how close to happening the event was.

  • Der THW Kiel … verpasste den Titelgewinn bei der Handball-EM für Vereinsmannschaften im spanischen Ciudad Real um Haaresbreite. ‘THW Kiel missed out on the title at the European club championship in Ciudad Real by a hair.’

“um Haaresbreite” occurs preferably with verbs like “verpassen”, “vorbeischrammen an”, etc., which incorporate negation. With these verbs, “um Haaresbreite” could not be replaced by “fast” or “beinahe” with the reading unchanged. If we modified the above as follows:

  • Der THW Kiel … verpasste fast/beinahe den Titelgewinn bei der Handball-EM für Vereinsmannschaften im spanischen Ciudad Real.

we would now be saying that the missing out did not happen and the team did in fact win!

“um ein Haar” does sometimes occur in the same contexts as “um Haaresbreite”, i.e. with verbs that predicate failure/non-occurrence. However, predominantly, it occurs in counterfactual structures modifying verbs denoting the events that did not occur:

  • … und der Schreiner Georg Elser hätte mit seinem Bombenanschlag auf Hitler 1939 der Geschichte um ein Haar eine andere Wendung gegeben . ‘… and the carpenter Georg Elser would almost have given another direction to history with his bomb attack on Hitler in 1939’